gPartition: An Efficient Alignment Partitioning Program for Genome Datasets

Thu Kim Le; Diep Thi Hoang; Do Duc Dong; Bui Ngoc Thang; Nguyen Phuong Thao; Le Sy Vinh

doi:10.25073/2588-1086/vnucsce.353

Published Dec 16, 2022

How to Cite

KIM LE, Thu et al. gPartition: An Efficient Alignment Partitioning Program for Genome Datasets. VNU Journal of Science: Computer Science and Communication Engineering, [S.l.], v. 39, n. 1, dec. 2022. ISSN 2588-1086. Available at: <//jcsce.vnu.edu.vn/index.php/jcsce/article/view/353>. Date accessed: 01 aug. 2025. doi: https://doi.org/10.25073/2588-1086/vnucsce.353.

Issue

Vol 39 No 1

Section

Original Articles

Abstract

Phylogenomics, or evolutionary inference based on genome alignments, is becoming prominent thanks to next-generation sequencing technologies. In model-based phylogenomics, the partition scheme has a significant impact on inference performance, in terms of both log-likelihood and computation time. Therefore, finding an optimal partition scheme, or partitioning, is a critical step in a phylogenomic inference pipeline. Partitioning divides the alignment sites into disjoint partitions so that sites evolving under similar evolutionary models fall in the same partition. Computational partitioning is a recent approach of increasing interest because it can model site-rate heterogeneity within a single gene. State-of-the-art computational partitioning methods, such as mPartition and RatePartition, are, however, ineffective on long alignments with millions of sites. In this paper, we introduce gPartition, a new computational partitioning method that leverages both the site rate and the best-fit substitution model. We conducted experiments on recently published alignments to compare gPartition with mPartition and RatePartition. gPartition was orders of magnitude faster than the other two methods. AIC scores showed that gPartition produced partition schemes that were better than or comparable to those of mPartition, and gPartition outperformed RatePartition on all examined alignments. We implemented the proposed method in the gPartition program to help researchers partition genome alignments with millions of sites more efficiently.
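To make the two ideas in the abstract concrete, the sketch below groups alignment sites into disjoint partitions by their estimated rates and scores competing schemes with AIC. It is an illustration only and not the gPartition algorithm: the equal-size rate binning, the function names `partition_by_rate` and `aic`, and the example numbers are all assumptions. The only fixed fact is the standard definition AIC = 2k - 2 ln L, where a lower score is preferred.

```python
from collections import defaultdict


def partition_by_rate(site_rates, num_bins):
    """Split site indices into disjoint bins of near-equal size, ordered by rate.

    Hypothetical helper for illustration; the binning rule is an assumption,
    not the rule used by gPartition, mPartition, or RatePartition.
    """
    # Rank the sites from slowest to fastest.
    order = sorted(range(len(site_rates)), key=lambda i: site_rates[i])
    partitions = defaultdict(list)
    for rank, site in enumerate(order):
        # Each site lands in exactly one bin, so the partitions are disjoint.
        bin_id = rank * num_bins // len(order)
        partitions[bin_id].append(site)
    return [sorted(partitions[b]) for b in sorted(partitions)]


def aic(log_likelihood, num_params):
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * num_params - 2 * log_likelihood


# Made-up per-site rates for a six-site alignment.
rates = [0.1, 2.0, 0.2, 1.5, 0.9, 3.0]
scheme = partition_by_rate(rates, num_bins=3)

# Made-up log-likelihoods and parameter counts for two candidate schemes.
score_simple = aic(log_likelihood=-1000.0, num_params=10)
score_rich = aic(log_likelihood=-980.0, num_params=20)
```

In this toy comparison the richer scheme is preferred only because its gain in log-likelihood outweighs the penalty for its extra parameters, which is the trade-off that AIC-based comparison of partition schemes relies on.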
