Paper
Most Parsimonious Reconciliation in the Presence of Gene Duplication, Loss, and Deep Coalescence.
Yi-Chieh Wu, Matthew D. Rasmussen†, Mukul S. Bansal†, and Manolis Kellis.
Genome Research. 2014. doi: 10.1101/gr.161968.113
Multiple Optimal Reconciliations under the Duplication-Loss-Coalescence Model.
Haoxing Du†, Yi Sheng Ong†, Marina Knittel, Ross Mawhorter, Nuo Liu, Gianluca Gross, Reiko Tojo, Ran Libeskind-Hadas, and Yi-Chieh Wu.
IEEE/ACM Transactions on Computational Biology and Bioinformatics. In press. doi: 10.1109/TCBB.2019.2922337
Inferring Pareto-Optimal Reconciliations across Multiple Event Costs under the Duplication-Loss-Coalescence Model.
Ross Mawhorter, Nuo Liu, Ran Libeskind-Hadas, and Yi-Chieh Wu.
BMC Bioinformatics. 2019. doi: 10.1186/s12859-019-3206-6
An Integer Linear Programming Solution for the Most Parsimonious Reconciliation Problem under the Duplication-Loss-Coalescence Model.
Morgan Carothers, Joseph Gardi, Gianluca Gross, Tatsuki Kuze, Nuo Liu, Fiona Plunkett, Julia Qian, and Yi-Chieh Wu.
Submitted.
Address correspondence to: Yi-Chieh Wu (yjw AT cs DOT hmc DOT edu)
† Equal contribution
Download
DLCpar is a reconciliation method for inferring gene duplications,
losses, and coalescence (accounting for incomplete lineage sorting).
Requirements
Supplemental data
We evaluated DLCpar using the same datasets used to evaluate
DLCoal.
These include 5351 real gene families across 16 fungal genomes, as
well as simulated gene families for the 12 Drosophila
and 15 primate (plus 2 outgroup species) clades.
In addition, we evaluated DLCpar on simulated gene families with
simulated species trees. (The species trees are the same as those
used to evaluate TreeFix.)
Additional simulated datasets are available upon request.
Note that DLCpar uses many of the same conventions as
other phylogenetic programs developed by our group
(e.g. SPIMAP,
TreeFix,
DLCoal).
If a file format or directory structure is unclear,
you might be able to find more information at one of these websites.
- Species trees: fungi.stree, flies.stree, primates.stree
Species maps: fungi.smap, flies.smap, primates.smap
Species abbreviations: fungi.names.txt, flies.names.txt, primates.names.txt
DLCpar requires a species tree and species map. We use the species trees
estimated by Butler2009 (fungi), Tamura2004 (flies), and Siepel2009 (primates).
Additionally, we provide the species map that specifies which genes belong to which species,
and the species name abbreviations used in *.stree and *.smap.
- Real fungi reconciliations: TreeFix+DLCpar (48M), PhyML+DLCpar (46M)
Relation files: real-fungi-rel.tar.gz (3.9M)
Each gene family is stored in its own directory real-fungi/FAMID,
where FAMID is a gene family ID. Each directory has the following files:
- FAMID.nt.align: a nucleotide alignment of the gene family in FASTA format.
- FAMID.tree: a reconstructed gene tree in Newick format.
- a DLCpar reconciliation in LCT format:
  - FAMID.dlcpar.tree: a gene tree in Newick format with implied, named internal nodes
  - FAMID.dlcpar.recon: a reconciliation mapping between the gene tree (*.tree) and the species tree (fungi.stree) in which each gene tree node is also labeled with a locus
  - FAMID.dlcpar.order: a partial order of internal gene tree nodes
- a DLCpar reconciliation in DLCoal (three-tree) format:
  - FAMID.dlcpar.coal.tree: a copy of the gene tree in Newick format with named internal nodes
  - FAMID.dlcpar.coal.recon: a reconciliation mapping between the gene tree (*.coal.tree) and the locus tree (*.locus.tree)
  - FAMID.dlcpar.locus.tree: a locus tree in Newick format
  - FAMID.dlcpar.locus.recon: a reconciliation mapping between the locus tree (*.locus.tree) and the species tree (fungi.stree)
  - FAMID.dlcpar.daughters: a set of daughter nodes
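As a reading aid for the layout above, here is a minimal Python sketch that builds the file names one would expect in a single real-fungi/FAMID directory and reports which are absent. The file suffixes come from the list above; the function names, the root directory argument, and the example family ID are hypothetical and are not part of DLCpar.

```python
from pathlib import Path

# Suffixes copied from the file list above.
INPUT_SUFFIXES = ["nt.align", "tree"]
LCT_SUFFIXES = ["dlcpar.tree", "dlcpar.recon", "dlcpar.order"]
THREE_TREE_SUFFIXES = [
    "dlcpar.coal.tree",
    "dlcpar.coal.recon",
    "dlcpar.locus.tree",
    "dlcpar.locus.recon",
    "dlcpar.daughters",
]


def expected_files(root, famid):
    """Return the paths expected under root/FAMID (hypothetical helper)."""
    famdir = Path(root) / famid
    suffixes = INPUT_SUFFIXES + LCT_SUFFIXES + THREE_TREE_SUFFIXES
    return [famdir / f"{famid}.{suffix}" for suffix in suffixes]


def missing_files(root, famid):
    """Return the expected paths that do not exist on disk."""
    return [path for path in expected_files(root, famid) if not path.exists()]
```

For example, after unpacking an archive one might call missing_files("real-fungi", famid) for each family ID and inspect any non-empty result.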
DLCpar was run with parameters D=1, L=1, C=1.
- Simulated fly dataset: sim-flies.tar.gz (15M)
Simulated primate dataset: sim-primates.tar.gz (14M)
Each gene family is stored in its own directory sim-DATASET/POP-RATE/FAMID.
- DATASET: the dataset (flies, primates)
- POP: the effective population size (for flies, 1e6-500e6; for primates, 10e3-100e3)
- RATE: the (duplication and loss) rate multiplier (1x, 2x, 4x), where 1x is the rate observed in real data
- FAMID: the gene family ID (0-499)
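The naming scheme above can be split back into its components with a few lines of Python. This is only a sketch: it assumes the POP and RATE parts are joined by a single hyphen, that POP is written in a form Python's float() accepts (such as 1e6), and that RATE ends in "x". The SimFamily type, the parse_sim_path function, and the example path are hypothetical.

```python
from pathlib import PurePosixPath
from typing import NamedTuple


class SimFamily(NamedTuple):
    dataset: str  # "flies" or "primates"
    pop: float    # effective population size
    rate: int     # duplication/loss rate multiplier (1, 2, or 4)
    famid: int    # gene family ID


def parse_sim_path(path):
    """Parse a path ending in sim-DATASET/POP-RATE/FAMID (assumed layout)."""
    top, pop_rate, famid = PurePosixPath(path).parts[-3:]
    if not top.startswith("sim-"):
        raise ValueError(f"expected a 'sim-DATASET' directory, got {top!r}")
    # Split on the last hyphen so the POP text is kept whole.
    pop_text, rate_text = pop_rate.rsplit("-", 1)
    if not rate_text.endswith("x"):
        raise ValueError(f"expected a rate such as '2x', got {rate_text!r}")
    return SimFamily(
        dataset=top[len("sim-"):],
        pop=float(pop_text),
        rate=int(rate_text[:-1]),
        famid=int(famid),
    )
```

Under these assumptions, parse_sim_path("sim-flies/1e6-1x/0") yields the dataset "flies", a population size of 1e6, a rate multiplier of 1, and family ID 0.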
Each directory has the following files:
- the true reconciliation in DLCoal format: FAMID.coal.tree, FAMID.coal.recon, FAMID.locus.tree, FAMID.locus.recon, FAMID.daughters
- a DLCpar reconciliation in LCT format (see above)
- a DLCpar reconciliation in DLCoal format (see above)
DLCpar was run with parameters D=1, L=1, C=0.5.
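To illustrate how the D, L, and C settings matter, the sketch below treats them as per-event costs for duplication, loss, and coalescence, and scores a reconciliation as the weighted sum of its event counts. This weighted-sum reading is an assumption based on the parsimony framing of the papers listed above, not a description of DLCpar's internals, and the function name and event counts are hypothetical.

```python
def reconciliation_cost(n_dup, n_loss, n_coal, D=1.0, L=1.0, C=1.0):
    """Weighted sum of event counts (assumed form of the parsimony score)."""
    return D * n_dup + L * n_loss + C * n_coal


# Two hypothetical explanations of the same gene tree:
# one duplication plus one loss, or three coalescence events.
dup_loss = (1, 1, 0)
coal_only = (0, 0, 3)

equal_costs = [reconciliation_cost(*counts, C=1.0) for counts in (dup_loss, coal_only)]
cheap_coal = [reconciliation_cost(*counts, C=0.5) for counts in (dup_loss, coal_only)]
```

With C=1 the duplication-loss explanation is cheaper (2.0 versus 3.0), while with C=0.5 the coalescence-only explanation is cheaper (1.5 versus 2.0), which shows why the choice of costs can change which reconciliation is most parsimonious.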
- Simulated species trees and species map: sim-stree.tar.gz (39K)
Dataset for simulated species trees: sim-stree.tar.gz (24M)
The configuration files are stored in the following structure:
sim/TREESIZE-SPECRATE/STREE.stree.
Each gene family is stored in its own directory
sim-stree/TREESIZE-SPECRATE/STREE/FAMID.
- TREESIZE: number of extant species (5, 10, 20, 50, 100)
- SPECRATE: speciation rate in events/myr (0.05, 0.1, 0.2, 0.5, 1)
- STREE: species tree number (0-9)
- FAMID: the gene family ID (0-99)
Each directory has the same files as for the simulated fly and primate datasets.
DLCpar was run with parameters D=1, L=1, C=1.
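The directory scheme for the simulated species tree dataset can be enumerated programmatically. The sketch below assumes that every combination of the listed values is present and that SPECRATE is spelled in directory names exactly as in the list above; neither is confirmed here. The constant and function names are hypothetical.

```python
from itertools import product
from pathlib import PurePosixPath

TREE_SIZES = (5, 10, 20, 50, 100)
# Kept as strings so the assumed on-disk spelling is not altered by float formatting.
SPEC_RATES = ("0.05", "0.1", "0.2", "0.5", "1")
N_STREES = 10     # species tree numbers 0-9
N_FAMILIES = 100  # gene family IDs 0-99


def family_dirs(root="sim-stree"):
    """Yield root/TREESIZE-SPECRATE/STREE/FAMID for the full parameter grid."""
    grid = product(TREE_SIZES, SPEC_RATES, range(N_STREES), range(N_FAMILIES))
    for size, rate, stree, famid in grid:
        yield PurePosixPath(root) / f"{size}-{rate}" / str(stree) / str(famid)
```

If the grid is complete, this yields 5 x 5 x 10 x 100 = 25000 family directories, which gives a quick sanity check against the unpacked archive.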
References
- (Butler2009) Butler, G.; Rasmussen, M. D.; Lin, M. F.; Santos, M. A. S.; Sakthikumar, S.; Munro, C. A.; Rheinbay, E.; Grabherr, M.; Forche, A.; Reedy, J. L.; Agrafioti, I.; Arnaud, M. B.; Bates, S.; Brown, A. J. P.; Brunke, S.; Costanzo, M. C.; Fitzpatrick, D. A.; de Groot, P. W. J.; Harris, D.; Hoyer, L. L.; Hube, B.; Klis, F. M.; Kodira, C.; Lennard, N.; Logue, M. E.; Martin, R.; Neiman, A. M.; Nikolaou, E.; Quail, M. A.; Quinn, J.; Santos, M. C.; Schmitzberger, F. F.; Sherlock, G.; Shah, P.; Silverstein, K. A. T.; Skrzypek, M. S.; Soll, D.; Staggs, R.; Stansfield, I.; Stumpf, M. P. H.; Sudbery, P. E.; Srikantha, T.; Zeng, Q.; Berman, J.; Berriman, M.; Heitman, J.; Gow, N. A. R.; Lorenz, M. C.; Birren, B. W.; Kellis, M. & Cuomo, C. A. Evolution of pathogenicity and sexual reproduction in eight Candida genomes. Nature, 2009, 459, 657-662.
- (Siepel2009) Siepel, A. Phylogenomics of primates and their ancestral populations. Genome Res, 2009, 19, 1929-1941.
- (Tamura2004) Tamura, K.; Subramanian, S. & Kumar, S. Temporal patterns of fruit fly (Drosophila) evolution revealed by mutation clocks. Mol Biol Evol, 2004, 21, 36-44.
Last updated 06/18/20.