Paper
Coestimation of Gene Trees and Reconciliations
under a Duplication-Loss-Coalescence Model
Bo Zhang and Yi-Chieh Wu.
Under review.
Address correspondence to: Yi-Chieh Wu (yjw AT cs DOT hmc DOT edu)
Download
DLC-Coestimation is a gene tree reconstruction and reconciliation method.
It can be used to infer gene tree topologies and
to infer gene duplications, losses, and coalescence
(while accounting for incomplete lineage sorting).
- A development version is available on GitHub.
Supplemental data
We evaluated DLC-Coestimation using the same datasets used to evaluate
DLCpar and
DLCoal.
These included 5351 real gene families across 16 fungal genomes, as
well as simulated gene families across the 12-species Drosophila clade.
Additional simulated datasets are available upon request.
Note that DLC-Coestimation uses many of the same conventions as
other phylogenetic programs developed by our group
(e.g. SPIMAP, TreeFix, DLCoal, DLCpar).
If a file format or directory structure is unclear,
you might be able to find more information at one of these websites.
- Species trees: fungi.stree
- Species maps: fungi.smap
- Species abbreviations: fungi.names.txt
DLC-Coestimation requires a species tree and species map. We use the species trees
estimated by Butler2009 (fungi) and Tamura2004 (flies).
Additionally, we provide the species map that specifies which genes belong to which species,
and the species name abbreviations used in *.stree and *.smap.
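As an illustration of how a species map can be applied, here is a minimal Python sketch. It rests on an assumption not stated on this page: that each line of a *.smap file holds two tab-separated columns, a gene name pattern (possibly containing a `*` wildcard) and a species abbreviation. The helper names `read_smap` and `gene_to_species`, and the example patterns, are hypothetical; check the file itself before relying on this layout.

```python
from fnmatch import fnmatchcase


def read_smap(lines):
    """Parse species-map lines into (gene_pattern, species) pairs.

    ASSUMPTION: each non-blank line is "<gene pattern>\t<species>".
    """
    pairs = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        pattern, species = line.split("\t")
        pairs.append((pattern, species))
    return pairs


def gene_to_species(gene, pairs):
    """Return the species of the first pattern that matches the gene name.

    fnmatchcase does shell-style matching, so "?" and "[...]" in a
    pattern are also treated as wildcards, not only "*".
    """
    for pattern, species in pairs:
        if fnmatchcase(gene, pattern):
            return species
    raise KeyError("no species pattern matches gene %r" % gene)


# Hypothetical map entries, for illustration only.
example_pairs = read_smap(["scer_*\tscer\n", "calb_*\tcalb\n"])
```

With these made-up entries, `gene_to_species("calb_0042", example_pairs)` would look up the species of a gene named `calb_0042`. In practice the lines would come from `open("fungi.smap")`.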
- Real fungi reconstructions and reconciliations: DLC-Coestimation (46M)
- Relation files: real-fungi-rel.tar.gz (2.5M)
Each gene family is stored in its own directory real-fungi/FAMID,
where FAMID is a gene family ID. Each directory has the following files:
- FAMID.nt.align: a nucleotide alignment of the gene family in FASTA format.
- FAMID.tree: a reconstructed gene tree in Newick format.
- A reconciliation in DLCoal (three-tree) format:
  - FAMID.dlccoestimation.coal.tree:
    a copy of the gene tree in Newick format with named internal nodes
  - FAMID.dlccoestimation.coal.recon:
    a reconciliation mapping between the gene tree (*.coal.tree)
    and the locus tree (*.locus.tree)
  - FAMID.dlccoestimation.locus.tree:
    a locus tree in Newick format
  - FAMID.dlccoestimation.locus.recon:
    a reconciliation mapping between the locus tree (*.locus.tree)
    and the species tree (fungi.stree)
  - FAMID.dlccoestimation.daughters:
    a set of daughter nodes
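The directory layout above can be checked programmatically. The following Python sketch builds the expected file paths for one gene family and reports any that are missing. The suffixes come from the list above; the function names `family_files` and `missing_files`, and the location of the unpacked `real-fungi` directory, are assumptions made for illustration.

```python
from pathlib import Path

# File suffixes from the list above; FAMID is prepended per family.
FAMILY_SUFFIXES = [
    ".nt.align",
    ".tree",
    ".dlccoestimation.coal.tree",
    ".dlccoestimation.coal.recon",
    ".dlccoestimation.locus.tree",
    ".dlccoestimation.locus.recon",
    ".dlccoestimation.daughters",
]


def family_files(root, famid):
    """Return the expected file paths under root/FAMID for one gene family."""
    famdir = Path(root) / famid
    return [famdir / (famid + suffix) for suffix in FAMILY_SUFFIXES]


def missing_files(root, famid):
    """Return the expected files that do not exist on disk."""
    return [path for path in family_files(root, famid) if not path.is_file()]
```

For example, `missing_files("real-fungi", famid)` returns an empty list when a family directory is complete, where `famid` is any gene family ID from the unpacked archive.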
References
- (Butler2009) Butler, G.; Rasmussen, M. D.; Lin, M. F.; Santos, M. A. S.; Sakthikumar, S.; Munro, C. A.; Rheinbay, E.; Grabherr, M.; Forche, A.; Reedy, J. L.; Agrafioti, I.; Arnaud, M. B.; Bates, S.; Brown, A. J. P.; Brunke, S.; Costanzo, M. C.; Fitzpatrick, D. A.; de Groot, P. W. J.; Harris, D.; Hoyer, L. L.; Hube, B.; Klis, F. M.; Kodira, C.; Lennard, N.; Logue, M. E.; Martin, R.; Neiman, A. M.; Nikolaou, E.; Quail, M. A.; Quinn, J.; Santos, M. C.; Schmitzberger, F. F.; Sherlock, G.; Shah, P.; Silverstein, K. A. T.; Skrzypek, M. S.; Soll, D.; Staggs, R.; Stansfield, I.; Stumpf, M. P. H.; Sudbery, P. E.; Srikantha, T.; Zeng, Q.; Berman, J.; Berriman, M.; Heitman, J.; Gow, N. A. R.; Lorenz, M. C.; Birren, B. W.; Kellis, M. & Cuomo, C. A. Evolution of pathogenicity and sexual reproduction in eight Candida genomes. Nature, 2009, 459, 657-662.
Last updated 02/28/17.