Paper
Reconciliation Feasibility in the Presence of Gene Duplication, Loss, and Coalescence with Multiple Individuals per Species
Jennifer Rogers, Andrew Fishberg, Nora Youngs, and Yi-Chieh Wu. In prep.
Address correspondence to: Yi-Chieh Wu (yjw AT cs DOT hmc DOT edu)
Download
PLCT is a package for understanding gene tree evolution through
gene duplications, losses, and coalescence with multiple samples per species.
Requirements
Supplemental data
In our paper, we evaluated feasibility using 6798 real gene families across 7 ape genomes, as well as
simulated gene families across the 12-species Drosophila clade.
- Species trees: apes.stree, flies.stree
- Species maps: apes.smap, flies.smap
- Species abbreviations: flies.names.txt
PLCT does not require a species tree. However, we provide the species trees, the species maps
that specify which genes belong to which species, and the species name abbreviations for reference.
- Real ape dataset
Alignments: real-apes.tar.gz (53M)
Trees: PHYLIP (2.4M), BioNJ (2.5M), PhyML (2.8M), RAxML (3.2M)
Feasibility analysis: real-apes-plct.tar.gz (127K)
Filenames have the format FAMID.HAPLOTYPE.EXT. Alignments are in FASTA format and trees in Newick format.
- FAMID: the gene family ID from Ensembl
- HAPLOTYPE: the haplotype number (1,2)
- EXT: align for alignments, tree for gene trees
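The FAMID.HAPLOTYPE.EXT naming scheme above can be split mechanically. The following is a minimal sketch, not part of PLCT; the function name `parse_ape_filename` and the example ID `FAM0001` are hypothetical, and the code assumes the haplotype and extension are always the last two dot-separated fields.

```python
from typing import NamedTuple


class ApeFile(NamedTuple):
    famid: str       # gene family ID
    haplotype: int   # 1 or 2
    ext: str         # "align" or "tree"


def parse_ape_filename(name: str) -> ApeFile:
    """Split a name of the form FAMID.HAPLOTYPE.EXT into its three fields."""
    # Split from the right so that a FAMID containing dots (an assumption;
    # this may never occur in the dataset) is kept in one piece.
    famid, haplotype, ext = name.rsplit(".", 2)
    if haplotype not in ("1", "2"):
        raise ValueError(f"unexpected haplotype in {name!r}")
    if ext not in ("align", "tree"):
        raise ValueError(f"unexpected extension in {name!r}")
    return ApeFile(famid, int(haplotype), ext)


# "FAM0001" is a made-up placeholder, not a real Ensembl family ID.
info = parse_ape_filename("FAM0001.2.tree")
print(info.famid, info.haplotype, info.ext)
```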
- Simulated fly dataset
Data: sim-flies.tar.gz (532M)
Feasibility analysis: sim-flies-plct.tar.gz (128K)
Each gene family is stored in its own directory sim-flies/POP-RATE/FAMID.
- POP: the effective population size (1e6-100e6)
- RATE: the (duplication and loss) rate multiplier (1x,2x,4x), where 1x is the rate observed in real data
- FAMID: the gene family ID (0-499)
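As an illustration of the sim-flies/POP-RATE/FAMID layout, the sketch below pulls the three fields out of a directory path. The helper name `parse_family_dir` is hypothetical, and the exact spelling of the POP field in directory names (here `10e6`) is an assumption; only the RATE values and the FAMID range are taken from the description above.

```python
from pathlib import PurePosixPath
from typing import Tuple

VALID_RATES = {"1x", "2x", "4x"}


def parse_family_dir(path: str) -> Tuple[str, str, str]:
    """Return (POP, RATE, FAMID) for a path ending in POP-RATE/FAMID."""
    parts = PurePosixPath(path).parts
    pop_rate, famid = parts[-2], parts[-1]
    # RATE is the text after the last hyphen; everything before it is POP.
    pop, rate = pop_rate.rsplit("-", 1)
    if rate not in VALID_RATES:
        raise ValueError(f"unexpected rate multiplier {rate!r}")
    if not 0 <= int(famid) <= 499:
        raise ValueError(f"family ID {famid!r} is outside 0-499")
    return pop, rate, famid


# The directory name "10e6-2x" is an assumed spelling used for illustration.
print(parse_family_dir("sim-flies/10e6-2x/42"))
```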
Each directory has the following files:
- the true reconciliation in DLCoal format:
  - FAMID.coal.tree: the gene tree in Newick format with named internal nodes
  - FAMID.coal.recon: the reconciliation mapping between the gene tree (*.coal.tree) and the locus tree (*.locus.tree)
  - FAMID.locus.tree: the locus tree in Newick format
  - FAMID.locus.recon: the reconciliation mapping between the locus tree (*.locus.tree) and the species tree (.stree)
  - FAMID.daughters: the set of daughter nodes
- FAMID.coal.align: the simulated alignment in FASTA format
- FAMID.coal.raxml.tree: the reconstructed gene tree in Newick format
- the reconciliations, alignments, and trees for multiple samples per species (N=2,5,10): FAMID.coalN.tree, FAMID.coalN.recon, FAMID.coalN.align, FAMID.coalN.raxml.tree
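To check that an unpacked family directory is complete, the file list above can be generated programmatically. This is a rough sketch under one assumption: that N is substituted as plain digits (for example `coal2`, `coal10`). The function name `expected_files` is hypothetical.

```python
from typing import Iterable, List

# Suffixes for the single-sample files listed above.
BASE_SUFFIXES = [
    "coal.tree", "coal.recon",
    "locus.tree", "locus.recon",
    "daughters",
    "coal.align", "coal.raxml.tree",
]

# Suffixes that are repeated for each sample size N.
PER_SAMPLE_SUFFIXES = ["tree", "recon", "align", "raxml.tree"]


def expected_files(famid: str, sample_sizes: Iterable[int] = (2, 5, 10)) -> List[str]:
    """List the filenames expected in one simulated family directory."""
    names = [f"{famid}.{suffix}" for suffix in BASE_SUFFIXES]
    for n in sample_sizes:
        # Assumption: N appears as plain digits directly after "coal".
        names += [f"{famid}.coal{n}.{suffix}" for suffix in PER_SAMPLE_SUFFIXES]
    return names


for name in expected_files("42"):
    print(name)
```

Comparing this list against the actual contents of a directory (for instance with `os.listdir`) would flag missing files, provided the naming assumption holds.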
Last updated 07/20/16.