Address correspondence to: Yi-Chieh Wu (yjw at mit.edu) and Manolis Kellis (manoli at mit.edu)
STAR-MP is a phylogenetic method for reconstructing architecture evolution based on a known species tree, extant architectures, and (reconstructed) module (domain) phylogenies.
In our paper, we considered domain architecture rearrangements in
9 fully sequenced
STAR-MP requires a species tree and species map. We use the species tree estimated by Tamura2004. Additionally, we provide the species map that specifies which genes belong to which species, and the species name abbreviations used in *.stree and *.smap. See SPIDIR and SPIMAP for more detail on these files.
Our files use the FlyBase peptide id (e.g. dmel_FBpp0079164) as the unique gene id. Users who wish to use alternative identifiers can use this tab-delimited file to map the peptide id to a (1) CG protein id (e.g. CG7562-PA), (2) common protein name (e.g. Trf-PA), (3) FlyBase gene id (e.g. dmel_FBgn0010287), (4) CG gene id (e.g. CG7562), (5) short gene name (e.g. Trf), or (6) long gene name (e.g. TBP-related factor).
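As a rough illustration of how the identifier mapping might be used, the Python sketch below loads a tab-delimited mapping into a dictionary. It assumes (this is not confirmed by the file description) that each row has seven columns in the order listed above, with the peptide id first and no header row. The names FIELDS and load_id_map are hypothetical, and the example row is assembled from the example identifiers in the text purely for illustration.

```python
import csv

# Assumed column order: peptide id, then the six alternative identifiers
# in the order they are listed in the text. This order is an assumption.
FIELDS = [
    "peptide_id", "cg_protein_id", "protein_name",
    "fb_gene_id", "cg_gene_id", "short_name", "long_name",
]

def load_id_map(lines, target="short_name"):
    """Map each peptide id to the identifier in the `target` column."""
    col = FIELDS.index(target)
    mapping = {}
    # QUOTE_NONE so that quote characters inside gene names are kept verbatim.
    for row in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) != len(FIELDS):
            continue  # skip blank or malformed rows
        mapping[row[0]] = row[col]
    return mapping

# Illustrative row only; pass an open file handle in practice.
example = ["dmel_FBpp0079164\tCG7562-PA\tTrf-PA\tdmel_FBgn0010287\tCG7562\tTrf\tTBP-related factor\n"]
short_names = load_id_map(example, target="short_name")
```

Any iterable of lines works as input, so an open file handle for the mapping file can be passed directly.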
Each line provides the gene, the start and end position (1-indexed) of the module, and the module family.
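A minimal Python sketch of reading one such line follows. It assumes the four fields are tab-separated and appear in the order gene, start, end, module family, and that the end position is inclusive; none of these details are stated above. Module, parse_module_line, and the family name "modA" are hypothetical.

```python
from typing import NamedTuple

class Module(NamedTuple):
    gene: str     # gene id, e.g. dmel_FBpp0079164
    start: int    # 1-indexed start position of the module
    end: int      # 1-indexed end position (assumed inclusive)
    family: str   # module family

def parse_module_line(line):
    """Parse one line, assumed to be: gene, start, end, family (tab-separated)."""
    gene, start, end, family = line.rstrip("\n").split("\t")
    return Module(gene, int(start), int(end), family)

def module_length(module):
    """Length of the module, given 1-indexed inclusive coordinates."""
    return module.end - module.start + 1

# "modA" is a made-up family name used only for illustration.
m = parse_module_line("dmel_FBpp0079164\t10\t120\tmodA\n")
```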
Each line in the text files lists the genes belonging to a single architecture family.
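Since each line is one family, a family file can be read into a dictionary keyed by line number, as in the Python sketch below. It assumes genes on a line are separated by whitespace and that line numbering starts at 1; both are assumptions, and read_families is a hypothetical helper. The second and third gene ids in the example are made up.

```python
def read_families(lines):
    """Return {line number: list of gene ids}, one entry per line."""
    families = {}
    # start=1 assumes families are numbered from the first line as 1.
    for index, line in enumerate(lines, start=1):
        # split() with no argument handles tabs or spaces; a blank line
        # yields an empty family so later line numbers stay aligned.
        families[index] = line.split()
    return families

# Made-up example lines; pass an open file handle in practice.
fams = read_families([
    "dmel_FBpp0079164 xxxx_gene2\n",
    "xxxx_gene3\n",
])
```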
To focus on gene fusions and fissions, the architecture families were
filtered to a set of "merge/split" families, in which one species has
a gene with two connected modules and another species has a gene with
at least one of these modules unconnected. STAR-MP was used to
reconstruct the evolutionary histories of these families. These
families are indexed by their line number in "fams.ms.txt", and
for each family, we have provided the architecture family
(*.fam), the (100 bootstrapped) gene trees as reconstructed by
SPIMAP (*.nt.uniq.trees), the architecture scenario as
reconstructed by STAR-MP (*.mp), and a figure of this
reconstructed architecture scenario (*.mp.svg).
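The "merge/split" criterion above can be sketched as a simple predicate in Python. This is one possible reading, not the filter used in the paper: it treats two modules as "connected" when they occur in the same gene, and treats a module as "unconnected" when a gene in a different species carries exactly one of the two. The function name, the input layout, and all ids in the example are hypothetical.

```python
from itertools import combinations

def is_merge_split(architectures, species_of):
    """architectures: gene id -> list of module families in that gene.
    species_of: gene id -> species name.
    Returns True if some gene has two modules together while a gene in
    another species has exactly one of those two modules."""
    for gene1, modules1 in architectures.items():
        for a, b in combinations(sorted(set(modules1)), 2):
            for gene2, modules2 in architectures.items():
                if species_of[gene2] == species_of[gene1]:
                    continue  # the criterion compares different species
                present = set(modules2)
                if (a in present) != (b in present):
                    return True  # one module of the pair occurs alone
    return False

# Made-up family: modules A and B are fused in sp1 and separate in sp2.
arch = {"sp1_g1": ["A", "B"], "sp2_g1": ["A"], "sp2_g2": ["B"]}
spp = {"sp1_g1": "sp1", "sp2_g1": "sp2", "sp2_g2": "sp2"}
fused_everywhere = {"sp1_g1": ["A", "B"], "sp2_g1": ["A", "B"]}
```

With these inputs, `is_merge_split(arch, spp)` is true, while the family in which both species carry the fused gene is not flagged.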
Finally, to limit the effect of genome annotation errors, we also
considered a conservative set of "merge/split" families, in which no
genes within the family are adjacent, no genes are at the ends of
scaffolds, and no genes have transitive BLAST hits through alternatively
spliced forms.
In addition, we considered three possible mechanisms for module
rearrangement and catalogued ~9000
Two adjacent genes merge into a single gene, or a single gene splits into two genes.
Large-loop mismatch repair or replication slippage results in a merged gene
located between the ancestral split (but not necessarily
adjacent) genes.
A retrotransposed copy of a gene combines with exons from another gene.
A chromosomal segment duplicates, and alternative portions of the duplicates are lost.
Last updated 03/08/13.