Jane

Jane 3 File Formats

This page describes the file formats supported by Jane. Jane supports both the .nex (NEXUS) format and a more human-readable .tree format. The .nex format is supported for compatibility with other cophylogeny programs, while the .tree format is preferable when editing a tree by hand.

Jane allows the user to specify the relative times of events in the host and parasite trees, if that information is known. In Jane, these relative times are called "time zones". For example, the user may specify that some set of host nodes occurred in time zone 1, others occurred later in time zone 2, and so on. Time zones can similarly be specified for the parasite tree nodes. Jane will then find solutions that are consistent with these time zones. If the relative times of all events are known, each node can have its own unique time zone; if only partial information is known, several nodes can share the same time zone. The following rules apply to time zones:

  1. Although specifying time zone information is optional, if it is provided for any node then it must be provided for every node.
  2. A host node can be given only a single time zone, while a parasite node can be given either a single time zone or a range of time zones.
  3. In the host tree, time zones must begin with 1 and must be consecutive, meaning that if there is a time zone 1 and a time zone 3, then there must also be a time zone 2.
Finally, the rule for time zone placement is as follows: A parasite at time zone k (or a range that contains k) can be placed on a host node at time zone k or on a host edge whose endpoint is at time zone k.
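The placement rule is an interval-membership test. The Python sketch below assumes a single time zone is an int and a parasite range is a (start, end) tuple; `as_range` and `can_place` are hypothetical helper names for illustration, not part of Jane.

```python
from typing import Tuple, Union

# Assumed representation: k for a single time zone, (start, end) for a range.
ZoneSpec = Union[int, Tuple[int, int]]


def as_range(zone: ZoneSpec) -> Tuple[int, int]:
    """Normalize a single time zone k to the range (k, k)."""
    if isinstance(zone, int):
        return (zone, zone)
    start, end = zone
    if start > end:
        raise ValueError(f"empty time zone range: {zone}")
    return (start, end)


def can_place(parasite_zone: ZoneSpec, host_zone: int) -> bool:
    """True if the parasite may map to a host node in host_zone,
    or to the host edge whose endpoint is in host_zone."""
    start, end = as_range(parasite_zone)
    return start <= host_zone <= end
```

For example, `can_place((2, 4), 3)` is true, while `can_place(2, 3)` is false.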

Jane also supports "preferential host switching" in two different ways. One way is to use the .tree file format and annotate the input file with HOSTREGIONS and REGIONCOSTS information. This method allows the user to group the host nodes into any number of "regions" (a region may be a geographical region, but "region" here is used metaphorically as simply a group) and then specify different host switch costs between each pair of regions. For example, the user may wish to have a relatively high cost for switching between distantly related species and a lower one for more closely related species. Any number of different host switch costs can be specified this way.

A second, simpler mechanism is to prohibit host switches whose take-off site and landing site are farther apart than some specified distance. The distance from one edge of the host tree to another is defined as the number of nodes on the unique path between those edges. For example, two edges at distance 1 are sibling edges. Limiting host switch distance prevents "long" host switches between very distantly related species. This limit is set in the Jane GUI or command-line interface; see the Jane tutorial for more information on this option.
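Under this definition, the distance between two edges can be computed from their lowest common ancestor. The Python sketch below assumes the host tree is stored as a child-to-parent dictionary and that each edge is named by the node at which it terminates; it only handles pairs of edges where neither lies above the other. `edge_distance` and `switch_allowed` are hypothetical names, not Jane's API.

```python
from typing import Dict, List

# Assumed representation: parent[child] = parent node; the root has no entry.
# Each edge is named by the node at which it terminates (its lower endpoint).


def path_to_root(parent: Dict[int, int], node: int) -> List[int]:
    """Return [node, parent(node), ..., root]."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def edge_distance(parent: Dict[int, int], a: int, b: int) -> int:
    """Number of nodes on the path between the edge ending at a and the edge ending at b."""
    up_a = path_to_root(parent, a)
    up_b = path_to_root(parent, b)
    if a in up_b or b in up_a:
        # Same edge, or one edge lies above the other: not handled in this sketch.
        raise ValueError("edges must not lie on a common root-to-tip path")
    index_in_b = {node: j for j, node in enumerate(up_b)}
    for i, node in enumerate(up_a):
        if node in index_in_b:
            # The path runs from parent(a) up to this common ancestor (i nodes)
            # and back down to parent(b) (j nodes); the ancestor is counted once.
            return i + index_in_b[node] - 1
    raise ValueError("nodes are not in the same tree")


def switch_allowed(parent: Dict[int, int], a: int, b: int, max_distance: int) -> bool:
    """True if a host switch between the two edges is within the distance limit."""
    return edge_distance(parent, a, b) <= max_distance
```

With the host tree `parent = {1: 0, 2: 0, 3: 1, 4: 1, 5: 2, 6: 2}`, the sibling edges ending at 3 and 4 are at distance 1, and the edges ending at 3 and 5 are at distance 3 (the path passes through nodes 1, 0 and 2).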

There are sample files in the different formats included below. When you click on a file, it will render in your browser without whitespace separation. To see the file more clearly, you can either view the page source in your browser or download the contents. The examples provided here are small synthetic ones intended only to illustrate the format. If you are interested in trying Jane with some real biological problem instances, you can find them here.

Format 1: Jane .nex files

A NEXUS file must begin with the line #NEXUS, followed by a series of blocks. A block has the form:

begin blockname;
internal data
endblock;

We expect three blocks:

  1. Host block
  2. Parasite block
  3. Distribution block

The Host and Parasite blocks should each contain a single line, tree host = tree; and tree parasite = tree; respectively, where tree is defined by the following grammar:

T → (T,T)
T → Species Name

The distribution block should contain a line beginning with Range, followed by a list of pairs of parasite:host (note that the colon is necessary). Each pair of parasite and host must be separated by a comma.

Note that line breaks are required after each semicolon for correct parsing. Please see the synthetic examples below.
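The structure described above can be produced programmatically. The following Python sketch writes a Jane-style NEXUS file from trees given as nested pairs of species names. The lowercase block names host, parasite and distribution, the space after Range, and the semicolon ending the Range line are assumptions based on the description above, so check them against the synthetic example files before relying on them. `to_tree_string` and `write_jane_nexus` are hypothetical names.

```python
from typing import List, Tuple, Union

# A tree is either a species name (a tip) or a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def to_tree_string(tree: Tree) -> str:
    """Render a tree following the grammar T -> (T,T) | Species Name."""
    if isinstance(tree, str):
        return tree
    left, right = tree
    return f"({to_tree_string(left)},{to_tree_string(right)})"


def write_jane_nexus(host: Tree, parasite: Tree, tips: List[Tuple[str, str]]) -> str:
    """Build the text of a .nex file; tips holds (parasite, host) tip pairs."""
    range_pairs = ", ".join(f"{p}:{h}" for p, h in tips)
    lines = [
        "#NEXUS",
        "begin host;",
        f"tree host = {to_tree_string(host)};",
        "endblock;",
        "begin parasite;",
        f"tree parasite = {to_tree_string(parasite)};",
        "endblock;",
        "begin distribution;",
        f"Range {range_pairs};",
        "endblock;",
    ]
    # Each line after #NEXUS ends with a semicolon and is followed by a line break.
    return "\n".join(lines) + "\n"
```

For example, a host tree `(("A", "B"), "C")` is written as `tree host = ((A,B),C);`.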

To indicate time zone information for a node in the tree, add [zone] to indicate a single zone, or [zone_start, zone_end] to indicate starting and ending time zones, after the corresponding T. Time zone intervals are only permitted for nodes in the parasite tree. If time zone information is included anywhere, it has to be included everywhere.
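A sketch, in Python, of how the zone annotations attach to each T. The dataclass `ZonedNode` and the function `render` are hypothetical, and the spacing inside a range annotation is assumed to match the [zone_start, zone_end] form shown above.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union

Zone = Union[int, Tuple[int, int]]  # single zone k, or (zone_start, zone_end)


@dataclass
class ZonedNode:
    zone: Zone
    name: Optional[str] = None              # species name, set for tips only
    children: Tuple["ZonedNode", ...] = ()  # two children for internal nodes


def render(node: ZonedNode, is_parasite: bool) -> str:
    """Render a subtree, appending [zone] or [zone_start, zone_end] after each T."""
    if isinstance(node.zone, tuple):
        if not is_parasite:
            raise ValueError("time zone intervals are only permitted in the parasite tree")
        start, end = node.zone
        tag = f"[{start}, {end}]"
    else:
        tag = f"[{node.zone}]"
    if node.children:
        left, right = node.children
        body = f"({render(left, is_parasite)},{render(right, is_parasite)})"
    else:
        if node.name is None:
            raise ValueError("every tip needs a species name")
        body = node.name
    return body + tag
```

For a host tree whose root is in zone 1 and whose tips A and B are in zone 2, `render` returns `(A[2],B[2])[1]`.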

Here are a few synthetic example nexus files:

Format 2: CoRe-PA .nex files

CoRe-PA software allows you to draw species trees and save them in .nex format. Other information, such as costs and options for CoRe-PA, can be stored in the same file as well. Even though Jane 3 does not support the same options and costs as CoRe-PA, the tree editor feature of CoRe-PA can be useful for drawing trees for Jane.

However, since the nexus format used by CoRe-PA is not formally defined, Jane 3 offers only experimental support for it. Jane will attempt to read the trees and tip associations from any .nex file created by the CoRe-PA tree editor, but it may not always succeed. It will ignore all other information in the file, such as options and costs, and will give appropriate warning messages for any problems encountered while reading CoRe-PA format files.

Format 3: Jane .tree files

A tree file must consist of a series of blocks: HOSTTREE, HOSTNAMES, PARASITETREE, PARASITENAMES, PHI, and optionally HOSTRANKS, PARASITERANKS, HOSTREGIONS and REGIONCOSTS, in that order.
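Reading a .tree file starts with splitting it into its blocks. This Python sketch assumes that each block begins with a line containing only the block keyword and that blank lines can be ignored; `split_blocks` is a hypothetical helper that also checks the block order and the presence of the required blocks.

```python
from typing import Dict, List

# Block keywords in the order they must appear; the last four are optional.
BLOCK_ORDER = ["HOSTTREE", "HOSTNAMES", "PARASITETREE", "PARASITENAMES", "PHI",
               "HOSTRANKS", "PARASITERANKS", "HOSTREGIONS", "REGIONCOSTS"]
REQUIRED = set(BLOCK_ORDER[:5])


def split_blocks(text: str) -> Dict[str, List[str]]:
    """Map each block keyword to the non-blank data lines that follow it."""
    blocks: Dict[str, List[str]] = {}
    current = None
    last_index = -1
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line in BLOCK_ORDER:
            index = BLOCK_ORDER.index(line)
            if index <= last_index:
                raise ValueError(f"block {line} is out of order or repeated")
            current, last_index = line, index
            blocks[current] = []
        elif current is None:
            raise ValueError(f"data before the first block keyword: {line!r}")
        else:
            blocks[current].append(line)
    missing = REQUIRED - blocks.keys()
    if missing:
        raise ValueError(f"missing required blocks: {sorted(missing)}")
    return blocks
```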

HOSTTREE and PARASITETREE should consist of a series of entries, one line for each node of each tree, of the form:
node child1 child2
for internal nodes, or
node null null
for tips. Every node must be identified by a number.
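The node lines can be turned into a parent map, which also identifies the root and the tips. A minimal Python sketch, assuming the three fields on each line are separated by whitespace; `parse_tree` is a hypothetical name.

```python
from typing import Dict, List, Optional, Tuple


def parse_tree(lines: List[str]) -> Tuple[int, Dict[int, int], List[int]]:
    """Parse 'node child1 child2' lines into (root, parent map, sorted tips)."""
    children: Dict[int, Optional[Tuple[int, int]]] = {}
    for line in lines:
        node, left, right = line.split()
        if (left == "null") != (right == "null"):
            raise ValueError(f"node {node} must have two children or none")
        children[int(node)] = None if left == "null" else (int(left), int(right))
    # parent[child] = node, for every internal node and each of its children
    parent = {c: n for n, kids in children.items() if kids for c in kids}
    roots = [n for n in children if n not in parent]
    if len(roots) != 1:
        raise ValueError(f"expected exactly one root, found {roots}")
    tips = sorted(n for n, kids in children.items() if kids is None)
    return roots[0], parent, tips
```

For the lines `0 1 2`, `1 null null`, `2 3 4`, `3 null null`, `4 null null`, the root is 0 and the tips are 1, 3 and 4.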

HOSTNAMES and PARASITENAMES should be a series of lines, each listing a host (or parasite) node number, a tab, and then a human-readable name for that host (or parasite).

PHI should be a series of lines listing a host number, a tab, and then a list of parasite tips that infect that particular host. A host may appear at the start of multiple lines. Only the tips of the host and parasite tree should be used in this section (no internal node numbers should appear).
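A sketch of reading PHI in Python that also enforces the rule that only tips appear. It assumes the parasite tips on a line are separated by whitespace or commas, since the exact separator is not specified above; `parse_phi` is a hypothetical name, and the tip sets would come from the HOSTTREE and PARASITETREE blocks.

```python
from typing import Iterable, List, Set, Tuple


def parse_phi(lines: Iterable[str], host_tips: Set[int],
              parasite_tips: Set[int]) -> List[Tuple[int, int]]:
    """Return (host, parasite) tip associations from the PHI block."""
    associations = []
    for line in lines:
        # Treat commas like whitespace; the first field is the host number.
        fields = line.replace(",", " ").split()
        host = int(fields[0])
        if host not in host_tips:
            raise ValueError(f"PHI lists host {host}, which is not a tip")
        for field in fields[1:]:
            parasite = int(field)
            if parasite not in parasite_tips:
                raise ValueError(f"PHI lists parasite {parasite}, which is not a tip")
            associations.append((host, parasite))
    return associations
```

Because a host may appear at the start of several lines, the associations are collected into one flat list rather than a dictionary keyed by host.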

HOSTRANKS and PARASITERANKS should consist of lines containing a node number followed either by a single integer, indicating a single time zone, or by two integers zone_start, zone_end, indicating an interval with a starting and ending time zone. Time zone intervals are only permitted for nodes in the parasite tree. If any time zone information is given, it must be given for every host and parasite node.
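The following Python sketch parses a HOSTRANKS or PARASITERANKS block and checks it against the time zone rules listed near the top of this page. It assumes fields are separated by whitespace (a comma between zone_start and zone_end is also accepted); `parse_ranks` and `check_zones` are hypothetical names. HOSTRANKS would be read with `allow_ranges=False` and PARASITERANKS with `allow_ranges=True`.

```python
from typing import Dict, Iterable, List, Tuple


def parse_ranks(lines: Iterable[str], allow_ranges: bool) -> Dict[int, Tuple[int, int]]:
    """Map node -> (zone_start, zone_end); a single zone k becomes (k, k)."""
    ranks: Dict[int, Tuple[int, int]] = {}
    for line in lines:
        fields = [int(f) for f in line.replace(",", " ").split()]
        if len(fields) == 2:
            node, zone = fields
            ranks[node] = (zone, zone)
        elif len(fields) == 3 and allow_ranges:
            node, start, end = fields
            if start > end:
                raise ValueError(f"empty time zone interval for node {node}")
            ranks[node] = (start, end)
        else:
            raise ValueError(f"invalid rank line: {line!r}")
    return ranks


def check_zones(ranks: Dict[int, Tuple[int, int]], nodes: List[int], is_host: bool) -> None:
    """Raise ValueError if the time zones break the rules for this tree."""
    missing = set(nodes) - ranks.keys()
    if missing:
        raise ValueError(f"no time zone given for nodes {sorted(missing)}")
    if is_host:
        used = {start for start, _ in ranks.values()}
        # Host zones must start at 1 and be consecutive.
        if used != set(range(1, max(used) + 1)):
            raise ValueError(f"host time zones must be 1..n with no gaps: {sorted(used)}")
```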

HOSTREGIONS should be formatted like HOSTRANKS or PARASITERANKS, but with region numbers instead of time zone numbers, and only one region is allowed for any given node. When a host switch occurs on a tree with regions, its cost is the original host switch cost plus the region cost specified in REGIONCOSTS (see below). Note that adding region information causes Jane to use an algorithm that can be several times slower.

REGIONCOSTS should be a list of triples giving the region from which a switch occurs, the region to which the switch occurs, and the additional cost of such a switch, in that order. This list may be incomplete; missing entries are assumed to be zero. A triple (region_1, region_2, cost) defines how much to add to a host switch from edge 1 to edge 2, where edge 1 terminates at a host node in region_1 and edge 2 terminates at a host node in region_2.
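The region cost rule can be sketched in Python as follows. This assumes each edge takes the region of the host node at which it terminates, that a region cost applies only in the stated direction (from, to), and that fields are whitespace-separated; `parse_regions`, `parse_region_costs` and `host_switch_cost` are hypothetical names, not Jane's internals.

```python
from typing import Dict, Iterable, Tuple


def parse_regions(lines: Iterable[str]) -> Dict[int, int]:
    """HOSTREGIONS lines: 'node region', exactly one region per node."""
    regions: Dict[int, int] = {}
    for line in lines:
        node, region = (int(f) for f in line.split())
        if node in regions:
            raise ValueError(f"node {node} has more than one region")
        regions[node] = region
    return regions


def parse_region_costs(lines: Iterable[str]) -> Dict[Tuple[int, int], float]:
    """REGIONCOSTS lines: 'from_region to_region cost'."""
    costs: Dict[Tuple[int, int], float] = {}
    for line in lines:
        src, dst, cost = line.split()
        costs[(int(src), int(dst))] = float(cost)
    return costs


def host_switch_cost(base_cost: float, regions: Dict[int, int],
                     region_costs: Dict[Tuple[int, int], float],
                     from_node: int, to_node: int) -> float:
    """Cost of switching from the edge ending at from_node to the edge ending at to_node."""
    key = (regions[from_node], regions[to_node])
    # Missing REGIONCOSTS entries add nothing.
    return base_cost + region_costs.get(key, 0.0)
```

With a base host switch cost of 1, a REGIONCOSTS line `1 2 3` makes a switch from region 1 into region 2 cost 4, while a switch within region 1 keeps the base cost of 1 because no (1, 1) entry is given.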

Here are some synthetic example tree files: