Instructions for tree reconstruction for Selaginella moellendorffii genes

T. Nishiyama, Kanazawa University (email: tomoakin@kenroku.kanazawa-u.ac.jp)

Prerequisites

You need the amino acid sequence of interest (it may or may not be a Selaginella gene).
If your gene of interest is not properly represented in Filtered Model 2, you also need the amino acid sequence of your own gene model.

Overview

Get related sequences from http://moss.nibb.ac.jp/cgi-bin/blast-nr-Selmo
Consult the blast output and alignment
Select more or fewer sequences if necessary with http://moss.nibb.ac.jp/cgi-bin/selectNalign
Remove excess genes and mark regions to use for the phylogenetic analysis
Reconstruct a Neighbor-Joining tree at http://moss.nibb.ac.jp/cgi-bin/makenjtreeSelmo

Get related sequences

http://moss.nibb.ac.jp/cgi-bin/blast-nr-Selmo

This process runs a blast search against a combined dataset of nr and Selmo Filtered Model 2. Chlamydomonas, Physcomitrella, Arabidopsis, and rice sequences should be represented in the nr dataset.

The process collects up to 1000 of the hit sequences, which are returned as a fasta file named with the supplied name plus the ".nrSmoFM2" suffix.

Up to 100 sequences are selected in the order of the blast results, then aligned and converted to NEXUS format, which can be processed with MacClade.

Consult the blast output and alignment

In some cases, more than 100 sequences are needed to cover the whole gene family.

If you see a sudden drop in similarity in the blast output, that point marks the end of the gene family.

In many cases, however, the similarity declines so smoothly that there is no distinct cutoff point; in that case, try up to several hundred genes.
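The "sudden drop" check can be sketched in a few lines of Python. This is only an illustration: it assumes you have copied the scores of the hits, in blast order, into a list by hand, and the numbers and the function name largest_drop are made up for the example. It is not part of the server.

```python
def largest_drop(scores):
    """Return (index, drop) for the largest fall between consecutive scores.

    `index` is the 0-based position of the last hit before the drop, so
    hits 0..index would be kept as the candidate gene family.
    """
    if len(scores) < 2:
        raise ValueError("need at least two scores")
    drops = [scores[i] - scores[i + 1] for i in range(len(scores) - 1)]
    index = max(range(len(drops)), key=drops.__getitem__)
    return index, drops[index]


# Hypothetical bit scores in blast order (best hit first).
example_scores = [520.0, 498.0, 470.0, 455.0, 180.0, 172.0, 160.0]
cut, drop = largest_drop(example_scores)
print(f"largest drop ({drop}) comes after hit {cut + 1}")
```

If the largest drop is not much bigger than the other drops, the similarity is declining smoothly and no clear family boundary exists; that is the case where a larger number of genes should be tried.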

Select more or fewer sequences if necessary

http://moss.nibb.ac.jp/cgi-bin/selectNalign

Add any sequences that are not in Filtered Model 2 to the fasta file that you obtained at "Get related sequences".
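Adding sequences can be done in any text editor, but a small script avoids duplicated entries. The following is a minimal sketch assuming plain fasta text; the record names (hit1, hit2, my_gene_model) and the helper functions are hypothetical and only for illustration.

```python
def parse_fasta(text):
    """Parse fasta text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def add_sequences(collection, extra):
    """Return collection plus those extra records whose header is new."""
    seen = {header for header, _ in collection}
    merged = list(collection)
    for header, seq in extra:
        if header not in seen:
            merged.append((header, seq))
            seen.add(header)
    return merged


def format_fasta(records, width=60):
    """Write records back as fasta text, wrapping sequences at `width`."""
    lines = []
    for header, seq in records:
        lines.append(">" + header)
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


# Made-up records standing in for the downloaded file and a custom gene model.
downloaded = parse_fasta(">hit1\nMKTAYIAK\nQRQISFVK\n>hit2\nMKSAYLAK\n")
custom = parse_fasta(">my_gene_model\nMKTAYVAKQR\n>hit2\nMKSAYLAK\n")
merged = add_sequences(downloaded, custom)
print(format_fasta(merged))
```

Records are compared by header only, so a sequence that is already present under a different name would still be added.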

Give an appropriate number N for the number of genes you are willing to analyze.

Supply the sequence of interest as query.

Supply the fasta file as the sequence collection.

The process will return an alignment of up to N sequences.

Remove excess genes and mark regions to use for the phylogenetic analysis

It is essential to use only homologous sites for the tree reconstruction. Remove genes with excessive gaps unless the gene is of particular interest.

Unmark regions that are not homologous across all of the sequences or that have gaps in some sequences.
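The gap criterion can be checked mechanically; the homology judgment still has to be made by eye. The sketch below assumes the alignment is held as a name-to-sequence mapping with "-" (or "?") as the gap character. The gene names, sequences, and function names are invented for illustration.

```python
def gap_free_columns(alignment, gap_chars="-?"):
    """Return 0-based indices of columns in which no sequence has a gap."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("aligned sequences must all have the same length")
    seqs = list(alignment.values())
    return [
        col for col in range(lengths.pop())
        if all(seq[col] not in gap_chars for seq in seqs)
    ]


def to_ranges(columns):
    """Collapse sorted column indices into 1-based inclusive (start, end) ranges."""
    ranges = []
    for col in columns:
        pos = col + 1  # alignment editors usually count from 1
        if ranges and pos == ranges[-1][1] + 1:
            ranges[-1][1] = pos
        else:
            ranges.append([pos, pos])
    return [tuple(r) for r in ranges]


# Toy alignment: geneA has a gap at column 3, geneB at columns 7 and 8.
toy_alignment = {
    "geneA": "MK-TAYIAKQ",
    "geneB": "MKSTAY--KQ",
    "geneC": "MKSTAYIAKQ",
}
keep = to_ranges(gap_free_columns(toy_alignment))
print(keep)
```

The resulting ranges are candidates for the regions to leave marked; any of them that are not convincingly homologous should still be unmarked manually.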

If you find negative branches or excessively long branches in the resulting tree, it is quite likely that the alignment includes non-homologous sequences.
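Branch lengths can also be screened directly from a Newick tree string. The sketch below is a rough aid, not a full Newick parser: it assumes simple unquoted labels, and the cutoff for "excessively long" (five times the median positive branch length) is an arbitrary assumption chosen for the example, as is the tree itself.

```python
import re
import statistics

# A branch length is the number that follows a colon in Newick text.
BRANCH_LENGTH = re.compile(r":\s*(-?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?)")


def branch_lengths(newick):
    """Extract all branch lengths from a Newick string as floats."""
    return [float(value) for value in BRANCH_LENGTH.findall(newick)]


def suspicious_branches(newick, long_factor=5.0):
    """Return (negative lengths, lengths above long_factor x median)."""
    lengths = branch_lengths(newick)
    negatives = [x for x in lengths if x < 0]
    positives = [x for x in lengths if x > 0]
    if not positives:
        return negatives, []
    cutoff = long_factor * statistics.median(positives)
    return negatives, [x for x in lengths if x > cutoff]


# Invented tree with one negative and one very long branch.
toy_tree = "((A:0.10,B:0.12):0.05,(C:0.11,D:-0.02):0.08,E:1.50);"
negative, too_long = suspicious_branches(toy_tree)
print("negative:", negative)
print("long:", too_long)
```

Any branch reported here is a reason to go back to the alignment and recheck the marked regions and the included genes.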

Construct the Neighbor-Joining tree

http://moss.nibb.ac.jp/cgi-bin/makenjtreeSelmo

A Neighbor-Joining tree can be constructed at http://moss.nibb.ac.jp/cgi-bin/makenjtreeSelmo

The process automatically performs Neighbor-Joining tree reconstruction and bootstrapping.
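For readers unfamiliar with bootstrapping, the general idea is to resample alignment columns with replacement and rebuild the tree from each resampled alignment. The sketch below shows only that resampling step; it is not the server's implementation, whose details are not described here, and the sequences and function name are invented.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Return one bootstrap replicate: columns resampled with replacement."""
    names = list(alignment)
    if not names:
        raise ValueError("alignment is empty")
    n_cols = len(alignment[names[0]])
    if any(len(alignment[name]) != n_cols for name in names):
        raise ValueError("aligned sequences must all have the same length")
    # The same column picks are applied to every sequence, so each
    # resampled column is an intact column of the original alignment.
    picks = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {
        name: "".join(alignment[name][col] for col in picks)
        for name in names
    }


# Toy alignment; a fixed seed makes the example repeatable.
small_alignment = {
    "geneA": "MKTAYIAK",
    "geneB": "MKSAYLAK",
    "geneC": "MRTAYIGK",
}
replicate = bootstrap_alignment(small_alignment, random.Random(0))
for name, seq in replicate.items():
    print(name, seq)
```

The bootstrap value of a branch is then the fraction of replicate trees in which the same grouping is recovered.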

The trees are provided in Newick format (text) and in SVG format (vector graphics that can be processed with Adobe Illustrator CS, CS2, or CS3).

Usually, the file to look at is the one whose name ends with treeagi.svg.