MrBayes Blocks


Protein Alignment - Getting Started on ProtTest and Converting/Editing Files for MrBayes

  1. Determine the correct model of sequence evolution (MoSE) for the alignment with ProtTest. This program is on the Intel Mac and requires you to convert your .txt alignment file to a Phylip 3.6 format file via MacClade. Open your .txt file as FASTA (protein), then choose File → Export File As → Phylip 3.6. Open ProtTest (in the bottom shortcut bar) and choose your new filename.phylip as the input. Leave all the other settings at their defaults and hit "start". There are 96 possible models, built from the substitution matrices plus these optional parameters: gamma-distributed rate variation among sites (G); a proportion of invariable sites (I); both together, i.e. invariant sites for some residues and gamma for the rest (I+G, which is invgamma in MrBayes); and amino acid frequencies estimated empirically from your alignment rather than taken from another modeled dataset or statistical distribution (F). Look over the results to determine the best MoSE for your data, and be sure to save them in a text file along with the best NJ tree from the output. The manual helps explain how to do this.
  2. Open your original .txt file in MacClade again, go under File → Nexus output options, and be sure that the only boxes checked are the ones saying that the names are in the file and that they can be abbreviated, then hit OK. Check all OTU names to be sure they are OK (no spaces, no dashes, fewer than 8 characters, etc.) and edit them by double clicking on the name. Also, change all gaps (-) to missing (?) characters by going under Utilities → Search and Replace, finding - and replacing it with ?, then choosing Replace All. Now go under File → Save, name the file after your original file so that it is filename.nxs, and hit Save.
  3. Open this new file in TextEdit (also on the shortcut bar at the bottom of the screen). Delete the statement just after "Datatype=Protein" at the beginning of the file that says "Symbols=(whatever)". Then put square brackets around the last part of the file, after the "END;" that closes the data matrix: this is just extra junk that MacClade adds to the end of the file. Put one bracket right before the extra junk and one at the very end. Now you are ready to specify your MrBayes block; be sure your file looks like the generalized MrBayes Amino Acid nexus file (see below). Finally, save the changes and load the file into MrBayes. On the Intel Mac, double click on the Parallels program icon and, once you see the virtual Windows desktop, drag the file into the MrBayes folder on the desktop. Once MrBayes is open, type execute filename.nxs. If the run is going to take too long (the estimated time in hours is shown all the way to the right of the window), you might need to terminate the run, edit the file settings (number of runs, generations, etc.) to make it shorter, then resave and re-run in MrBayes.
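
The name checks and the gap replacement in step 2 can also be scripted before the file ever reaches MacClade. The following is only a rough sketch in Python, not part of the lab workflow: the function names are made up for illustration, the input is assumed to be plain FASTA text, and "fewer than 8 characters" is read as a maximum of 7.

```python
import re

MAX_NAME_LEN = 7  # assumption: "<8 characters" is read as "at most 7"


def check_otu_name(name):
    """Return a list of problems found in one OTU name (empty list = OK)."""
    problems = []
    if len(name) > MAX_NAME_LEN:
        problems.append(f"longer than {MAX_NAME_LEN} characters")
    if re.search(r"\s", name):
        problems.append("contains whitespace")
    if "-" in name:
        problems.append("contains a dash")
    return problems


def prepare_fasta(text):
    """Parse FASTA text, turn gaps (-) into missing (?), and check names.

    Returns (records, warnings): records is a list of (name, sequence)
    tuples and warnings maps each problematic name to its problems.
    """
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks).replace("-", "?")))
            # Keep the whole header so that spaces in names are detected.
            name, chunks = line[1:].strip(), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks).replace("-", "?")))

    warnings = {}
    for otu, _ in records:
        problems = check_otu_name(otu)
        if problems:
            warnings[otu] = problems
    return records, warnings


if __name__ == "__main__":
    demo = ">Monov\nST-TT\n>Long name-x\nS--TT\n"
    for otu, problems in prepare_fasta(demo)[1].items():
        print(otu, "->", ", ".join(problems))
```

Names flagged here still have to be fixed by hand (or in MacClade), since only you know which abbreviation makes sense for each taxon.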


Generalized MrBayes Amino Acid Nexus File

  Begin data;
  Dimensions NTAX=# NCHAR=#;
  Format Datatype=Protein Missing=? Gap=-;
  Matrix
  [numbers]
  [...]
  MONOV STTTTT...
  [etc. for all included taxa; the first OTU in the alignment is assumed by MrBayes to be the outgroup]
  ;
  END;

  Begin mrbayes;
  Prset aamodelpr=fixed(model name) statefreqpr=fixed(empirical);
  Lset rates=enterratetypehere;
  Mcmc nruns=2 nchains=4 Temp=0.200000 swapfreq=1 nswaps=1 ngen=2000000 samplefreq=100 savebrlens=yes printfreq=50
    Mcmcdiagn=yes diagnfreq=1000 relburnin=yes burninfrac=0.25 filename=451ap;
  sumt contype=halfcompat;
  End;

  [block out end notations from MacClade]

Options for settings to specify correct Amino Acid MoSE

Any questions on MrBayes settings? Search the website (http://mrbayes.csit.fsu.edu/Help/help.html).

Prset Explanations and Options:

Aamodelpr -- This parameter sets the rate matrix for amino acid data. You can either fix the model by specifying aamodelpr=fixed(<model name>), where <model name> is 'poisson' (a glorified Jukes-Cantor model), 'jones', 'dayhoff', 'mtrev', 'mtmam', 'wag', 'rtrev', 'cprev', 'vt', 'blosum', 'equalin' (a glorified Felsenstein 1981 model), or 'gtr'. You can also average over the first ten models by specifying aamodelpr=mixed. If you do so, the Markov chain will sample each model according to its probability. The sampled model is reported as an index: poisson(0), jones(1), dayhoff(2), mtrev(3), mtmam(4), wag(5), rtrev(6), cprev(7), vt(8), or blosum(9). The 'Sump' command summarizes the MCMC samples and calculates the posterior probability estimate for each of these models.

Statefreqpr -- This parameter specifies the prior on the state frequencies. The options are:

  prset statefreqpr = dirichlet(<number>)
  prset statefreqpr = dirichlet(<number>,...,<number>)
  prset statefreqpr = fixed(equal)
  prset statefreqpr = fixed(empirical)     (THIS LINE IS HOW YOU SPECIFY THE "F" PARAMETER)
  prset statefreqpr = fixed(<number>,...,<number>)

For the dirichlet, you can specify either a single number or as many numbers as there are states. If you specify a single number, then the prior has all states equally probable with a variance related to the single parameter passed in.

  Parameter        Options                            Current Setting
  Tratiopr         Beta/Fixed                         Beta(1.0,1.0)
  Revmatpr         Dirichlet/Fixed                    Dirichlet(1.0,1.0,1.0,1.0,1.0,1.0)
  Aamodelpr        Fixed/Mixed                        Fixed(Poisson)
  Aarevmatpr       Dirichlet/Fixed                    Dirichlet(1.0,1.0,...)
  Omegapr          Dirichlet/Fixed                    Dirichlet(1.0,1.0)
  Ny98omega1pr     Beta/Fixed                         Beta(1.0,1.0)
  Ny98omega3pr     Uniform/Exponential/Fixed          Exponential(1.0)
  M3omegapr        Exponential/Fixed                  Exponential
  Codoncatfreqs    Dirichlet/Fixed                    Dirichlet(1.0,1.0,1.0)
  Statefreqpr      Dirichlet/Fixed                    Dirichlet
  Ratepr           Fixed/Variable=Dirichlet           Fixed
  Shapepr          Uniform/Exponential/Fixed          Uniform(0.0,50.0)
  Ratecorrpr       Uniform/Fixed                      Uniform(-1.0,1.0)
  Pinvarpr         Uniform/Fixed                      Uniform(0.0,1.0)
  Covswitchpr      Uniform/Exponential/Fixed          Uniform(0.0,100.0)
  Symdirihyperpr   Uniform/Exponential/Fixed          Fixed(Infinity)
  Topologypr       Uniform/Constraints                Uniform
  Brlenspr         Unconstrained/Clock                Unconstrained:Exp(10.0)
  Speciationpr     Uniform/Exponential/Fixed          Uniform(0.0,10.0)
  Extinctionpr     Uniform/Exponential/Fixed          Uniform(0.0,10.0)
  Sampleprob       <number>                           1.00
  Thetapr          Uniform/Exponential/Fixed          Uniform(0.0,10.0)
  Growthpr         Uniform/Exponential/Fixed/Normal   Fixed(0.0)

Lset options:

  Parameter        Options                                              Current Setting
  Nucmodel         4by4/Doublet/Codon                                   4by4
  Nst              1/2/6                                                1
  Code             Universal/Vertmt/Mycoplasma/Yeast/Ciliates/Metmt     Universal
  Rates            Equal/Gamma/Propinv/Invgamma/Adgamma                 Equal
  Ngammacat        <number>                                             4
  Nbetacat         <number>                                             5
  Omegavar         Equal/Ny98/M3                                        Equal
  Covarion         No/Yes                                               No
  Coding           All/Variable/Noabsencesites/Nopresencesites          All
  Parsmodel        No/Yes                                               No
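
To connect the ProtTest result to the prset and lset settings listed above, here is a small sketch in Python that turns a best-model label such as "WAG+I+G+F" into the two MrBayes lines. The function name is made up, the "+I/+G/+F" label format is assumed, and the table matching ProtTest matrix names to MrBayes model names (for example JTT to 'jones', Blosum62 to 'blosum') is our assumption, so check the output against the options above before using it.

```python
# Assumed correspondence between ProtTest matrix names and the MrBayes
# aamodelpr model names listed in the prset explanation.
AA_MODELS = {
    "JTT": "jones",
    "DAYHOFF": "dayhoff",
    "MTREV": "mtrev",
    "MTMAM": "mtmam",
    "WAG": "wag",
    "RTREV": "rtrev",
    "CPREV": "cprev",
    "VT": "vt",
    "BLOSUM62": "blosum",
}


def prottest_to_mrbayes(model):
    """Translate a label like 'WAG+I+G+F' into prset and lset lines."""
    parts = model.upper().split("+")
    base, flags = parts[0], set(parts[1:])
    if base not in AA_MODELS:
        raise ValueError(f"no MrBayes equivalent known for {parts[0]}")
    unknown = flags - {"I", "G", "F"}
    if unknown:
        raise ValueError(f"unrecognized parameter(s): {sorted(unknown)}")

    # Rate variation: +I+G -> invgamma, +G -> gamma, +I -> propinv.
    if {"I", "G"} <= flags:
        rates = "invgamma"
    elif "G" in flags:
        rates = "gamma"
    elif "I" in flags:
        rates = "propinv"
    else:
        rates = "equal"

    prset = f"prset aamodelpr=fixed({AA_MODELS[base]})"
    if "F" in flags:
        # The "F" parameter, as described in the Statefreqpr options.
        prset += " statefreqpr=fixed(empirical)"
    return [prset + ";", f"lset rates={rates};"]


if __name__ == "__main__":
    for line in prottest_to_mrbayes("WAG+I+G+F"):
        print(line)
```

The two lines it produces drop straight into the "Prset ..." and "Lset ..." slots of the generalized amino acid file.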


Mcmc options:

  Parameter        Options               Current Setting
  Seed             <number>              1115997956
  Swapseed         <number>              1115997956
  Ngen             <number>              1000000
  Nruns            <number>              2
  Nchains          <number>              4
  Temp             <number>              0.200000
  Reweight         <number>,<number>     0.00 v 0.00 ^
  Swapfreq         <number>              1
  Nswaps           <number>              1
  Samplefreq       <number>              100
  Printfreq        <number>              100
  Printall         Yes/No                Yes
  Printmax         <number>              8
  Mcmcdiagn        Yes/No                Yes
  Diagnfreq        <number>              1000
  Minpartfreq      <number>              0.10
  Allchains        Yes/No                No
  Allcomps         Yes/No                No
  Relburnin        Yes/No                Yes
  Burnin           <number>              0
  Burninfrac       <number>              0.25
  Stoprule         Yes/No                No
  Stopval          <number>              0.01
  Filename         <name>                temp.out.
  Startingtree     Random/User           Random
  Nperts           <number>              0
  Savebrlens       Yes/No                Yes
  Ordertaxa        Yes/No                No




DNA/RNA Alignment - Getting Started on Modeltest and Converting/Editing Files for MrBayes

  1. Determine the correct model of sequence evolution (MoSE) for the alignment with Modeltest. This program is on the old G3 Mac and requires you to convert your .txt alignment file to a nexus (.nxs or .nex) file via MacClade. Open your .txt file as FASTA (DNA/RNA), then go under File → Nexus output options and be sure that the only boxes checked are the ones saying that the names are in the file and that they can be abbreviated, then hit OK. Check all OTU names to be sure they are OK (no spaces, no dashes, fewer than 8 characters, etc.) and edit them by double clicking on the name. Also, change all gaps (-) to missing (?) characters by going under Utilities → Search and Replace, finding - and replacing it with ?, then choosing Replace All. Now go under File → Save, name the file after your original file so that it is filename.nxs, and hit Save. Drag this .nxs file into PAUP (on the bottom shortcut bar) and go under File → Execute. Now drag the modelblock3 file from the desktop into PAUP and choose File → Execute. PAUP should automatically start scoring the different models using your alignment file as the basis. There are 56 possible models, including the parameters gamma (G), a proportion of invariable sites (I), and both together, i.e. invariant sites for some residues and gamma for the rest (I+G, which is invgamma in MrBayes). The PAUP window shows which of the 56 models it is on and reports on the last line when it is complete (~1-3 hrs total).
  2. Double click on the Modeltest icon (black and red maze-box) on the shortcut bar at the bottom of the screen. Leave the "Arguments" line blank, but at the bottom change the option to "input from file" and it will ask you to select a file. Use the new model.scores file in the folder where your nexus file (the one you originally dragged into PAUP with modelblock3) is located. Also, on the Modeltest GUI, select "output to file" and then hit the "RUN" button. A new file will appear in the same folder as your nexus file. Double click on it to view the results of Modeltest on your dataset. Look over the results to determine the best MoSE for your data, and be sure to save them in a text file along with the best NJ tree from the output. AIC will always be more conservative than the hLRT MoSE chosen if they are different. Generally, though, the more complex MoSE fits the data better in MrBayes. Also, "F" (estimating the state frequencies empirically from your input dataset) is a given and should always be set in the prset options of the MrBayes block.
  3. Open your nexus file in TextEdit (also on the shortcut bar at the bottom of the screen). Delete the statement at the beginning of the file that says "Options=(whatever)". Then put square brackets around the last part of the file, after the "END;" that closes the data matrix: this is just extra junk that MacClade adds to the end of the file. Put one bracket right before the extra junk and one at the very end. Now you are ready to specify your MrBayes block; be sure your file looks like the generalized MrBayes Nucleotide nexus file (see below). Finally, save the changes and load the file into MrBayes. On the Intel Mac, double click on the Parallels program icon and, once you see the virtual Windows desktop, drag the file into the MrBayes folder on the desktop. Once MrBayes is open, type execute filename.nxs. If the run is going to take too long (the estimated time in hours is shown all the way to the right of the window), you might need to terminate the run, edit the file settings (number of runs, generations, etc.) to make it shorter, then resave and re-run in MrBayes.



Generalized MrBayes Nucleotide Nexus File

  Begin data;
  Dimensions NTAX=# NCHAR=#;
  Format Datatype=DNA Missing=? Gap=-;
  Matrix
  [numbers]
  [...]
  MONOV CTGATTG...
  [etc. for all included taxa; the first OTU in the alignment is assumed by MrBayes to be the outgroup]
  ;
  END;

  Begin mrbayes;
  Prset statefreqpr=fixed(empirical);
  Lset nst=# rates=enterratetypehere;
  Mcmc nruns=2 nchains=4 Temp=0.200000 swapfreq=1 nswaps=1 ngen=2000000 samplefreq=100 savebrlens=yes printfreq=50
    Mcmcdiagn=yes diagnfreq=1000 relburnin=yes burninfrac=0.50 filename=4518s;
  sumt contype=halfcompat;
  End;

  [block out end notations from MacClade]



Generalized MrBayes Protein-Coding Nucleotide Nexus File that Excludes 3rd Codon Position

(Note: you could also set this up with charsets and partitions, but it is longer.)

  Begin data;
  Dimensions NTAX=# NCHAR=#;
  Format Datatype=mixed(dna:1-#,protein:#+1-lastresidue#) Missing=? Gap=-;
  Matrix
  [numbers]
  [...]
  MONOV CTGATTG...
  [etc. for all included taxa; the first OTU in the alignment is assumed by MrBayes to be the outgroup]
  ;
  END;

  Begin mrbayes;
  exclude 3-last#\3; [every third site starting at site 3; adjust the start if the reading frame does not begin at site 1]
  Prset statefreqpr=fixed(empirical);
  lset nst=# rates=enterratetypehere;
  Mcmc nruns=2 nchains=4 Temp=0.200000 swapfreq=1 nswaps=1 ngen=2000000 samplefreq=100 savebrlens=yes printfreq=50
    Mcmcdiagn=yes diagnfreq=1000 relburnin=yes burninfrac=0.50 filename=1anNo3rd;
  sumt contype=halfcompat;
  End;

  [block out end notations from MacClade]



Options for settings to specify correct Nucleotide MoSE

The settings are pretty much the same as those described above for specifying the correct Amino Acid MoSE.

Lset Explanations and Options: Nst -- Sets the number of substitution types: "1" constrains all of the rates to be the same (e.g., a JC69 or F81 model); "2" allows transitions and transversions to have potentially different rates (e.g., a K80 or HKY85 model); "6" allows all rates to be different, subject to the constraint of time-reversibility (e.g., a GTR model).
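
As a rough illustration of how a Modeltest result translates into nst and rates, the Python sketch below handles the models named in the Nst explanation (JC69, F81, K80, HKY85, GTR). The function name and the "+I/+G" label format are assumptions. Other Modeltest models have no exact nst value; falling back to nst=6 for them is our choice here, not something MrBayes or Modeltest prescribes, and the function flags it so you can decide for yourself.

```python
# nst values for the models named in the Nst explanation above.
NST_BY_MODEL = {
    "JC": 1, "JC69": 1, "F81": 1,
    "K80": 2, "HKY": 2, "HKY85": 2,
    "GTR": 6,
}


def modeltest_to_lset(model):
    """Translate a label like 'GTR+I+G' into an lset line.

    Returns (lset_line, exact). 'exact' is False when the base model is not
    in NST_BY_MODEL and the nst=6 fallback (an assumption) was used.
    """
    parts = model.upper().split("+")
    base, flags = parts[0], set(parts[1:])
    unknown = flags - {"I", "G"}
    if unknown:
        raise ValueError(f"unrecognized parameter(s): {sorted(unknown)}")

    exact = base in NST_BY_MODEL
    nst = NST_BY_MODEL.get(base, 6)

    if {"I", "G"} <= flags:
        rates = "invgamma"
    elif "G" in flags:
        rates = "gamma"
    elif "I" in flags:
        rates = "propinv"
    else:
        rates = "equal"
    return f"lset nst={nst} rates={rates};", exact


if __name__ == "__main__":
    for label in ("HKY+G", "GTR+I+G", "TrN+I"):
        line, exact = modeltest_to_lset(label)
        print(label, "->", line, "" if exact else "(nst=6 fallback)")
```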



Concatenated Alignments - Setting Character Sets and Partitions of Your Alignments

Charset This command defines a character set. The format for the charset command is

  charset <name> = <character numbers>

For example, "charset first_pos = 1-720\3" defines a character set called "first_pos" that includes every third site from 1 to 720. The character set name cannot have any spaces in it. The backslash (\) is a nifty way of telling the program to assign every third (or second, or fifth, or whatever) character to the character set. This option is best used not from the command line but rather as a line in the mrbayes block of a file. Note that you can use "." to stand in for the last character (e.g., charset 1-.\3).

  charset first = 1-.\3;
  charset second = 2-.\3;
  charset third = 3-.\3;
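
If you want to double check which sites a range with a backslash step actually picks up, the following Python sketch expands a single range the way the charset description above reads. It is only an illustration (the function name is made up) and it handles just one "start-end" or "start-end\step" range, not full MrBayes syntax.

```python
def expand_range(spec, nchar):
    r"""Expand 'start-end' or 'start-end\step' into a list of 1-based sites.

    A '.' as the end stands for the last character, given here as nchar.
    """
    spec = spec.strip()
    step = 1
    if "\\" in spec:
        spec, step_text = spec.split("\\")
        step = int(step_text)
    start_text, end_text = spec.split("-")
    start = int(start_text)
    end = nchar if end_text.strip() == "." else int(end_text)
    return list(range(start, end + 1, step))


if __name__ == "__main__":
    # Codon positions in a 12-site alignment.
    for name, spec in (("first", r"1-.\3"), ("second", r"2-.\3"), ("third", r"3-.\3")):
        print(name, expand_range(spec, 12))
```

The same check works for the exclude command, which uses the same range and step notation.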


Partition This command allows you to specify a character partition. The format for this command is

  partition <name> = <num parts>:<chars in first>, ...,<chars in last>

For example, "partition by_codon = 3:1st_pos,2nd_pos,3rd_pos" specifies a partition called "by_codon" which consists of three parts (first, second, and third codon positions). Here, we are assuming that the sites in each partition were defined using the charset command. You can specify a partition without using charset as follows:

  partition by_codon = 3:1 4 6 9 12,2 5 7 10 13,3 6 8 11 14

However, we recommend that you use the charsets to define a set of characters and then use these predefined sets when defining the partition. Also, it makes more sense to define a partition as a line in the mrbayes block than to issue the command from the command line (then again, you may be a masochist, and want to do extra work).

Exclude This command excludes characters from the analysis. The correct usage is

  exclude <number> <number> <number>

or

  exclude <number> - <number>

or

  exclude <charset>

or some combination thereof. Moreover, you can use the specifier "\" to exclude every nth character. For example, the following

  exclude 1-100\3

would exclude every third character. As a specific example,

  exclude 2 3 10-14 22

excludes sites 2, 3, 10, 11, 12, 13, 14, and 22 from the analysis. Also,

  exclude all

excludes all of the characters from the analysis. Excluding all characters does not leave you much information for inferring phylogeny.


Format This command is used in a data block to define the format of the character matrix. The correct usage is

  format datatype=<name> ... <parameter>=<option>

The format command must be the second command in a data block. The following provides an example of the proper use of this command:

  begin data;
  dimensions ntax=4 nchar=10;
  format datatype=dna gap=-;
  matrix
  taxon_1 AACGATTCGT
  taxon_2 AAGGAT--CA
  taxon_3 AACGACTCCT
  taxon_4 AAGGATTCCT
  ;
  end;

Here, the format command tells MrBayes to expect a matrix with DNA characters and with gaps coded as "-". The following are valid options for format:

Datatype -- This parameter MUST BE INCLUDED in the format command. Moreover, it must be the first parameter in the line. The datatype command specifies what type of characters are in the matrix. The following are valid options:

  Datatype = Dna: DNA states (A,C,G,T,R,Y,M,K,S,W,H,B,V,D,N)
  Datatype = Rna: RNA states (A,C,G,U,R,Y,M,K,S,W,H,B,V,D,N)
  Datatype = Protein: Amino acid states (A,R,N,D,C,Q,E,G,H,I,L,K,M,F,P,S,T,W,Y,V)
  Datatype = Restriction: Restriction site (0,1) states
  Datatype = Standard: Morphological (0,1) states
  Datatype = Continuous: Real number valued states
  Datatype = Mixed(<type>:<range>,...,<type>:<range>): A mixture of the above datatypes

For example, "datatype=mixed(dna:1-100,protein:101-200)" would specify a mixture of DNA and amino acid characters with the DNA characters occupying the first 100 sites and the amino acid characters occupying the last 100 sites.

Interleave -- This parameter specifies whether the data matrix is in interleave format. The valid options are "Yes" or "No", with "No" as the default. An interleaved matrix looks like

  format datatype=dna gap=- interleave=yes;
  matrix
  taxon_1 AACGATTCGT
  taxon_2 AAGGAT--CA
  taxon_3 AACGACTCCT
  taxon_4 AAGGATTCCT

  taxon_1 CCTGGTAC
  taxon_2 CCTGGTAC
  taxon_3 ---GGTAG
  taxon_4 ---GGTAG
  ;

Gap -- This parameter specifies the format for gaps. Note that the gap character can only be a single character and that it cannot correspond to a standard state (e.g., A,C,G,T,R,Y,M,K,S,W,H,B,V,D,N for nucleotide data).

Missing -- This parameter specifies the format for missing data. Note that the missing character can only be a single character and cannot correspond to a standard state (e.g., A,C,G,T,R,Y,M,K,S,W,H,B,V,D,N for nucleotide data). This is often an unnecessary parameter to set because many data types, such as nucleotide or amino acid, already have a missing character specified. However, for morphological or restriction site data, "missing=?" is often used to specify ambiguity or unobserved data.

Matchchar -- This parameter specifies the matching character for the matrix. For example,

  format datatype=dna gap=- matchchar=.;
  matrix
  taxon_1 AACGATTCGT
  taxon_2 ..G...--CA
  taxon_3 .....C..C.
  taxon_4 ..G.....C.
  ;

is equivalent to

  format datatype=dna gap=-;
  matrix
  taxon_1 AACGATTCGT
  taxon_2 AAGGAT--CA
  taxon_3 AACGACTCCT
  taxon_4 AAGGATTCCT
  ;

The only non-standard NEXUS format option is the use of the "mixed", "restriction", "standard" and "continuous" datatypes. Hence, if you use any of these datatype specifiers, a program like PAUP* or MacClade will report an error (as they should because MrBayes is not strictly NEXUS compliant).

Unlink This command unlinks model parameters across partitions of the data. The correct usage is:

  unlink <parameter name> = (<all> or <partition list>)

A little background is necessary to understand this command. Upon execution of a file, a default partition is set up. This partition is referenced either by its name ("default") or number (0). If your data are all of one type, then this default partition does not actually divide up your characters. However, if your datatype is mixed, then the default partition contains as many divisions as there are datatypes in your character matrix. Of course, you can also define other partitions, and switch among them using the set command ("set partition=<name/number>"). Importantly, you can also assign model parameters to individual partitions or to groups of them using the "applyto" option in lset and prset.

When the program attempts to perform an analysis, the model is set for individual partitions. If the same parameter applies to different partitions and if that parameter has the same prior, then the program will link the parameters: that is, it will use a single value for the parameter. The program's default, then, is to strive for parsimony. However, there are lots of cases where you may want to unlink a parameter across partitions. For example, you may want a different transition/transversion rate ratio to apply to different partitions. This command allows you to unlink the parameters, or to make them different across partitions. The converse of this command is "link", which links together parameters that were previously told to be different.

The list of parameters that can be unlinked includes:

  Tratio           -- Transition/transversion rate ratio
  Revmat           -- Substitution rates of GTR model
  Omega            -- Nonsynonymous/synonymous rate ratio
  Statefreq        -- Character state frequencies
  Shape            -- Gamma shape parameter
  Pinvar           -- Proportion of invariable sites
  Correlation      -- Correlation parameter of autodiscrete gamma
  Switchrates      -- Switching rates for covarion model
  Brlens           -- Branch lengths of tree
  Topology         -- Topology of tree
  Speciationrates  -- Speciation rates for birth-death process
  Extinctionrates  -- Extinction rates for birth-death process
  Theta            -- Parameter for coalescence process
  Growthrate       -- Growth rate of coalescence process

For example,

  unlink shape=(all)

unlinks the gamma shape parameter across all partitions of the data. You can use "showmodel" to see the current linking status of the characters.

Prset options: Applyto -- This option allows you to apply the prset commands to specific partitions. This command should be the first in the list of commands specified in prset. Moreover, it only makes sense to be using this command if the data have been partitioned. A default partition is set on execution of a matrix. If the data are homogeneous (i.e., all of the same data type), then this partition will not subdivide the characters. Up to 30 other partitions can be defined, and you can switch among them using "set partition=<partition name>".

Now, you may want to specify different priors for different partitions of the data. Applyto allows you to do this. For example, say you have partitioned the data by codon position, and you want to fix the state frequencies to equal for the first two partitions but apply a flat Dirichlet prior to the state frequencies of the last. This could be implemented in two uses of prset:

  prset applyto=(1,2) statefreqpr=fixed(equal)
  prset applyto=(3) statefreqpr=dirichlet(1,1,1,1)

The first applies the parameters after "applyto" to the first and second partitions. The second prset applies a flat Dirichlet to the third partition. You can also use applyto=(all), which attempts to apply the parameter settings to all of the data partitions. Importantly, if the option is not consistent with the data in the partition, the program will not apply the prset option to that partition.

Lset options: Applyto -- This option allows you to apply the lset commands to specific partitions. This command should be the first in the list of commands specified in lset. Moreover, it only makes sense to be using this command if the data have been partitioned. A default partition is set on execution of a matrix. If the data are homogeneous (i.e., all of the same data type), then this partition will not subdivide the characters. Up to 30 other partitions can be defined, and you can switch among them using "set partition=<partition name>".

Now, you may want to specify different models for different partitions of the data. Applyto allows you to do this. For example, say you have partitioned the data by codon position, and you want to apply a nst=2 model to the first two partitions and nst=6 to the last. This could be implemented in two uses of lset:

  lset applyto=(1,2) nst=2
  lset applyto=(3) nst=6

The first applies the parameters after "applyto" to the first and second partitions. The second lset applies nst=6 to the third partition. You can also use applyto=(all), which attempts to apply the parameter settings to all of the data partitions. Importantly, if the option is not consistent with the data in the partition, the program will not apply the lset option to that partition.



Generalized MrBayes Concatenated Unlinked Nucleotide Nexus File

(Note: a concatenated linked nuc alignment would be assumed to have the same MoSE, so it is no different from the "Generalized MrBayes Nucleotide Nexus File".)

  Begin data;
  Dimensions NTAX=# NCHAR=#;
  Format Datatype=DNA Missing=? Gap=-;
  Matrix
  [numbers]
  [...]
  MONOV CTGATTG...
  [etc. for all included taxa; the first OTU in the alignment is assumed by MrBayes to be the outgroup]
  ;
  END;

  Begin mrbayes;
  charset 18s = 1-blah;
  charset ef1an = blah+1-.;
  partition bygene = 2:18s,ef1an;
  set partition=bygene; [switch from the default partition to bygene so that applyto=(1) and applyto=(2) refer to the two genes]
  Prset statefreqpr=fixed(empirical);
  lset applyto=(1) nst=# rates=enterratetypehere;
  lset applyto=(2) nst=# rates=enterratetypehere;
  Mcmc nruns=2 nchains=4 Temp=0.200000 swapfreq=1 nswaps=1 ngen=2000000 samplefreq=100 savebrlens=yes printfreq=50
    Mcmcdiagn=yes diagnfreq=1000 relburnin=yes burninfrac=0.50 filename=2n18s1an;
  sumt contype=halfcompat;
  End;

  [block out end notations from MacClade]
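
Working out the "blah" boundaries by hand gets error-prone with more than two genes. As a sketch of the arithmetic only, the Python below builds the charset, partition, and set partition lines from the gene names and their aligned lengths, given in the order they were concatenated. The function name and the example lengths (1800 and 1200 sites) are made up for illustration.

```python
def partition_lines(genes, partition_name="bygene"):
    """Build charset/partition lines for a concatenated alignment.

    genes: list of (name, aligned_length) tuples in concatenation order.
    Returns (lines, nchar), where nchar is the total alignment length.
    """
    lines = []
    start = 1
    for name, length in genes:
        end = start + length - 1
        lines.append(f"charset {name} = {start}-{end};")
        start = end + 1
    names = ",".join(name for name, _ in genes)
    lines.append(f"partition {partition_name} = {len(genes)}:{names};")
    lines.append(f"set partition={partition_name};")
    return lines, start - 1


if __name__ == "__main__":
    # Hypothetical lengths: 1800 aligned sites of 18s, then 1200 of ef1an.
    lines, nchar = partition_lines([("18s", 1800), ("ef1an", 1200)])
    print("\n".join(lines))
    print("NCHAR should be", nchar)
```

The returned total is a handy cross-check against the NCHAR value in the data block.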



Generalized MrBayes Concatenated Mixed Datatype Nexus File

  Begin data;
  Dimensions NTAX=# NCHAR=#;
  Format Datatype=mixed(dna:1-#,protein:#+1-lastresidue#) Missing=? Gap=-;
  Matrix
  [numbers]
  [...]
  MONOV CTGATTG...
  [etc. for all included taxa; the first OTU in the alignment is assumed by MrBayes to be the outgroup]
  ;
  END;

  Begin mrbayes;
  charset 18s = 1-blah;
  charset ef1an = blah+1-.;
  partition bygene = 2:18s,ef1an;
  set partition=bygene; [switch from the default partition to bygene so that applyto=(1) and applyto=(2) refer to the two genes]
  Prset applyto=(1) statefreqpr=fixed(empirical);
  Prset applyto=(2) aamodelpr=fixed(model name) statefreqpr=fixed(empirical);
  lset applyto=(1) nst=# rates=enterratetypehere;
  lset applyto=(2) rates=enterratetypehere;
  Mcmc nruns=2 nchains=4 Temp=0.200000 swapfreq=1 nswaps=1 ngen=2000000 samplefreq=100 savebrlens=yes printfreq=50
    Mcmcdiagn=yes diagnfreq=1000 relburnin=yes burninfrac=0.50 filename=2np18s1ap;
  sumt contype=halfcompat;
  End;

  [block out end notations from MacClade]
