Institute for Environmental Genomics

Software 

BLAST: This is a standalone version of BLAST. It can be installed on your PC and run from the MS-DOS command prompt. It is also available for other platforms, including general Linux and Unix systems. The instructions and options are available. The online version can be accessed at the following link: http://www.ncbi.nih.gov/BLAST/. This service takes protein and nucleic acid sequences and compares them against a selection of NCBI databases. The BLAST algorithm was written to balance speed against increased sensitivity for distant sequence relationships. Instead of relying on global alignments, BLAST emphasizes regions of local alignment to detect relationships among sequences that share only isolated regions of similarity (Altschul et al., 1990).
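
The difference between local and global alignment can be illustrated with a small sketch. The code below is not BLAST's heuristic; it is a plain Smith-Waterman dynamic-programming score, and the function name and scoring values (match, mismatch, gap) are assumptions chosen only for illustration. It shows how two sequences that share one isolated region still receive a high local score.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best Smith-Waterman local alignment score of a and b.

    Scoring values are illustrative assumptions, not BLAST defaults.
    """
    prev = [0] * (len(b) + 1)   # previous row of the scoring matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Flooring at 0 lets an alignment restart anywhere,
            # which is what makes the alignment local rather than global.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


# The two sequences share only the region ACGTACGT; their flanks differ.
score = local_alignment_score("TTTTACGTACGT", "GGGGACGTACGTCCCC")
```

A global alignment would be penalized for the unrelated flanking regions, whereas the local score depends only on the best-matching region.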

ClustalX: ClustalX is a new windows interface for the ClustalW multiple sequence alignment program. It provides an integrated environment for performing multiple sequence and profile alignments and analyzing alignment results. The newest version can be downloaded from the following site: ftp://ftp-igbmc.u-strasbg.fr/pub/ClustalX/. ClustalX is available for a number of different platforms, including SUN Solaris, IRIX 5.3 on Silicon Graphics, Digital UNIX on DECstations, Microsoft Windows (32 bit) for PCs, Linux ELF for x86 PCs and Macintosh PowerMac.


CommOligo: CommOligo is software newly developed in this lab. It selects optimal oligonucleotides for microarray construction. The program picks single or multiple oligonucleotide probes for each sequence/gene. It uses global alignment algorithms and considers multiple criteria for probe design. The user can choose values for different parameters, such as sequence identity, the maximum length of continuous stretches, and free energy. The oligonucleotide length can range from 8 to 128 mer. However, different lengths may require different criteria, established from experimental data. An accessory software tool, Commoligo_PE, may help you estimate those criteria if some experimental data are available. It is available from the following link: Download

Please cite or refer to the publication for details:

Li X*, He Z* and Zhou J. Selection of optimal oligonucleotide probes for microarrays using multiple criteria, global alignment and parameter estimation (*co-first author). Nucleic Acids Research, 33: 6114-6123.
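
Two of the probe criteria named above, sequence identity and the longest continuous matching stretch, can be sketched as follows. This is a simplified illustration and not CommOligo's implementation: it compares a probe against an equal-length non-target region without gaps, whereas CommOligo uses global alignment, and it leaves out free energy entirely. The function names and the threshold defaults are hypothetical.

```python
def percent_identity(probe, other):
    """Percent of matching positions in an ungapped, equal-length comparison."""
    if len(probe) != len(other):
        raise ValueError("sequences must have the same length")
    matches = sum(p == o for p, o in zip(probe, other))
    return 100.0 * matches / len(probe)


def longest_matching_stretch(probe, other):
    """Length of the longest run of consecutive matching positions."""
    longest = run = 0
    for p, o in zip(probe, other):
        run = run + 1 if p == o else 0
        longest = max(longest, run)
    return longest


def passes_specificity(probe, other, max_identity=85.0, max_stretch=15):
    """True if the probe is dissimilar enough from a non-target sequence.

    The default thresholds are hypothetical placeholders; suitable values
    depend on probe length and on experimental data.
    """
    return (percent_identity(probe, other) <= max_identity
            and longest_matching_stretch(probe, other) <= max_stretch)


# One mismatch in ten positions: 90% identity, longest stretch of 5.
ok = passes_specificity("ACGTACGTAC", "ACGTTCGTAC")
```

Because a probe must pass every criterion at once, a short probe can fail on identity alone even when its longest matching stretch is small.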

 

CommOligo2.0: This software was developed in this lab based on CommOligo. Unlike CommOligo, it can select group-specific oligonucleotide probes for a group of highly homologous sequences. It specifically targets functional genes with high similarity, which makes it very useful for constructing functional gene microarrays. It is available from the following link: Download

Primer3: Primer3 is a software tool commonly used for designing PCR primers. It can design primers for multiple sequences in batch mode from the command line. It can also select primers for a single sequence through a web-based interface. The program can be run on Windows, Linux/Unix, and other systems. Primer3 is free and offered on an "as-is", use-at-your-own-risk basis. The source code can be found on the following website: http://frodo.wi.mit.edu/primer3/primer3_code.html. The web interface can be accessed at the following link:

http://fokker.wi.mit.edu/cgi-bin/primer3/primer3_www.cgi

Please cite or refer to the publication for details:

Rozen S, Skaletsky H (2000) Primer3 on the WWW for general users and for biologist programmers. In: Krawetz S, Misener S (eds) Bioinformatics Methods and Protocols: Methods in Molecular Biology. Humana Press, Totowa, NJ, pp 365-386  

 

MEGA:  Molecular Evolutionary Genetics Analysis is an integrated software package for multisequence alignment, phylogenetic tree construction and evolutionary analysis.  The newest version can be downloaded from the following site: http://www.megasoftware.net/.  MEGA is available for Microsoft Windows (32 bit) for PC.

 

PAML:  Phylogenetic Analysis by Maximum Likelihood is a software package for the evolutionary analysis of phylogenetic trees using maximum likelihood methods.  Possible analyses include tests of the molecular clock hypothesis, rate heterogeneity along lineages and among sites, ancestral sequence prediction and simulations.  Nucleotide, amino acid and codon-based models are all supported by the software.  PAML can be downloaded from the following site: http://abacus.gene.ucl.ac.uk/software/paml.html.  The following platforms are supported:  Microsoft Windows (32 bit) for PC, Macintosh OS X, most UNIX/LINUX distros.  (Yang 1997)

 

BioPerl:  BioPerl is a community effort to construct a set of standardized Perl modules designed to simplify common bioinformatics analyses.  Tasks that can be carried out using BioPerl include:  report parsing (Blast, HMMer, etc.); manipulation of sequence files (translation, format interconversion, etc.), sequence alignments (identifying mismatches, multisequence alignment, etc.) and phylogenetic trees; evolution and population genetics analyses (PAML, pairwise statistics, Ka/Ks calculation, etc.); and feature annotation.  The BioPerl package can be downloaded from the following sites:  http://www.bioperl.org; http://search.cpan.org/dist/bioperl/.  For Microsoft Windows systems, BioPerl is also available through the ActiveState Perl ppm utility (http://www.activestate.com).  BioPerl is available for the following systems:  Microsoft Windows (32 bit) for PC, Apple Macintosh OS X and all UNIX/LINUX distros.

 

Artemis/ACT:  Artemis and the Artemis Comparison Tool (ACT) are available from the Wellcome Trust Sanger Institute (http://www.sanger.ac.uk/Software/).  Artemis is a free genome viewer and analysis tool that can accept sequence files in a variety of formats, including annotated GenBank and EMBL formats.  ACT is a DNA sequence comparison tool based on Artemis that is useful in comparative genomics analyses.  Both Artemis and ACT are written in Java and may be run on the following platforms:  Microsoft Windows (32 bit) for PC, Apple Macintosh OS X, all UNIX/LINUX distros.

 

PyMol:  PyMol is an open source molecular visualization program useful for creating publication quality protein structure images (http://pymol.sourceforge.net/).  The software is available for the following platforms:  Microsoft Windows (32 bit) for PC, Apple Macintosh OS X, all UNIX/LINUX distros.

General guideline for statistical software packages: 

If you torture the data long enough, Nature will confess. (Ronald Coase, 1991 Nobel Prize laureate in Economics).

Learning statistics and knowing how to use different methods cannot and should not be separated. Often more than one method can provide an answer, and not all of them are necessarily legitimate for the situation. Also, almost every statistical analysis has various options for improving and optimizing its performance. A legitimate statistical method should be selected not by its results but by careful assessment of its assumptions. If one simply accepts the defaults without knowing what they do, or without checking whether the data meet the assumptions, one's statistical analysis should not be trusted. Put more simply, the results of an analysis cannot be interpreted legitimately without understanding the specific statistical method. A given statistical procedure is much easier to learn if one has a sound foundation in statistics, such as normality and independence of data, statistical tests, a few basic distributions, the central limit theorem and so on. Thus it is strongly advised to learn fundamental statistics before causing problems by torturing statistics to provide the answers one would like to see.
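
As one small illustration of checking an assumption before choosing a method, the sketch below computes sample skewness and excess kurtosis, both of which are near zero for normally distributed data. It is only a rough screen under simple assumptions (population moments, no significance test), not a substitute for formal normality tests or diagnostic plots, and the function name is hypothetical.

```python
from statistics import mean, pstdev


def skewness_and_excess_kurtosis(values):
    """Return (skewness, excess kurtosis) using population moments.

    Both are close to 0 for data drawn from a normal distribution.
    """
    n = len(values)
    m = mean(values)
    s = pstdev(values)
    if s == 0:
        raise ValueError("all values are identical; moments are undefined")
    z = [(v - m) / s for v in values]          # standardized values
    skew = sum(x ** 3 for x in z) / n          # third standardized moment
    kurt = sum(x ** 4 for x in z) / n - 3.0    # fourth moment minus normal value
    return skew, kurt


# A single large value makes the sample strongly right-skewed,
# which would argue against methods that assume normality.
skew, kurt = skewness_and_excess_kurtosis([1, 1, 1, 1, 10])
```

A large skewness here would suggest transforming the data or choosing a method that does not assume normality, rather than running the default analysis.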

 

SAS (SAS Institute, Cary, NC. sas.com)

SAS has made its name as a statistical software package for general statistical analysis. The current version is 9.1.3; Mac version development stopped at version 6. SAS has many modules, such as SAS/STAT for statistical analysis, SAS/INSIGHT for exploratory data analysis, SAS/IML for matrix programming and so on. SAS Institute also provides solutions for different areas, and a recent addition is SAS® Microarray, in which microarray data analysis is streamlined in an integrated way. Most academic and research institutes provide a site license for their users. SAS is very powerful and trustworthy, but not very intuitive to learn at first, since SAS/STAT still runs from a complete program in which the user must define the location of the input file, the data format, the procedure (proc) and the output, with various available options. One of the easier introductory books is Applied Statistics and the SAS Programming Language by Ronald P. Cody & Jeffrey K. Smith (Prentice Hall). One recent major addition to the SAS system is SAS® Enterprise Miner™, which streamlines data mining (or machine learning) procedures within a rather intuitive programming environment.

 

R (The R Project for Statistical Computing. r-project.org)

R is a language and environment for statistical computing and graphics. It is a GNU project which is similar to the S language and environment. There are some important differences, but much code written for S runs unaltered under R. R provides a wide variety of statistical (linear and nonlinear modelling, classical statistical tests, time-series analysis, classification, clustering) and graphical techniques, and is highly extensible. The S language is often the vehicle of choice for research in statistical methodology. R is available as Free Software under the terms of the Free Software Foundation's GNU General Public License in source code form. It compiles and runs on a wide variety of UNIX platforms and similar systems (including FreeBSD and Linux), Windows and MacOS.

From the novice user's point of view, R is very versatile, since many advanced users and developers implement their new statistical methods in R and provide them as packages. Many common statistical methods are included in the base installation, and a growing number of packages for specific purposes are available. The website also provides thick and useful manuals in PDF for more serious-minded users.

 

The R Package for multidimensional and spatial analysis

(http://bio.umontreal.ca/casgrain/en/labo/R)

The R package is a group of programs with multiple modules for different multidimensional and spatial analysis procedures. A coauthor of the R package, Pierre Legendre, is also a coauthor of the renowned so-called 'greenbook' of quantitative ecology, Numerical Ecology. All of the modules implemented in the R package version 4 are introduced in its manual and discussed in depth in the greenbook. One drawback of the R package in this Bill Gates dominant world is that there is no Windows version at this moment, even though the website has claimed that one is in progress for as long as I can remember. The R package is free and currently works in the Classic environment of OS X.

 

GSLIB (Geostatistical Software LIBrary. gslib.com)

GSLIB is the name of a directory containing the geostatistical software developed at Stanford, and it's free. GSLIB has numerous modules that run in the DOS command environment, each set up through a separate parameter file. The original version was written in Fortran 77 and the current version is written in Fortran 90 (v. 2.907). The user-unfriendliness of the software has been eased somewhat by a commercially available GUI, WinGslib ($200 for the academic version). Some other good minds have also written GUIs for certain modules; for example, cyze.com wrote one for kriging and Gaussian simulation. This package is almost the standard among public domain geostatistical software packages and is accompanied by a rather extensive book, GSLIB: Geostatistical Software Library and User's Guide by Clayton V. Deutsch & André G. Journel (Oxford).

 

Other available packages

General packages with a user-friendly GUI: SPSS & SYSTAT

Packages with a more ecological purpose: CANOCO & PC-ORD

Specialized in community indices: EstimateS (Robert Colwell) & SPADE (Anne Chao)

 


Institute for Environmental Genomics
University of Oklahoma
101 David L. Boren Blvd,   Norman, Oklahoma  73019
Ph (405) 325-6094  Fax (405) 325-7552
Email ieg@rccc.ou.edu