Software
BLAST:
This is a
standalone version of BLAST. It can be installed on your PC and
run from the MS-DOS command prompt. It is also available for
other platforms, including Linux and Unix systems. Instructions
and a description of the options are available. The online
version can be accessed at the following link:
http://www.ncbi.nih.gov/BLAST/. This service takes protein and
nucleic acid sequences and compares them against a selection of
NCBI databases. The BLAST algorithm was designed to balance
speed against sensitivity to distant sequence relationships.
Instead of relying on global alignments, BLAST emphasizes
regions of local alignment, so it can detect relationships
among sequences that share only isolated regions of similarity
(Altschul et al., 1990).
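To illustrate what "local alignment" means, here is a minimal sketch of the classic Smith-Waterman dynamic programming method. This is not the BLAST algorithm itself (BLAST uses a much faster heuristic); the function name and the scoring values (match, mismatch, gap) are assumptions chosen only for illustration.

```python
def local_alignment(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best score, span of a, span of b) for the best local alignment.

    Smith-Waterman sketch with a linear gap penalty. Scores are
    illustrative assumptions, not BLAST's scoring scheme.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the matrix; a cell never drops below 0, which is what
    # lets an alignment start and end anywhere (local, not global).
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                0,
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until the score reaches 0.
    i, j = best_pos
    end_i, end_j = i, j
    while i > 0 and j > 0 and score[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return best, a[i:end_i], b[j:end_j]


if __name__ == "__main__":
    # Two sequences that share only one isolated region of similarity.
    print(local_alignment("AAAAGATTACACCCC", "TTTTGATTACAGGGG"))
```

The two example sequences differ everywhere except in one shared region, which is the situation where a local method finds the relationship and a global alignment would obscure it.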
ClustalX:
Clustal X is
a windows interface for the ClustalW multiple sequence
alignment program. It provides an integrated environment for
performing multiple sequence and profile alignments and
analyzing alignment results. The newest version can be
downloaded from the following site:
ftp://ftp-igbmc.u-strasbg.fr/pub/ClustalX/. ClustalX is
available for a number of different platforms, including Sun
Solaris, IRIX 5.3 on Silicon Graphics, Digital UNIX on
DECstations, Microsoft Windows (32 bit) for PCs, Linux ELF for
x86 PCs, and Macintosh PowerMac.
CommOligo:
CommOligo is software newly developed in this
lab. It can be used to select optimal oligonucleotides for
microarray construction. The program picks single or multiple
oligonucleotide probes for each sequence/gene. It uses global
alignment algorithms and considers multiple criteria for probe
design. The user can choose values for the different
parameters, such as sequence identity, the maximum length
of continuous stretches, and free energy. The oligonucleotide
length can range from 8-mer to 128-mer. However, different
lengths may require different criteria, which should be
established from experimental data. An accessory software tool,
Commoligo_PE, may help you estimate those criteria if some
experimental data are available. It is available from the following link:
Download
Please cite or refer to the
publication for details:
Li X*, He Z* and Zhou J. Selection of optimal
oligonucleotide probes for microarrays using multiple criteria,
global alignment and parameter estimation (*co-first
authors). Nucleic Acids Research, 33: 6114-6123.
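As a rough illustration of two of the criteria named above (sequence identity and the longest continuous stretch), the sketch below scores a candidate probe against non-target sequences. It is not CommOligo's implementation: the function names are hypothetical, the thresholds are placeholder assumptions rather than recommended values, the sequences are assumed to be already globally aligned to equal length (with '-' for gaps), and the free energy criterion is left out.

```python
def identity_and_stretch(probe, other):
    """Return (fraction of identical positions, longest run of identical positions).

    Both strings must already be aligned to equal length; '-' marks a gap.
    """
    if len(probe) != len(other):
        raise ValueError("sequences must be aligned to the same length")
    matches = longest = current = 0
    for p, o in zip(probe, other):
        if p == o and p != "-":
            matches += 1
            current += 1
            longest = max(longest, current)
        else:
            current = 0  # a mismatch or gap breaks the continuous stretch
    return matches / len(probe), longest


def is_specific(probe, nontargets, max_identity=0.85, max_stretch=15):
    """True if the probe stays at or below both thresholds against every non-target.

    The default thresholds are hypothetical placeholders; real values
    should come from experimental data for the chosen probe length.
    """
    for other in nontargets:
        identity, stretch = identity_and_stretch(probe, other)
        if identity > max_identity or stretch > max_stretch:
            return False
    return True


if __name__ == "__main__":
    print(identity_and_stretch("ACGTACGTAC", "ACGTTCGTAC"))
    print(is_specific("ACGTACGTAC", ["ACGTTCGTAC"]))
```

A probe is rejected as soon as any non-target sequence exceeds either threshold, which reflects the idea of applying multiple criteria together rather than identity alone.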
CommOligo2.0:
This software was developed in this lab based on CommOligo.
In addition, it is able to select group-specific oligonucleotide
probes for a group of highly homologous sequences. It
specifically targets functional genes with high similarity, so
it is very useful for the construction of functional gene
microarrays. It is available from the
following link:
Download
Primer3:
Primer3 is a
software tool commonly used for designing PCR primers. It can
design primers for multiple sequences in batch mode
(command-line style). It can also select primers for a single
sequence through a web-based interface. The program can be run on
Windows, Linux/Unix, and other systems. Primer3 is free and
offered on an "as-is", use-at-your-own-risk basis. The
source code can be found on the following website:
http://frodo.wi.mit.edu/primer3/primer3_code.html. The web
interface can be accessed at the following link:
http://fokker.wi.mit.edu/cgi-bin/primer3/primer3_www.cgi.
Please cite or refer to the
publication for details:
Rozen S,
Skaletsky H (2000) Primer3 on the WWW for general users and
for biologist programmers. In: Krawetz S, Misener S (eds)
Bioinformatics Methods and Protocols: Methods in Molecular
Biology. Humana Press, Totowa, NJ, pp 365-386
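For batch mode, Primer3 reads plain-text records of TAG=VALUE lines, with each record closed by a line containing only "=". The sketch below builds such an input file for several sequences. The function name is hypothetical, and the tag names SEQUENCE_ID and SEQUENCE_TEMPLATE are an assumption about the installed release (older releases used different tag names), so check the documentation that ships with your copy.

```python
def boulder_io_records(sequences):
    """Build batch input text from a {sequence_id: template} mapping.

    Each record is a list of TAG=VALUE lines followed by a lone '='.
    Tag names are assumed to match the installed Primer3 release.
    """
    lines = []
    for seq_id, template in sequences.items():
        lines.append(f"SEQUENCE_ID={seq_id}")
        lines.append(f"SEQUENCE_TEMPLATE={template}")
        lines.append("=")  # record terminator
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder sequences, far too short for real primer design.
    example = {"gene1": "ACGTACGTACGT", "gene2": "TTGACCATGGCA"}
    print(boulder_io_records(example), end="")
```

The resulting text would typically be saved to a file and passed to the command-line program on standard input, for example `primer3_core < input.txt`.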
MEGA:
Molecular Evolutionary Genetics Analysis is an integrated
software package for multisequence alignment, phylogenetic tree
construction and evolutionary analysis. The newest version can
be downloaded from the following site:
http://www.megasoftware.net/. MEGA is available for
Microsoft Windows (32 bit) for PC.
PAML:
Phylogenetic Analysis by Maximum Likelihood is a software
package for the evolutionary analysis of phylogenetic trees
using maximum likelihood methods. Possible analyses include
tests of the molecular clock hypothesis, rate heterogeneity
along lineages and among sites, ancestral sequence prediction,
and simulations. Nucleotide, amino acid, and codon-based models
are all supported by the software. PAML can be downloaded from
the following site:
http://abacus.gene.ucl.ac.uk/software/paml.html. The
following platforms are supported: Microsoft Windows (32 bit)
for PC, Macintosh OS X, most UNIX/LINUX distros. (Yang 1997)
BioPerl:
BioPerl is a community effort to construct a set of standardized
Perl modules designed to simplify common bioinformatics
analyses. Tasks that can be carried out using BioPerl include
report parsing (Blast, HMMer, etc.); manipulation of sequence
files (translation, format interconversion, etc.), sequence
alignments (identifying mismatches, multisequence alignment, etc.),
and phylogenetic trees; evolution and population genetics
analyses (PAML, pairwise statistics, Ka/Ks calculation, etc.);
and feature annotation. The BioPerl package can be downloaded from
the following sites:
http://www.bioperl.org;
http://search.cpan.org/dist/bioperl/. For Microsoft Windows
systems, BioPerl is also available through the ActiveState Perl
ppm utility (http://www.activestate.com).
BioPerl is available for the following systems: Microsoft
Windows (32 bit) for PC, Apple Macintosh OS X, and all UNIX/LINUX
distros.
Artemis/ACT:
Artemis and the Artemis Comparison Tool (ACT) are available from
the Wellcome Trust Sanger Institute (http://www.sanger.ac.uk/Software/).
Artemis is a free genome viewer and analysis tool that can
accept sequence files in a variety of formats, including
annotated GenBank and EMBL formats. ACT is a DNA sequence
comparison tool based on Artemis that is useful in comparative
genomics analyses. Both Artemis and ACT are written in Java and
may be run on the following platforms: Microsoft Windows (32
bit) for PC, Apple Macintosh OS X, all UNIX/LINUX distros.
PyMol:
PyMol is an open source molecular visualization program useful
for creating publication-quality protein structure images (http://pymol.sourceforge.net/).
The software is available for the following platforms:
Microsoft Windows (32 bit) for PC, Apple
Macintosh OS X, all UNIX/LINUX distros.
General guideline for statistical software packages:
If you torture the data long
enough, Nature will confess. (Ronald Coase, 1991 Nobel Prize
laureate in Economics).
Learning statistics and
knowing how to use different methods cannot and should not be
separated. Often more than one method can provide an
answer, and not all of them are necessarily legitimate for the
situation. Also, almost all statistical analyses have various
options for improving and optimizing their performance.
A legitimate statistical method should be selected not by its
results but by careful assessment of its assumptions. If one
simply accepts the defaults without knowing what they do, or
does not check whether the data meet the assumptions, one's
statistical analysis should not be trusted. Put more simply, the
results of an analysis cannot be interpreted in a
legitimate way without understanding the specific statistical
method. A particular statistical procedure can be learned rather
easily if one is equipped with a sound foundation in statistics,
such as normality and independence of data, statistical tests,
a few basic distributions, the central limit theorem, and so on.
Thus it is always strongly advised to learn fundamental statistics
before causing problems by torturing statistics to provide
the answers one would like to see.
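As a small illustration of one of the fundamentals mentioned above, the central limit theorem, the sketch below draws strongly skewed (exponential) data and compares the skewness of the raw values with the skewness of sample means. Everything here is an assumption chosen for illustration (function names, sample size, number of replicates, seed); it uses only the Python standard library and is not taken from any of the packages described on this page.

```python
import random
import statistics


def skewness(values):
    """Population skewness: mean of cubed standardized deviations."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


def clt_demo(sample_size=30, replicates=2000, seed=1):
    """Return (skewness of raw exponential draws, skewness of sample means)."""
    rng = random.Random(seed)
    # Raw data: a strongly right-skewed distribution.
    raw = [rng.expovariate(1.0) for _ in range(replicates)]
    # Means of repeated samples drawn from the same distribution.
    means = [
        statistics.fmean(rng.expovariate(1.0) for _ in range(sample_size))
        for _ in range(replicates)
    ]
    return skewness(raw), skewness(means)


if __name__ == "__main__":
    raw_skew, mean_skew = clt_demo()
    print(f"skewness of raw data:     {raw_skew:.2f}")
    print(f"skewness of sample means: {mean_skew:.2f}")
```

The sample means are much closer to symmetric than the raw data, which is why procedures based on means can tolerate some non-normality in large samples. It does not excuse skipping the check of whether a given method's assumptions hold for your own data.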
SAS
(SAS Institute, Cary, NC. sas.com)
SAS has made its
name as the statistical software package for general statistical
analysis. The current version is 9.1.3; development of the Mac
version stopped at version 6. SAS has many modules, such
as SAS/STAT for statistical analysis, SAS/INSIGHT for
exploratory data analysis, SAS/IML for matrix programming, and so
on. SAS Institute also provides solutions for different areas;
a recent addition is SAS® Microarray, in which
microarray data analysis is streamlined in an integrated way.
Most academic and research institutes provide a site license for
their users. SAS is very powerful and trustworthy, but not very
intuitive to learn at first, since SAS/STAT still runs as a
complete program in which the user must define the location of
the input file, the data format, the procedure (proc), and the
output, along with the various available options. One of the
easier introductory books is
Applied Statistics and the SAS Programming Language by
Ronald P. Cody & Jeffrey K. Smith (Prentice Hall). One of
the recent major additions to the SAS system is SAS®
Enterprise Miner™, which streamlines data
mining (or machine learning) procedures within a rather intuitive
programming environment.
R
(The R Project for Statistical Computing. r-project.org)
R is a language and
environment for statistical computing and graphics. It is a
GNU project which is
similar to the S language and environment. There are some
important differences, but much code written for S runs
unaltered under R. R provides a wide variety of statistical
(linear and nonlinear modelling, classical statistical tests,
time-series analysis, classification, clustering, ...) and graphical
techniques, and is highly extensible. The S language is often
the vehicle of choice for research in statistical methodology. R is
available as Free Software under the terms of the
Free Software Foundation's
GNU General Public License in source code form. It compiles
and runs on a wide variety of UNIX platforms and similar systems
(including FreeBSD and Linux), Windows and MacOS.
From the
novice user's point of view, R is very versatile, since many
advanced users/developers implement their new statistical
methods in R and provide them as packages. Many
common statistical methods are included in the base
packages, and a growing number of packages for specific purposes are
available. The website also provides thorough manuals
in PDF for more serious-minded users.
The R Package for
multidimensional and spatial analysis
(http://bio.umontreal.ca/casgrain/en/labo/R)
The R package is a group of
programs with multiple modules for different multidimensional and
spatial analysis procedures. A coauthor of the R package,
Pierre Legendre, is also a coauthor of the renowned so-called
'greenbook' of quantitative ecology, Numerical Ecology.
All of the modules implemented in the R package version 4
are introduced in the manual and discussed in depth in the
greenbook. One drawback of the R package in this
Windows-dominated world is that there is no Windows version at
the moment, even though the website has said one is in progress
for as long as I can remember. The R package is free and
currently runs in the Classic environment of OS X.
GSLIB (Geostatistical
Software LIBrary. gslib.com)
GSLIB is the name of a
directory containing the geostatistical software developed at
Stanford, and it's free. GSLIB has numerous modules that run at
the DOS command prompt, each set up through a separate parameter
file. The original version was written in Fortran 77 and the
current version is written in Fortran 90 (v. 2.907). The user
unfriendliness of the software has been eased a little by a
commercially available GUI, WinGslib ($200 for the academic version).
Some other good minds have also written GUIs for certain
modules; for example, cyze.com wrote one for kriging and
Gaussian simulation. This package is almost a standard among
public domain geostatistical software packages and is accompanied
by a rather extensive book, GSLIB, Geostatistical Software Library
and User's Guide by Clayton V. Deutsch & André G. Journel (Oxford).
Other available
packages
General packages with
user-friendly GUI interface - SPSS & SYSTAT
More Ecological purpose
packages - CANOCO & PC-ORD
Specialized in community
indices - EstimateS (Robert Colwell) & SPADE (Anne Chao)