Establishing experimental criteria for probe
design and developing new computer programs
Microarrays fabricated with the genes encoding key functional
enzymes involved in various biogeochemical cycling processes are
referred to as functional gene arrays (FGAs). One of the greatest
challenges in using FGAs for detecting functional genes and/or
microorganisms in the environment is to design oligonucleotide
probes specific to target genes/microorganisms because many
sequences targeted in environmental studies are highly homologous.
To tackle this problem, we have experimentally established the probe
design criteria by considering sequence homology, free energy and
sequence stretches. This work was published in Applied and
Environmental Microbiology (AEM) and was listed as one of the top 20
papers most requested in 2005 by AEM. We have further improved the
probe design criteria by showing that very little
cross-hybridization was observed when a probe has ≤ 90% sequence
identity, ≥ -35 kcal/mol of free energy, and ≤ 20-base stretches
with its non-targets. Our recent tests with FGA-II (>24,000 gene
probes) showed that the probe specificity can be predicted very well
based on these criteria.
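The three criteria above can be sketched as a simple filter. This is a minimal illustration, not the CommOligo implementation: it assumes the probe and the non-target region are already aligned without gaps and have equal length, and it takes the hybridization free energy as a precomputed input rather than calculating it. The function and parameter names are hypothetical.

```python
def percent_identity(probe: str, other: str) -> float:
    """Percent of matching positions in two pre-aligned, equal-length sequences."""
    if len(probe) != len(other):
        raise ValueError("sequences must be pre-aligned to equal length")
    matches = sum(a == b for a, b in zip(probe, other))
    return 100.0 * matches / len(probe)


def longest_stretch(probe: str, other: str) -> int:
    """Length of the longest run of consecutive matching positions."""
    best = run = 0
    for a, b in zip(probe, other):
        run = run + 1 if a == b else 0
        best = max(best, run)
    return best


def passes_criteria(probe: str, non_target: str, delta_g_kcal_mol: float,
                    max_identity: float = 90.0, max_stretch: int = 20,
                    min_delta_g: float = -35.0) -> bool:
    """True if the probe meets all three thresholds against one non-target.

    delta_g_kcal_mol is assumed to come from an external thermodynamic
    calculation; it is not computed here.
    """
    return (percent_identity(probe, non_target) <= max_identity
            and longest_stretch(probe, non_target) <= max_stretch
            and delta_g_kcal_mol >= min_delta_g)
```

In practice a probe would be kept only if this check passes against every non-target sequence in the data set.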
Once the criteria are established, the next critical issue is the
development of a computer program to implement probe design
strategies. Due to the highly homologous nature of the gene
sequences of interest for community studies, no suitable commercial software
or freeware was available. Thus, over the past two years, we have
developed, tested, and applied a new software tool (CommOligo) to
design oligonucleotide probes for microarrays. The newly developed
software has several advantages. First, this program was
specifically designed by considering the challenges of selecting
specific probes based on many homologous sequences of each
functional gene category. Thus it is well suited for microbial
community study and microbial detection. To our knowledge, this is
the first computational program for designing community-wide gene
probes. It can also be used for designing probes for whole genome
arrays. Second, this program implements novel global alignment
algorithms, and simultaneously uses multiple criteria (e.g. sequence
identity, hybridization free energy (ΔG), and continuous stretch) to
predict oligonucleotide specificity. In addition, the program has a
unique feature to design group-specific probes for a group of highly
homologous sequences.
For monitoring microbial populations in the environment, it is ideal
to develop hierarchical oligonucleotide probes (50mer) for different
phylogenetic groups of genes/microorganisms at different levels of
specificity (e.g., strains, species, genera, and families), and for
individual sequences/organisms as well. Such hierarchical probes
will allow us to simultaneously detect both closely and distantly
related genes/populations so that we will not miss some important
distantly related populations. However, the currently developed
program cannot meet such requirements. Thus, we are developing
computational tools capable of designing hierarchical probes based
on the CommOligo program.
Development of comprehensive
functional gene arrays (FGAs). Although many technical issues
regarding microarray technology have been solved, one of the
critical bottlenecks that remain is to design FGAs containing probes
appropriate for studying the microbial communities of interest. With
the newly developed computer program, we have recently finished the
construction of the second generation of FGAs for environmental
studies. The probes designed for this microarray encompass the
variation in >10,000 known microbial functional genes involved in
nitrogen (e.g. nitrification, denitrification and nitrogen
fixation), carbon (e.g. carbon dioxide fixation and cellulose
degradation, methane production and oxidation) and sulfur (e.g.
dissimilatory sulfite reduction) cycling processes, organic
contaminant degradation, and metal resistance and reduction. To our
knowledge, these are the most comprehensive arrays in the world
available for environmental studies. Now, we are developing more
comprehensive FGAs by including more functional gene groups.
We
have also developed other novel high-throughput genomic technologies
such as community genome arrays (CGAs). We have evaluated their
specificity, sensitivity, quantitative capability, and potential application
to environmental samples under a variety of conditions. Our results
indicate that microarray-based genomic technology has potential as a
specific, sensitive and quantitative tool in detecting
microorganisms in environments and revealing species relationships
among different bacteria.
Novel approaches for increasing microarray
hybridization sensitivity
One of the main challenges in
using microarrays to analyze microbial communities in natural
settings is that current detection sensitivities are not sufficient for
detecting the majority of microbial populations in environmental
samples. Thus we have initiated the development of new technologies
for increasing detection sensitivity. First, we have optimized the
labeling and hybridization systems; this optimization increases
sensitivity 50- to 100-fold. Second, we have developed
nanoparticle-based methods for increasing signal
detection on microarrays. Our results showed that these increase
sensitivity 10- to 50-fold. In addition, we have developed a novel approach
and strategy for representatively, quantitatively amplifying whole
microbial communities for microarray-based detection. With this
technology, as few as two bacterial cells can be detected.
Application of this technology to various environmental samples
showed that it can provide reliable and quantitative
detection of microbial populations. This is
the first time that microarrays have been used to visualize a
complete picture of microbial communities in natural settings with
low biomass. The development of such technologies makes it possible
to utilize microarrays for analysis of environmental samples. This
approach will also be very useful for addressing questions
concerning microbial communities associated with human health, plant
and animal quarantine (e.g., pathogen detection), plant ecology
(e.g., rhizosphere populations), animal productivity and health
(e.g., intestinal and rumen populations), forestry, oceanography,
fisheries, ecology, biodiversity discovery and management (e.g.,
pharmaceutical discovery), etc., as microbial communities play
important roles in each of these areas and the available natural
community biomass is often very restricted.
mRNA-based detection in
environmental samples. A major problem in detecting microbial
activities in environmental samples by microarray hybridization is
obtaining sufficient amounts of mRNA for analysis. We have
developed and evaluated an mRNA amplification strategy for improving
detection sensitivity of microbial activity using a modified T7 RNA
polymerase-based approach to amplify prokaryotic mRNAs for
microarray analysis. Our results indicated that community mRNAs can be
successfully amplified by this approach and that
enough mRNA can be obtained for microarray hybridization from
contaminated groundwater samples. The development of this
technology makes it possible to analyze microbial community
activities in environmental samples.
Novel surface chemistry for microarray
fabrication
For constructing microarrays with
short oligonucleotides (<25 bp), one end of an oligonucleotide probe
must be modified for attachment to glass slides, which generally
costs $5-20 per modification. This cost is prohibitive for
high-density oligonucleotide arrays. Thus we have also developed a
novel surface coating chemistry for fabricating unmodified
oligonucleotides on glass slides. This novel chemistry has
significant advantages over the currently used approach in terms of
detection sensitivity, dynamic range, versatility, background and
cost. Our experimental results showed that a single-base difference
between probe and target DNA can be easily differentiated with the
new surface chemistry. This type of chemistry can also be used for
fabricating both DNA and protein arrays. A patent is pending, and this
technology is licensed to a company for commercialization.
Protein arrays
Protein microarrays are becoming
an important tool in proteomics, drug discovery and disease
diagnosis. Fabrication of protein microarrays on planar surfaces is
a great challenge due to protein denaturation and cross-reactivity.
We have developed three-dimensional nanofilm-based slides for
protein array fabrication. Our results showed that very reliable
antibody-antigen interactions can be detected with the new surface
chemistry. We have also developed a novel hydrogel-based approach
for protein array fabrication. Our results showed that the activity
and selectivity of the hydrogel-based protein arrays compare
favorably with those of conventional technologies. A patent on this
technology is in preparation.
New methods for sample preparation
One of the bottlenecks in using
microarrays for environmental studies is sample preparation.
Thus, besides the development of microarray-based genomic
technology, my laboratory has also developed and optimized molecular
methods for extracting high-quality nucleic acids from environmental
samples. We were the first to recover high-quality intact
mRNAs and DNA simultaneously from a variety of soils and sediments,
which are extremely challenging sample types. My laboratory was also the first to
discern the problems of heteroduplexes and PCR-induced mutations in
the 16S rRNA gene-based cloning approach and to develop approaches
to minimize such artifacts. These studies are important in molecular
microbial ecology because PCR-based cloning is the most
widely used and powerful approach for analyzing biological samples.
To obtain a comprehensive view of
microbial communities, it is preferable to have larger inserts up to
200 kb, but obtaining high molecular weight DNA with sufficient
purity is very challenging. Thus, we are developing new technologies
for recovering extremely high molecular weight DNA from
environmental samples. Our results showed that more than 200 kb
fragments can be isolated and cloned.
The genomic technologies we have
developed are important not only to environmental studies but also
to the general field of microorganism detection and
characterization. Our technological innovations will greatly advance
researchers' capabilities to analyze microbial communities in the
environment. These technologies will be important in addressing
microbial problems associated with pathogen detection, microbial
ecology of infectious diseases, plant growth, animal health,
biodiversity, pharmaceutical discovery, bioprocessing of industrial
products, waste-water treatment, and bioremediation of contaminants,
because microbial communities are important in each of these areas.
Development of new methods for measuring codon
usage bias
Many methods for measuring
synonymous codon usage bias require a standard codon reference.
However, it is difficult to determine the standard codon reference
when little biological information is available from an organism of
interest. Thus, a novel information theory-based method for
measuring synonymous codon usage bias was developed. This method
does not require a standard codon reference. Analysis showed that this
method has advantages over other existing methods.
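To illustrate how an information-theoretic measure can avoid a reference codon set, the sketch below scores each amino acid by how far its synonymous codon usage falls below the maximum Shannon entropy. This is a generic illustration under stated assumptions, not necessarily the method described above: the function name is hypothetical, and the synonymous-codon table is deliberately partial (three amino acids from the standard genetic code) to keep the example short.

```python
import math
from collections import Counter

# Partial synonymous-codon table (standard genetic code), for illustration only.
SYNONYMOUS = {
    "Phe": ["TTT", "TTC"],
    "Lys": ["AAA", "AAG"],
    "Ala": ["GCT", "GCC", "GCA", "GCG"],
}


def entropy_bias(cds: str, table=SYNONYMOUS) -> dict:
    """Per-amino-acid bias in [0, 1]: 0 = uniform usage, 1 = a single codon used.

    Bias is (H_max - H) / H_max, where H is the Shannon entropy of the
    observed synonymous codon frequencies and H_max = log2(number of codons).
    Amino acids absent from the sequence are skipped.
    """
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = Counter(cds[i:i + 3] for i in range(0, usable, 3))
    bias = {}
    for aa, synonyms in table.items():
        counts = [codons[c] for c in synonyms]
        total = sum(counts)
        if total == 0:
            continue
        h = -sum((n / total) * math.log2(n / total) for n in counts if n)
        h_max = math.log2(len(synonyms))
        bias[aa] = (h_max - h) / h_max
    return bias
```

Because the score depends only on the observed codon frequencies within each synonymous family, no set of highly expressed reference genes is needed.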
Development of a novel approach for network
analysis
Although genomic technologies such
as microarrays provide powerful tools for identifying cellular
interaction networks consisting of many individual modules, defining
such modules without ambiguity is very difficult because all current
methods rely on arbitrarily chosen thresholds and hence the results
are subjective. Thus we have developed a novel, reliable, sensitive
and robust approach for automatically identifying functional modules
using a mathematically defined threshold predicted by random matrix
theory (RMT). Applying this approach to microarray data from yeast,
human, E. coli and S. oneidensis demonstrated that it correctly
identifies functional modules with the expected properties
consistent with general network theory. Experimental validation of
the predicted functions of 10 unknown genes from yeast and
Shewanella indicated that this approach is useful for predicting the
functions of unknown genes. This approach will be ideal for
analyzing high throughput genomics data for modular network
identification and gene function prediction.
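The step that follows threshold selection can be sketched as below: genes whose expression profiles correlate at or above a cutoff are linked, and modules are read off as connected components of the resulting network. This is a simplified illustration only. The RMT-based determination of the threshold is not implemented here, so the cutoff is passed in as a plain parameter, and the function names are hypothetical. Profiles are assumed to be non-constant and of equal length.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def find_modules(profiles: dict, threshold: float) -> list:
    """Group genes into modules: connected components of the thresholded network.

    threshold is assumed to be supplied externally (e.g. by an RMT-based
    procedure); genes left unconnected are not reported as modules.
    """
    genes = list(profiles)
    parent = {g: g for g in genes}

    def find(g):
        # Union-find root lookup with path halving.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for i, a in enumerate(genes):
        for b in genes[i + 1:]:
            if abs(pearson(profiles[a], profiles[b])) >= threshold:
                parent[find(a)] = find(b)

    groups = {}
    for g in genes:
        groups.setdefault(find(g), []).append(g)
    return sorted(sorted(m) for m in groups.values() if len(m) > 1)
```

With a fixed cutoff the result is still subjective; the point of the RMT approach described above is to replace that arbitrary parameter with a mathematically defined one.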