McGill University

Description of Research Expertise

Research Interests

Research in my lab focuses on two major themes: gene regulation during brain development and the genetic basis of inner ear morphogenesis. Both areas of investigation use mouse models to study mechanisms of human congenital disorders, including brain malformations and hearing loss.

Key words: gene regulation, enhancers, Shh, brain development, hearing loss, cochlea, genetics, genomics, gene therapy

Description of Research

Enhancers regulating Sonic hedgehog (Shh) brain expression

Cis-acting regulatory sequences are required for proper temporal and spatial control of gene expression.
Variation in gene expression is highly heritable and a significant determinant of human disease susceptibility. The diversity of human genetic diseases attributed, in whole or in part, to mutations in non-coding regulatory sequences is on the rise. Improvements in genome-wide methods of associating genetic variation with human disease and of predicting DNA with cis-regulatory potential are two of the major reasons for these recent advances.
Research in my laboratory employs genetic, genomic and biochemical approaches to uncover the cis- and trans-acting determinants of Shh expression in the mouse and human CNS. The temporal and spatial control of Shh expression from defined signaling centers is critical for establishing the identity of neurons in discrete positions along the dorsoventral axis of the neural tube. In the absence of Shh function, ventral midline development is perturbed, resulting in holoprosencephaly (HPE), a structural brain malformation, as well as neuronal patterning and pathfinding defects.
Central to the understanding of ventral neural tube development is how Shh transcription is regulated in the CNS. Previous work in my lab employed an enhancer trap assay to uncover six CNS-specific enhancers distributed over kb whose combined activity covered most sites of Shh transcription in the mouse neural tube, including the ventral forebrain (Jeong et al.).
These Shh regulatory elements were subsequently used as tools to further dissect Shh brain function (Jeong et al.). A particular highlight of these studies was our demonstration that mouse embryos lacking Shh in the prospective hypothalamus exhibit key features of septo-optic dysplasia (SOD), a congenital disorder characterized by pituitary, optic nerve, and midline brain malformations (Zhao et al.).
We further revealed that prenatal ethanol exposure increases SOD risk through spatiotemporal perturbations in Shh signaling activity (Kahn et al.). These findings suggest that reduced Shh signaling underlies the pathogenesis of SOD, which represents a later manifestation of a Shh-dependent phenotype compared to HPE. Whole exome sequencing studies are currently underway to identify novel pathogenic mutations associated with human cases of SOD.

Genomic architecture of Shh-dependent cochlear morphogenesis

A second focus of research in my laboratory addresses the genetic programs underlying inner ear morphogenesis.
Congenital malformations of the inner ear are a significant cause of hearing loss and vestibular dysfunction in humans. My research program is strongly motivated by the premise that a detailed understanding of the cellular and molecular mechanisms underlying inner ear development should not only improve our fundamental knowledge of how this complex structure is assembled, but may also profoundly improve the way inner ear disorders are treated in the future. The principal components for hearing (cochlea) and balance (vestibulum) are formed from ventral and dorsal outgrowths, respectively, of a common bilateral structure, the otocyst.
Organization of the inner ear into auditory and vestibular components is established early in development and is heavily influenced by surrounding tissues. The proximity of the otocyst to the hindbrain suggested that extracellular signals that pattern the CNS might also polarize the otic epithelium along its dorsoventral axis. Previous work in my laboratory determined the specific contributions of Shh and Wnt signaling pathways in promoting cochlear and vestibular development, respectively.
We demonstrated that Shh acts directly on the otic epithelium to regulate the outgrowth of the cochlear duct (Riccomagno et al.). More recent studies identified novel downstream effectors of Shh signaling that are active during cochlear duct outgrowth. By comparing mRNA expression profiles between control and Shh signaling mutants, we identified an intriguing set of genes with highly enriched functions in cochlear morphogenesis and sensory development. In addition, targeted mutations have been engineered in mice for a select number of Shh-dependent genes, some of which result in hearing loss.
Gene therapy protocols are also being developed to treat hearing loss in these mice.

Rotation Projects

Several rotation projects related to the stated goals of the lab are currently available.

The score of any DNA sequence window, having the same length as the matrix, is calculated by summing the corresponding nucleotide values from the position-specific scoring matrix (PSSM). The alignments of binding site sequences from which such matrices are built are commonly generated using pattern discovery software such as MEME [ 42 ]. Once aligned, a matrix is created that reports the frequency of each nucleotide (A, C, G, and T) at each position of the alignment; the resulting matrix is called a position frequency matrix (PFM).
The last step in obtaining a PSSM is to convert the PFM using a logarithmic function that weights the frequency of each nucleotide at each position by the frequency of that nucleotide in the genomic background (many software implementations apply a default background frequency). The scores produced are analogous to binding energies [ 44 ] and can thus be considered a prediction of the strength of association of a TF protein with a specific DNA sequence. Active discussions are ongoing in the bioinformatics field about how the models can be improved in light of the increasing amount of TFBS data arising from ChIP-Seq studies [ 47 ].
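The PFM-to-PSSM conversion and the window scoring described above can be sketched in a few lines of Python. This is a minimal illustration, not the method of any particular tool: the uniform background of 0.25 per nucleotide, the pseudocount scheme, the use of log base 2, and the function names are all assumptions made for the example.

```python
import math

NUCLEOTIDES = "ACGT"

def pfm_to_pssm(pfm, background=None, pseudocount=1.0):
    """Convert a position frequency matrix into log-odds scores.

    pfm maps each nucleotide to a list of counts, one per alignment column.
    background maps each nucleotide to its genomic frequency (assumed
    uniform, 0.25 each, when not supplied).
    """
    if background is None:
        background = {n: 0.25 for n in NUCLEOTIDES}
    length = len(pfm["A"])
    pssm = []
    for i in range(length):
        # The pseudocount is shared out according to the background so that
        # unobserved nucleotides do not produce log(0).
        total = sum(pfm[n][i] for n in NUCLEOTIDES) + pseudocount
        column = {}
        for n in NUCLEOTIDES:
            freq = (pfm[n][i] + pseudocount * background[n]) / total
            column[n] = math.log2(freq / background[n])
        pssm.append(column)
    return pssm

def score_window(pssm, window):
    """Sum the per-position values for a window as long as the matrix."""
    if len(window) != len(pssm):
        raise ValueError("window length must equal matrix length")
    return sum(column[base] for column, base in zip(pssm, window))

def scan(pssm, sequence):
    """Score every window of the sequence; returns (offset, score) pairs."""
    width = len(pssm)
    return [(i, score_window(pssm, sequence[i:i + width]))
            for i in range(len(sequence) - width + 1)]

# Hypothetical three-column PFM built from eight aligned sites.
toy_pfm = {"A": [8, 0, 0], "C": [0, 8, 0], "G": [0, 0, 8], "T": [0, 0, 0]}
toy_pssm = pfm_to_pssm(toy_pfm)
best_offset, best_score = max(scan(toy_pssm, "TTACGTT"), key=lambda hit: hit[1])
```

Only the forward strand is scanned here; a practical scan would usually score the reverse complement as well.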
Until recently, the number of TFs with PSSMs has increased slowly, but high-throughput laboratory approaches for the profiling of TF-bound sequences have resulted in a striking increase in the number and quality of PSSMs [ 48 , 50 ]. Such high-throughput experimental data typically arise from either in vivo ChIP, such as ChIP-Seq [ 51 ], or from in vitro protein binding studies, such as protein binding microarrays [ 52 ].
Gene Regulatory Sequences and Human Disease
For protein binding microarrays, double-stranded DNA of known sequence is affixed to the microarray surface and the adherence of a fluorescently labeled protein preparation of a TF (or frequently just the DNA-binding domain from a TF) is measured; the bound sequences are subsequently analyzed to determine the DNA sequence patterns targeted by the protein. The prediction of functional regulatory elements by PSSMs, although having good sensitivity (most true positives are found), suffers from poor specificity (many false positives are predicted) [ 53 ]. For instance, a predicted TFBS may be buried in compact chromatin.
Thus, a prediction of a TFBS in isolation has limited relevance to the probability that a segment in the human genome will function as a cis-regulatory element. Approaches to reduce the specificity problems by filtering are discussed in the section below on 'Refining cis-regulatory predictions with filters'.
The same concepts underlying the use of PSSMs to predict TFBSs apply to most motif discrimination methods for sequence-specific regulatory elements, ranging from splice enhancers to translation start sites [ 54-56 ]. To assess a variant, the reference sequence and the variant sequence are both scanned and scored by the PSSM model.
If the difference between the observed scores is large, and at least one of the sequence isoforms is a known TFBS or is assigned a score that exceeds a user-defined threshold for TFBS presence, the variation is predicted to have a functional impact. Such thresholds depend on the software used.
The impact of the variant is calculated as the reported difference between the two scores. The higher-scoring allele is predicted to be bound by the TF with greater affinity. In the following section we outline additional data that may be incorporated to improve TFBS prediction specificity. Similar allele comparison programs have been developed to predict altered microRNA target sites [ 63 ] and splicing elements [ 64 ].
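The allele comparison just described can be sketched as follows. The toy matrix values, the two thresholds, and the function names are hypothetical and chosen only for illustration. Real programs differ in how they pick windows and set cut-offs.

```python
# Hypothetical three-position log-odds matrix favouring the site "ACG".
TOY_PSSM = [
    {"A": 1.9, "C": -3.2, "G": -3.2, "T": -3.2},
    {"A": -3.2, "C": 1.9, "G": -3.2, "T": -3.2},
    {"A": -3.2, "C": -3.2, "G": 1.9, "T": -3.2},
]

def best_score(pssm, sequence):
    """Highest score over all windows of the sequence (forward strand only)."""
    width = len(pssm)
    return max(
        sum(column[base] for column, base in zip(pssm, sequence[i:i + width]))
        for i in range(len(sequence) - width + 1)
    )

def variant_impact(pssm, ref_seq, alt_seq, site_threshold, min_delta):
    """Compare the reference and variant alleles under one matrix.

    The variant is flagged when at least one allele scores at or above
    site_threshold and the two scores differ by at least min_delta.
    """
    ref = best_score(pssm, ref_seq)
    alt = best_score(pssm, alt_seq)
    delta = alt - ref  # negative: the variant allele is predicted to bind less well
    flagged = max(ref, alt) >= site_threshold and abs(delta) >= min_delta
    return {"ref_score": ref, "alt_score": alt, "delta": delta,
            "predicted_functional": flagged}

# A G>T change that disrupts the last position of the site.
disrupting = variant_impact(TOY_PSSM, "TACGT", "TACTT",
                            site_threshold=4.0, min_delta=2.0)
# A change between two low-scoring alleles is not flagged, however large the difference.
background = variant_impact(TOY_PSSM, "TTTTT", "TTTGT",
                            site_threshold=4.0, min_delta=2.0)
```

Both sequences should include enough flanking bases on each side of the variant for every window that overlaps it to be scored.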
As stated above, predictions of TFBSs are unreliable because of a high false positive prediction rate (poor specificity). Predictions of cis-regulatory elements can be overlaid with genome annotations or experimental data to focus attention on the regions that are more likely to be functional [ 18 , 65 ]. An increase in specificity can be obtained by filtering predicted regulatory elements against complementary data, such as: (i) gene structure (topology filters); (ii) regions of sequence conservation (phylogenetic footprinting); (iii) TF-bound regions defined experimentally (such as ChIP-Seq for TFs); or (iv) structurally accessible or inaccessible regions (such as ChIP-Seq for epigenetic marks or DNase I hypersensitivity analyses).
All the filters can be used individually or in combination, where it is functionally relevant; their main purpose is to add supporting evidence that a predicted regulatory element is functional. Although biologically relevant filters can dramatically increase the specificity of cis-regulatory element predictions, there may be a loss in sensitivity with the use of multiple filters, so it is recommended that a researcher assess results based on one filter before incorporating additional filters.
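The trade-off between specificity and sensitivity when filters are stacked can be made concrete with a toy calculation. All of the sets below are invented for illustration: ten candidate sites, three of which are taken to be truly functional, and two hypothetical filters.

```python
def sensitivity_specificity(predicted, functional, all_sites):
    """Return (sensitivity, specificity) for a set of predicted sites."""
    true_pos = len(predicted & functional)
    false_neg = len(functional - predicted)
    false_pos = len(predicted - functional)
    true_neg = len(all_sites - predicted - functional)
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    return sensitivity, specificity

# Hypothetical site identifiers.
all_sites = set(range(10))
functional = {0, 1, 2}                 # sites assumed to be truly functional
predicted = {0, 1, 2, 3, 4, 5, 6}      # unfiltered PSSM predictions
conserved = {0, 1, 3, 7}               # sites passing a conservation filter
accessible = {0, 2, 3, 8}              # sites passing an accessibility filter

unfiltered = sensitivity_specificity(predicted, functional, all_sites)
one_filter = sensitivity_specificity(predicted & conserved, functional, all_sites)
two_filters = sensitivity_specificity(predicted & conserved & accessible,
                                      functional, all_sites)
```

In this invented data set the first filter raises specificity from 3/7 to 6/7 at the cost of one true site, while the second filter removes another true site without any further gain in specificity. This is the kind of outcome that checking results one filter at a time is meant to expose.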
The activity of many cis-regulatory elements is spatially dependent (see Figure 1 for locations of cis-regulatory elements). For instance, splice-regulating elements are positioned adjacent to splice sites (reviewed in [ 57 ]) and the target sequences for non-coding RNA, such as microRNAs, may be preferentially situated within 3' untranslated regions [ 66 ].
Specific types of TFBSs within the core and proximal promoters, such as the TATA box and the downstream promoter element, are topologically constrained to occupy a specific location relative to the transcription start site (TSS) [ 67 ]. Genome annotations and laboratory data can specify TSS locations, allowing researchers to focus on variants situated at functionally relevant positions.
Existing annotations from high-throughput profiling of 5' capped RNA [ 68 ] and cDNA sequencing in genome annotation databases can delineate such regions. Increasingly, however, the annotation of exons is defined by RNA-Seq experiments applied to patient samples [ 69 ]. Each of these genomic data types can be retrieved as genomic positions from either a genome browser (for example, using the Galaxy tools [ 26 ] or Ensembl BioMart [ 70 ]) or from laboratory data, and should be chosen for their relevance to the type of regulatory element of interest.
The positions of topological annotations can be compared with the positions of the predicted regulatory elements, using data analysis tools such as those that the Galaxy system provides. Where topological features are proximal to or overlap with corresponding variant-altered regulatory element predictions, the variants may have greater reliability than predictions lacking such support.
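Comparing the positions of predictions with the positions of annotations comes down to interval overlap. The sketch below keeps predictions that overlap, or lie within a chosen distance of, an annotated feature such as a TSS. The coordinate convention (half-open intervals), the tuple layout, and the function names are assumptions made for the example. For genome-scale data a dedicated interval tool would normally be used instead of this quadratic loop.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True when two half-open intervals [start, end) share at least one base."""
    return a_start < b_end and b_start < a_end

def filter_by_proximity(predictions, features, max_distance=0):
    """Keep predictions within max_distance bases of any feature.

    predictions and features are lists of (chrom, start, end) tuples using
    half-open coordinates. With max_distance=0 only true overlaps are kept.
    """
    kept = []
    for chrom, start, end in predictions:
        for f_chrom, f_start, f_end in features:
            if chrom != f_chrom:
                continue
            # Extend the feature on both sides, then test for overlap.
            if overlaps(start, end, f_start - max_distance, f_end + max_distance):
                kept.append((chrom, start, end))
                break
    return kept

# Hypothetical predicted sites and a single annotated TSS.
predicted_sites = [("chr1", 100, 110), ("chr1", 5000, 5010), ("chr2", 100, 110)]
tss_positions = [("chr1", 150, 151)]

near_tss = filter_by_proximity(predicted_sites, tss_positions, max_distance=50)
on_tss = filter_by_proximity(predicted_sites, tss_positions)
```

The same function can serve for other region-based filters by passing a different feature list, for example conserved blocks or experimentally defined bound regions.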
Sequence conservation in the human genome can focus attention on regions with functional roles, a process termed phylogenetic footprinting. Using conservation scores based on multiple species alignments, such as the Phylogenetic P-values (PhyloP) [ 71 ] (obtainable using the Galaxy system or directly from the UCSC genome annotation database), researchers can restrict attention to regions more likely to have sequence-specific function. Although there is evidence of functional regulatory sequences being conserved over moderate periods of evolution [ 72 ], there is also ample evidence of plasticity in regulatory sequences [ 73 ].
Conservation-based filters can enrich for functional sequences, but, as for all filters, functionally relevant sequences without sequence conservation may be lost [ 65 ].
If the position of a predicted variant-altered regulatory element overlaps a conserved region, then the cis-regulatory potential of the variant is considered to have functional support.

Increasing access to high-throughput profiles of ChIP data is key to improved regulatory sequence studies. In the ChIP method, a specific antibody targeting a protein of interest is used to recover DNA sequences bound by the protein [ 51 ]. The nucleotide sequence of the recovered DNA is increasingly being identified by high-throughput sequencing, resulting in the procedure known as ChIP-Seq.
Regions containing a site bound by a targeted protein are identified in ChIP-Seq experiments as displaying a higher abundance of sequence reads recovered relative to a control set of data at a specific position in the genome. The method delivers two important advances for cis-regulatory element detection over past methodologies. First, it can be applied to detect TF or transcription co-activator bound regions across the entire genome of any species that has been sequenced [ 74 ]. Second, the results provide improved resolution of the boundaries for functional regulatory regions, providing the researcher with a refined search space for determining the active cis-regulatory element(s) in the region.
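The idea of calling a bound region where ChIP reads are more abundant than control reads can be illustrated with a deliberately simplified fold-enrichment calculation over fixed genomic bins. This is not how any published peak caller works in detail: the library-size scaling, the pseudocount, the fold threshold, and the read counts are all assumptions made for the sketch.

```python
def enriched_bins(chip_counts, control_counts, min_fold=4.0, pseudocount=1.0):
    """Return (bin_index, fold) for bins where ChIP reads exceed control.

    Both inputs are lists of read counts over the same genomic bins. The
    control is rescaled to the ChIP library size before comparison, and a
    pseudocount keeps empty control bins from causing division by zero.
    """
    if len(chip_counts) != len(control_counts):
        raise ValueError("ChIP and control must cover the same bins")
    control_total = sum(control_counts)
    scale = sum(chip_counts) / control_total if control_total else 1.0
    hits = []
    for index, (chip, control) in enumerate(zip(chip_counts, control_counts)):
        fold = (chip + pseudocount) / (control * scale + pseudocount)
        if fold >= min_fold:
            hits.append((index, fold))
    return hits

# Hypothetical read counts over ten bins, with one bound region in bin 3.
chip = [5, 5, 5, 60, 5, 5, 5, 5, 5, 5]
control = [5] * 10

peaks = enriched_bins(chip, control)
peak_bins = [index for index, _ in peaks]
```

Established peak callers add statistical models of the background and of local read distribution, so a sketch like this is useful only for seeing the principle.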
We focus here on two classes of ChIP-Seq experiments: those that profile interactions between a sequence-specific binding TF and DNA, and those that profile proteins that associate in a sequence-independent manner with regulatory regions (discussed in the next section). Although useful, the study of genetic variants requires more precise mappings of individual TFBSs. As with the previous filters, for each predicted TFBS-altering variant, those overlapping a ChIP-Seq-delineated region can be considered of sufficient reliability to motivate further laboratory studies.
In addition to data pertaining to the sequence-specific binding of TFs, ChIP-Seq data can be obtained that delineate regions of a genome that are likely to contain elements involved in gene regulation. Such approaches may be based on antibodies that recognize specific epigenetic marks associated with cis-regulatory activity (for example, histone modifications), or antibodies that recognize proteins, such as co-activators, associated with regulatory sequences that interact with DNA-bound TFs.
DNase I accessibility analysis likewise reveals regions with potential regulatory roles.
Studies focused on individual histone modifications have shown that certain marks, such as H3K4me3, associate with active promoter regions, whereas others, such as H3K27me3, are pronounced at silent promoters [ 75 ]. Combined with cis-regulatory predictions, combinations of epigenetic marks can be used to more precisely delineate regulatory regions with potential active roles.
Focusing cis-regulatory element predictions proximal to or within epigenetic regions associated with active regulation improves the specificity of cis-regulatory element prediction [ 76 , 77 ]. The transcriptional co-activator p, a component of many regulatory protein associations, has been targeted in ChIP-Seq studies to define transcriptional enhancers, that is, genomic regions containing multiple TFBSs that collectively enhance transcription [ 78 ].
Visel et al. They found that using ppredicted enhancer regions reduced the rate of false-positive predictions made by alternative methods by four-fold. Given the difficulty in obtaining high quality antibodies to proteins, and as the number of co-activators (about 10^1) is small relative to the number of sequence-specific TFs (about 10^3), it is likely that ChIP-Seq data for the complete set of co-activators will become a preferred means of delineating likely regulatory regions active in each cell type.
In both classes of ChIP-Seq experiments, the defined regulatory regions from the ChIP-Seq studies can be used to select the predicted regulatory elements and variants most likely to affect cis-regulatory function. Use of such filters as outlined above for the prediction of regulatory variants is starting to emerge in the literature.
As interest rises with respect to the non-coding regulatory portions of the genome, we can expect to see more examples similar to the two we briefly outline below. A key paper has recently emerged, highlighting the potential power behind combining SNP identification and different lines of regulatory evidence. Ernst et al. Non-coding SNPs were found to significantly overlap with enhancers predicted by epigenetic analyses, and the SNP-containing enhancers tended to be detected in cell types relevant to the disease.