Coppe, Alessandro (2008) A bioinformatic and computational approach to regulation of genome function: integrated analysis of genome organization, promoter sequences and gene expression. [Ph.D. thesis]
Although much is known about the regulation of gene expression in both prokaryotes and eukaryotes, this complex mechanism remains to be fully elucidated. The relatively recent advent of high-throughput techniques for studying transcription has made available a large amount of data that can be used for genome-wide analysis with bioinformatics approaches. These computational methods have become an integral part of biological research. The topics of this thesis concern the development and application of computational methodologies to better understand the basis of genomic gene expression regulation at different levels. The first level of investigation concerned the relationships among chromosomal structure, expression profile and functional characteristics, focusing on genomic organization and structure. For this task, the REEF (REgionally Enriched Features) software was developed. It is designed to identify genomic regions enriched in specific features, such as a class or group of genes homogeneous for expression and/or functional characteristics. REEF can be used to detect density variations of specific features along the genome sequence, for example genomic regions with significant enrichment of genes that are co-expressed, differentially expressed, or related to particular molecular functions. Local feature enrichment is calculated using a test statistic based on the hypergeometric distribution, applied genome-wide by sliding windows, and the false discovery rate is used to control for multiple testing. REEF has been applied to the study of the genomic distribution of tissue-specific genes and to the analysis of genes differentially expressed between different myeloid cell lines. These analyses identified clusters of tissue-specific genes in the human genome and positional enrichment of genes related to hemopoietic functional modules. The second level of investigation concerned gene expression regulation at the promoter level.
Unknown transcription factor binding sites might be detected by searching for shared sequence elements in the upstream regulatory regions of genes with common biological function and/or similar expression profiles. Indeed, genes with similar expression are frequently co-regulated, and genes with related function are often similarly expressed. New methodologies for the identification of regulatory motifs in human promoters were developed and tested. Since a drawback of this approach is the exceedingly high number of results, biological knowledge was used both before and after automated pattern discovery to define a “sheltered environment” that enhances the specificity of the computational analysis. The COOP (Clustering of Overlapping Patterns) software for the extraction of sequence motifs was developed and used to analyze genomic sequences 1 kb upstream of 91 retina-specific genes, identifying a set of putative regulatory motifs that occur frequently in retina promoter sequences. Most of them are located in the proximal portion of promoters and tend to be less variable in their central region than in their lateral regions, and some are similar to known regulatory sequences. The performance of COOP was further evaluated by simulation and by applying it to a standard positive control dataset proposed by Tompa and colleagues for the systematic evaluation and comparison of pattern discovery software. A web tool for the prediction of functional elements in promoter sequences, MOST (MOtif Searching web Tool), was applied to different datasets under various testing conditions in order to study the influence of specific search parameters on the results.
Two groups of promoter sequences containing known regulatory signals were used as positive control datasets: the public yeast benchmark dataset of Tompa and colleagues and a custom dataset of 37 human promoter sequences, subgroups of which contained instances of one of nine different signals. Testing the method on these benchmark datasets gave positive results.
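The sliding-window enrichment test described for REEF can be illustrated with a minimal Python sketch. This is not the REEF implementation. The function names, the single-chromosome input format, and the choice of the Benjamini-Hochberg procedure for false discovery rate control are all assumptions made for illustration; the abstract states only that a hypergeometric test statistic is applied over sliding windows and that the false discovery rate is used.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """Upper tail P(X >= k) when drawing n genes from N, of which K carry the feature."""
    upper = min(K, n)
    total = comb(N, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total


def window_pvalues(genes, width, step):
    """Score sliding windows along one chromosome (hypothetical input format).

    genes: list of (position, has_feature) tuples.
    Returns (start, genes_in_window, feature_genes_in_window, p_value)
    for every window that contains at least one gene.
    """
    N = len(genes)
    K = sum(1 for _, has_feature in genes if has_feature)
    last = max(position for position, _ in genes)
    results = []
    start = 0
    while start <= last:
        inside = [f for position, f in genes if start <= position < start + width]
        n = len(inside)
        k = sum(inside)
        if n > 0:
            results.append((start, n, k, hypergeom_sf(k, N, K, n)))
        start += step
    return results


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (assumed FDR procedure)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Toy data: two feature genes close together, two non-feature genes far away.
    toy = [(100, True), (200, True), (5000, False), (9000, False)]
    windows = window_pvalues(toy, width=1000, step=1000)
    qvalues = benjamini_hochberg([w[3] for w in windows])
    for (start, n, k, p), q in zip(windows, qvalues):
        print(start, n, k, round(p, 4), round(q, 4))
```

In this sketch a window is called enriched when its adjusted p-value falls below a chosen threshold. Overlapping windows (a step smaller than the width) produce correlated tests, which a real analysis would need to take into account.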