
Apostolico, Alberto - Comin, Matteo - Parida, Laxmi (2006) Mining, compressing and classifying with extensible motifs. [Journal article (online)]

Full text available as: PDF document (282 KB)

Courtesy of: http://www.almob.org/content/1/1/4

Abstract (English)

Background

Motif patterns of maximal saturation emerged originally in contexts of pattern discovery in biomolecular sequences and have recently proven a valuable notion also in the design of data compression schemes. Informally, a motif is a string of intermittently solid and wild characters that recurs more or less frequently in an input sequence or family of sequences. Motif discovery techniques and tools tend to be computationally imposing; however, special classes of "rigid" motifs have been identified whose discovery is affordable in low polynomial time.
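To make the informal definition concrete, the following minimal sketch locates the occurrences of a rigid motif in a sequence. It is an illustration only, not the discovery algorithm of the paper: the function name `rigid_occurrences` and the use of `.` as the wild character are assumptions made here, and the sketch only matches a given motif rather than discovering motifs.

```python
import re

def rigid_occurrences(motif, text, wildcard="."):
    """Return the start positions where a rigid motif occurs in text.

    Solid characters must match exactly; each wildcard (assumed here
    to be written '.') matches exactly one arbitrary character, so
    every occurrence has the same length as the motif.
    """
    parts = ["." if ch == wildcard else re.escape(ch) for ch in motif]
    # A lookahead keeps matches zero-width, so overlapping
    # occurrences are all reported.
    pattern = re.compile("(?=" + "".join(parts) + ")")
    return [m.start() for m in pattern.finditer(text)]

# Hypothetical example: solid 'A', one wild character, solid 'C'.
positions = rigid_occurrences("A.C", "ABCAACAC")
```

In this hypothetical input the motif `A.C` occurs at positions 0 (`ABC`) and 3 (`AAC`); both occurrences span exactly three characters, which is what "rigid" means here.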

Results

In the present work, "extensible" motifs are considered such that each sequence of gaps comes endowed with some elasticity, whereby the same pattern may be stretched to fit segments of the source that match all the solid characters but are otherwise of different lengths. A few applications of this notion are then described. In applications of data compression by textual substitution, extensible motifs are seen to bring savings on the size of the codebook, and hence to improve compression. In germane contexts, in which compressibility is used in its dual role as a basis for structural inference and classification, extensible motifs are seen to support unsupervised classification and phylogeny reconstruction.
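The elasticity described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's algorithm or notation: a motif is represented here as a hypothetical list of tokens, where a string is a run of solid characters and a pair `(lo, hi)` is a gap that may stretch over anywhere from `lo` to `hi` arbitrary characters. The function names are likewise made up for this example.

```python
def match_ends(tokens, text, start):
    """Return the sorted end positions of every segment of text that
    begins at start and matches the extensible motif in tokens.

    tokens is a list (an assumed representation): a str is a run of
    solid characters, a (lo, hi) tuple is an elastic gap of lo..hi
    arbitrary characters.
    """
    positions = {start}
    for tok in tokens:
        reached = set()
        for p in positions:
            if isinstance(tok, str):
                # Solid characters must match exactly.
                if text.startswith(tok, p):
                    reached.add(p + len(tok))
            else:
                lo, hi = tok
                # Try every admissible gap length that fits in text.
                for gap in range(lo, hi + 1):
                    if p + gap <= len(text):
                        reached.add(p + gap)
        positions = reached
    return sorted(positions)

def extensible_occurrences(tokens, text):
    """Map each start position with at least one match to its end positions."""
    found = {}
    for i in range(len(text)):
        ends = match_ends(tokens, text, i)
        if ends:
            found[i] = ends
    return found

# Hypothetical example: 'A', a gap of 1 to 3 characters, then 'C'.
motif = ["A", (1, 3), "C"]
occ = extensible_occurrences(motif, "AXCAXXXC")
```

For this input the single motif covers both `AXC` (positions 0 to 3) and `AXXXC` (positions 3 to 8), two segments of different lengths. A rigid motif would need a separate pattern for each gap length, which gives some intuition for why extensible motifs can shrink a codebook.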

Conclusion

Off-line compression based on extensible motifs can be used advantageously to compress and classify biological sequences.


EPrint type: Journal article (online)
Year of publication: 2006
Keywords (Italian / English): motifs, biomolecular sequences
MIUR scientific-disciplinary sectors: Area 05 - Biological sciences > BIO/13 Applied biology
Area 01 - Mathematical and computer sciences > INF/01 Computer science
Affiliated structure: Departments > Dipartimento di Ingegneria dell'Informazione (Department of Information Engineering)
ID code: 1211
Deposited on: 09 Dec 2008