Go to the content. | Move to the navigation | Go to the site search | Go to the menu | Contacts | Accessibility

| Create Account

Marchesin, Stefano (2020) Developing unsupervised knowledge-enhanced models to reduce the semantic Gap in information retrieval. [Ph.D. thesis]

Full text disponibile come:

PDF Document

Abstract (italian or english)

In this thesis we tackle the semantic gap, a long-standing problem in Information Retrieval(IR). The semantic gap can be described as the mismatch between users’ queries and the way retrieval models answer to such queries. Two main lines of work have emerged over the years to bridge the semantic gap: (i) the use of external knowledge resources to enhance the bag-of-words representations used by lexical models, and (ii) the use of semantic models to perform matching between the latent representations of queries and documents. To deal with this issue, we first perform an in-depth evaluation of lexical and semantic models through different analyses. The objective of this evaluation is to understand what features lexical and semantic models share, if their signals are complementary, and how they can be combined to effectively address the semantic gap. In particular, the evaluation focuses on (semantic)
neural models and their critical aspects. Then, we build on the insights of this evaluation to develop lexical and semantic models addressing the semantic gap. Specifically, we develop unsupervised models that integrate knowledge from external resources, and we evaluate them for the medical domain – a domain with a high social value, where the semantic gap is
prominent, and the large presence of authoritative knowledge resources allows us to explore effective ways to leverage external knowledge to address the semantic gap. For lexical models, we propose and evaluate several knowledge-based query expansion and reduction techniques. These query reformulations are used to increase the probability of retrieving relevant documents by adding to or removing from the original query highly specific terms.
Regarding semantic models, we first analyze the limitations of the knowledge-enhanced neural models presented in the literature. Then, to overcome these limitations, we propose SAFIR, an unsupervised knowledge-enhanced neural framework for IR. The representations learned within this framework are optimized for IR and encode linguistic features that are relevant to address the semantic gap.

Statistiche Download
EPrint type:Ph.D. thesis
Tutor:Agosti, Maristella
Data di deposito della tesi:30 November 2020
Anno di Pubblicazione:30 November 2020
Key Words:Unsupervised Information Retrieval Medical Information Retrieval Deep Learning Representation Learning
Settori scientifico-disciplinari MIUR:Area 09 - Ingegneria industriale e dell'informazione > ING-INF/05 Sistemi di elaborazione delle informazioni
Struttura di riferimento:Dipartimenti > Dipartimento di Ingegneria dell'Informazione
Codice ID:13174
Depositato il:12 May 2021 08:52
Simple Metadata
Full Metadata
EndNote Format

Download statistics

Solo per lo Staff dell Archivio: Modifica questo record