Maistro, Maria (2018) Exploiting user signals and stochastic models to improve information retrieval systems and evaluation. [Doctoral thesis]

Full text available as:

PDF document (doctoral thesis) - Accepted version
9 MB

Abstract (English)

The leitmotiv throughout this thesis is IR evaluation. We discuss different issues related to effectiveness measures and the novel solutions that we propose to address these challenges. We start by providing a formal definition of utility-oriented measurement of retrieval effectiveness, based on the representational theory of measurement. The proposed theoretical framework contributes to a better understanding of the problem complexities, separating those due to the inherent problems in comparing systems from those due to the expected numerical properties of measures. We then propose AWARE, a probabilistic framework for dealing with the noise and inconsistencies introduced when relevance labels are gathered with multiple crowd assessors. By modeling relevance judgements and crowd assessors as sources of uncertainty, we directly combine the performance measures computed on the ground truth generated by each crowd assessor, instead of adopting a classification technique to merge the labels at pool level. Finally, we investigate evaluation measures able to account for user signals. We propose a new user model based on Markov chains that allows the user to scan the result list with many degrees of freedom. We exploit this Markovian model to inject user models into precision, defining a new family of evaluation measures, and we embed it as the objective function of a Learning to Rank (LtR) algorithm to improve system performance.
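
As an illustration only (a minimal Python sketch, not the actual AWARE estimators), the snippet below shows the core idea of combining measures rather than labels: average precision is computed separately on each assessor's own ground truth and the resulting scores are then aggregated. The function names, the uniform weighting, and the toy data are assumptions made for this example.

```python
import numpy as np

def average_precision(run, relevant):
    """Average precision of a ranked list `run` (document ids) against a
    set of relevant document ids `relevant` (binary relevance)."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(run, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / max(len(relevant), 1)

def combined_assessor_score(run, assessor_qrels, weights=None):
    """AWARE-style combination: score the run against each assessor's own
    ground truth and aggregate the per-assessor scores, instead of first
    merging the labels into a single pool (uniform weights here; other
    weighting schemes are possible)."""
    scores = np.array([average_precision(run, qrels) for qrels in assessor_qrels])
    if weights is None:
        weights = np.full(len(scores), 1.0 / len(scores))
    return float(np.dot(weights, scores))

# Toy usage with three hypothetical crowd assessors judging the same topic.
run = ["d3", "d1", "d7", "d2", "d5"]
assessor_qrels = [{"d1", "d7"}, {"d1", "d2", "d7"}, {"d3", "d1"}]
print(round(combined_assessor_score(run, assessor_qrels), 4))
```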

Abstract (Italian)

Evaluation in Information Retrieval (IR) is the leitmotiv of this thesis, which analyses several problems related to IR effectiveness measures and the solutions proposed to address them. First, a formal definition of user-utility-oriented IR effectiveness measures is proposed. This definition builds on the representational theory of measurement, and the theoretical framework presented contributes to a better understanding of the difficulties involved in comparing systems, separating them from those related to the numerical properties of measures. Next, AWARE is described, a probabilistic approach for controlling the noise and inconsistencies introduced when relevance judgements are gathered via crowdsourcing platforms. Instead of adopting classification techniques to merge, at pool level, the relevance judgements collected from different crowd workers, the relevance judgements and the crowd workers themselves are treated as sources of uncertainty, making it possible to directly combine the evaluation measures computed on the different ground truths generated by each crowd worker. Finally, evaluation measures able to account for the interactions between systems and users are illustrated. A user model based on Markov processes is proposed, which allows user behaviour to be described with many degrees of freedom. This model is used to define a new family of evaluation measures built on top of precision, and it is embedded in the objective function of a Learning to Rank (LtR) algorithm to improve system performance.
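
As a companion illustration (again a minimal sketch, not the measures actually defined in the thesis), the snippet below assumes the browsing behaviour is captured by a fixed transition matrix over rank positions and uses its stationary distribution to weight precision at each rank; the transition probabilities and the aggregation are illustrative assumptions.

```python
import numpy as np

def stationary_distribution(P):
    """Stationary distribution of a row-stochastic transition matrix P:
    the left eigenvector for eigenvalue 1, normalised to sum to one."""
    eigvals, eigvecs = np.linalg.eig(P.T)
    pi = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1.0))])
    return pi / pi.sum()

def markov_weighted_precision(relevance, P):
    """Weight precision@k by the long-run probability that the modelled
    user visits rank k, then aggregate over ranks. `relevance` is a 0/1
    vector over ranks, `P` a transition matrix over the same ranks."""
    relevance = np.asarray(relevance, dtype=float)
    n = len(relevance)
    prec_at_k = np.cumsum(relevance) / np.arange(1, n + 1)
    return float(np.dot(stationary_distribution(P), prec_at_k))

# Toy browsing model over 5 ranks: mostly move forward, sometimes back up,
# otherwise stay on the current result.
rel = [1, 0, 1, 1, 0]
n = len(rel)
P = np.zeros((n, n))
for i in range(n):
    if i + 1 < n:
        P[i, i + 1] = 0.7
    if i > 0:
        P[i, i - 1] = 0.2
    P[i, i] = 1.0 - P[i].sum()
print(round(markov_weighted_precision(rel, P), 4))
```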

EPrint type: Doctoral thesis
Supervisor: Ferro, Nicola
PhD programme (courses and schools): Cycle 30 > Courses 30 > INGEGNERIA DELL'INFORMAZIONE
Thesis deposit date: 14 January 2018
Publication year: 2018
Keywords (Italian / English): Information Retrieval, effectiveness, relevance assessment, Markov chain, evaluation, learning to rank, user model
MIUR scientific-disciplinary sectors: Area 09 - Ingegneria industriale e dell'informazione > ING-INF/05 Sistemi di elaborazione delle informazioni
Reference structure: Departments > Department of Information Engineering
ID code: 10819
Deposited on: 31 Oct 2018 09:47
