Notice
This item was automatically migrated from a legacy system. Its data has not been checked and might not meet the quality criteria of the present system.
Roda, G., Zenz, V., Lupu, M., Järvelin, K., Sanderson, M., & Womser-Hacker, C. (2009). So many topics, so little time. ACM SIGIR Forum, 43(1), 9–16. https://doi.org/10.1145/1670598.1670601
E191-01 - Forschungsbereich Cyber-Physical Systems E020-04 - Fachbereich High Performance Computing E194-01 - Forschungsbereich Software Engineering
-
Date (published):
2009
-
Number of Pages:
8
-
Peer reviewed:
No
-
Keywords:
Hardware and Architecture; Management Information Systems; evaluation; data management; big data; data curation; Information retrieval
-
Abstract:
In the context of creating large-scale test collections, the present paper discusses methods of constructing a patent test collection for evaluation of prior art search. In particular, it addresses criteria for topic selection and identification of recall bases. These issues arose while organizing the CLEF-IP evaluation track and were the subject of an online discussion among the track's organizers and its steering committee. Most literature on building test collections is concerned with minimizing the costs of obtaining relevance assessments. CLEF-IP can afford to have large topic sets since relevance assessments are generated by exploiting existing manually created information. In a cost-benefit analysis, the only issue seems to be the computing time required by participants to run (tens or hundreds of) thousands of queries. This document describes the data sets and decisions leading to the creation of the CLEF-IP collection.