E194 - Institut für Information Systems Engineering
-
Date (published):
2017
-
Number of Pages:
121
-
Keywords:
Patent Searching; Query Term Expansion; Query Log Analysis; Lexical Resources
Abstract:
A patent document is a legal title granting its holder the exclusive right to make use of an invention within a limited area and time by stopping others from making, using or selling it without authorization. When preparing a patent application or judging the validity of an applied-for patent on the basis of novelty and inventiveness, an essential task is searching patent databases for related patents that may invalidate the invention. This task is usually performed by examiners in a patent office and by patent searchers in private companies. Virtually all search systems of the patent offices and commercial operators process Boolean queries, as these guarantee repeatability and allow clear tracking of the results obtained. Despite the importance of Boolean retrieval, little current research assists patent experts in formulating such queries. Existing approaches are mostly limited to the use of standard dictionaries, such as WordNet, or lexica, like Wikipedia, to provide synonymous expansion terms. However, the highly specific vocabulary used in patent applications, where patent applicants are permitted to be their own lexicographers, is not included in these standard dictionaries. In this thesis we investigate the problem of query term expansion (QTE) in the query generation step of patent searching, with the goal of suggesting relevant expansion terms, in particular synonyms and equivalents, for a query term in a semi-automatic or fully automatic manner for Boolean retrieval. The first goal of this thesis is to analyse query logs of patent experts to gain insights into the search behaviour and the characteristics of patent experts' queries. We use actual query logs of patent examiners of the United States Patent and Trademark Office (USPTO). We show that query generation in patent searching is highly domain specific and that the queries posed by the patent examiners can be valuable resources for providing lexical knowledge for the patent domain.
The second contribution of this thesis is to extract lexical knowledge from the query logs to support QTE in patent searching. We detect keyword phrases and synonyms from the query logs based on Boolean and proximity operators in the text queries and build US class-specific, class-related and class-independent lexical databases from the query expansion sessions of patent examiners at the USPTO. We then show that the lexical databases can support patent searchers in the query generation process, in particular in formulating Boolean queries. The third contribution of this thesis is to improve precision when suggesting expansion terms in a semi-automatic or fully automatic manner by ranking the expansion terms. For this ranking we consider the US patent classification, the frequencies of the expansion terms, and the word senses. We evaluate our proposed query term expansion approach on real query sessions of patent examiners. The results show that the proposed domain-specific lexical databases perform significantly better than the baseline and other enhanced query term expansion approaches. Finally, we study the impact of QTE using synonyms on patent document retrieval. Experiments on the CLEF-IP 2010 benchmark dataset show that automatic query expansion using synonyms from USPTO patent examiners tends to decrease or only slightly improve retrieval effectiveness, with no significant improvement. However, an analysis of the retrieval results shows that synonym expansion does not generally have a negative effect on retrieval effectiveness. Recall is drastically improved for query topics where the baseline queries achieve, on average, only low recall values. The approach is therefore a valuable QTE method for search systems that support repeatability and allow tracking of the results, in particular for the search systems used by the patent offices and commercial operators.
We therefore recommend using PatNet as a lexical resource for semi-automatic QTE in Boolean patent retrieval, where synonym expansion is particularly common as a means of improving recall.
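As a rough illustration of the idea of mining synonyms from Boolean operators in query logs, the following Python sketch treats terms joined by OR inside one parenthesised group as expansion candidates for each other and ranks them by how often they co-occur. It is a minimal sketch under stated assumptions, not the method of the thesis: the query syntax, the sample queries and the function names (`or_groups`, `build_lexicon`, `suggest`) are all hypothetical, and proximity operators, patent classes and word senses are ignored.

```python
import re
from collections import Counter, defaultdict

# Matches a parenthesised group without nested parentheses, e.g. "(car OR auto)".
# The query syntax is an assumption made for this sketch.
GROUP = re.compile(r"\(([^()]*)\)")


def or_groups(query):
    """Yield the term lists of groups whose terms are joined only by OR."""
    for body in GROUP.findall(query):
        terms = [t.strip().lower() for t in re.split(r"\s+OR\s+", body)]
        # Keep only groups of two or more single-word terms; anything that
        # still contains spaces mixes in another operator and is skipped.
        if len(terms) > 1 and all(re.fullmatch(r"[a-z0-9]+", t) for t in terms):
            yield terms


def build_lexicon(queries):
    """Count how often two terms appear together in the same OR group."""
    lexicon = defaultdict(Counter)
    for query in queries:
        for terms in or_groups(query):
            for a in terms:
                for b in terms:
                    if a != b:
                        lexicon[a][b] += 1
    return lexicon


def suggest(lexicon, term, k=3):
    """Return up to k expansion candidates for term, most frequent first."""
    candidates = lexicon.get(term.lower(), Counter())
    return [t for t, _ in candidates.most_common(k)]


# Hypothetical query log, invented for illustration only.
sample_log = [
    "(car OR automobile OR vehicle) AND engine",
    "(car OR automobile) AND (brake OR braking)",
    "(vehicle OR car) AND wheel",
]
lexicon = build_lexicon(sample_log)
print(suggest(lexicon, "automobile"))
```

Ranking by raw co-occurrence frequency is only one of the signals named in the abstract; a class-specific variant could keep one such lexicon per patent class.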
Additional information:
Differing title, as translated by the author