Cross-lingual Learning-to-Rank for Patent Retrieval
Summary: Prior art search is an important tool for determining a patent’s novelty and for avoiding patent infringement. The task involves two problems, patent translation and patent retrieval, that need to be solved across multiple languages. Because of highly specialized jargon and a multitude of patent domains, each task is considered difficult on its own. While most previous approaches have addressed translation and search as separate problems, we propose a synergistic combination of patent translation and patent search in a well-defined machine learning framework. Patent search is defined as a monolingual learning-to-rank problem that optimizes the ranking of prior art patents for patent queries. Patent translation is defined as a multi-task learning problem that optimizes translation quality across multiple patent domains. The translation system makes use of patent search by directly incorporating a translation’s contribution to search quality into the optimization of translation parameters. The goal of the project is to show the mutual benefit of this integration of translation and search.
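To make the learning-to-rank formulation concrete, the following is a minimal sketch of pairwise ranking, not the project's actual method. It assumes each query-patent pair has already been mapped to a numeric feature vector, and it uses a simple perceptron-style margin update as a stand-in for whatever ranking objective the project adopts. The function names `train_pairwise_ranker` and `rank`, and the parameters `lr` and `margin`, are hypothetical and chosen only for illustration.

```python
def dot(w, x):
    """Inner product of a weight vector and a feature vector."""
    return sum(wi * xi for wi, xi in zip(w, x))


def train_pairwise_ranker(pairs, dim, epochs=10, lr=0.1, margin=1.0):
    """Learn linear ranking weights from preference pairs.

    pairs: list of (relevant_features, irrelevant_features) tuples, where
           both vectors describe candidate patents for the same query.
    Assumption: a perceptron-style update on margin violations; this is an
    illustrative choice, not the method described in the summary.
    """
    w = [0.0] * dim
    for _ in range(epochs):
        for pos, neg in pairs:
            # Update only when the relevant patent does not outscore the
            # irrelevant one by at least the margin.
            if dot(w, pos) - dot(w, neg) < margin:
                w = [wi + lr * (p - n) for wi, p, n in zip(w, pos, neg)]
    return w


def rank(w, candidates):
    """Sort (patent_id, features) candidates by descending model score."""
    return sorted(candidates, key=lambda c: dot(w, c[1]), reverse=True)
```

Under these assumptions, the learned weights would be applied to the candidate prior art patents of a new query via `rank`. In the integrated setting the summary describes, the resulting ranking quality for a translated query could then serve as one signal when tuning translation parameters.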