Learning from Light-Weight Constraints on Translation Quality
Summary: This project investigates learning from lightweight constraints on machine translation output, and explores how external knowledge sources can be integrated into machine translation as constraints. Such constraints can take the form of (weighted) markings of translation errors collected in previous user interactions, or of terminology and domain constraints extracted from domain-specific nearest-neighbor data or from document-level side constraints. In both cases the learning signal is direct and explicit, yet the feedback process remains lightweight: error markings are collected instead of full user corrections, and terminology constraints are extracted from user-provided nearest-neighbor data instead of user-provided term dictionaries. Initial candidate supervised learning algorithms already exist and will be extended in the early phase of the project. This early phase will also rely on constraint signals that are already available, either as domain-specific data or as simulated user feedback. In later phases, new algorithms will be developed and new knowledge sources will be tapped.
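To make the idea of learning from (weighted) error markings concrete, the following minimal sketch shows one way such markings could enter a supervised training objective: each token of a system translation contributes its log-probability, scaled by a weight that depends on whether a user marked the token as an error. This is an illustration under stated assumptions, not the project's algorithm. The function name `marking_weighted_loss`, the parameters `correct_weight` and `error_weight`, and the example probabilities are all hypothetical.

```python
import math


def marking_weighted_loss(token_log_probs, markings,
                          correct_weight=1.0, error_weight=0.0):
    """Negative weighted log-likelihood of one translation.

    token_log_probs: log-probability the model assigns to each output token.
    markings: 1 if a user marked the token as an error, 0 otherwise.
    correct_weight / error_weight: hypothetical weights (assumptions made
    for this sketch); a negative error_weight would penalise marked tokens.
    """
    if len(token_log_probs) != len(markings):
        raise ValueError("need exactly one marking per token")
    total = 0.0
    for log_prob, marked in zip(token_log_probs, markings):
        weight = error_weight if marked else correct_weight
        total += weight * log_prob
    return -total


# Hypothetical three-token translation; the second token was marked as an error.
log_probs = [math.log(0.9), math.log(0.5), math.log(0.8)]
markings = [0, 1, 0]
loss = marking_weighted_loss(log_probs, markings)
```

With the default `error_weight` of 0.0, marked tokens are simply ignored, so only the unmarked tokens are reinforced. A negative value would instead push probability mass away from the marked tokens. Which choice works better is an empirical question that this sketch does not settle.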