We propose a hybrid architecture for high-quality machine translation that combines the strengths of the rule-based and corpus-based approaches and minimizes their weaknesses. At the core is a rule-based MT system that provides morphology, declarative grammars, semantic categories, and small dictionaries, but avoids all expensive kinds of manual knowledge acquisition. Instead of manually compiling large dictionaries and information on disambiguation preferences, we suggest a novel corpus-based bootstrapping method for automatically expanding the dictionaries and for training the system's analysis performance and its choice among transfer alternatives.
This is a Marie Curie FP7 project in collaboration with Lingenio, Heidelberg, a small company that develops and sells rule-based MT systems (Translate) for English/German/French (Spanish and Italian are under development), as well as Office Dictionaries based on the context-sensitive Intellidict technology. The underlying technology was originally developed at the IBM Heidelberg research centre in a long-term project.
Contact: Bogdan Babych