Dr Serge Sharoff

Senior Lecturer in Translation Studies


Summary: Corpus linguistics, especially collecting large corpora from the Web; computational linguistics; natural language processing; computer-assisted language learning

Location: Parkinson 1.15A

Dr Serge Sharoff is a Senior Lecturer in Translation Studies, SMLC. His research focuses on natural language processing and computer-assisted language learning, including automated methods for collecting corpora from the web, their analysis in terms of domains and genres, and the extraction of lexicons and terminology. He is one of the designers of the Russian National Corpus and other corpora, as well as frequency dictionaries of modern Russian.

Research interests

His research interests are related to three domains: linguistics (primarily computational linguistics and corpus linguistics), cognitive science and communication studies.

A central strand of his recent research is the semi-automatic acquisition of representative corpora from the web and their automatic annotation; see the set of available corpora and the procedure described at http://corpus.leeds.ac.uk/internet.html. The current set of resources includes corpora of 100-200 million words each for Chinese, English, French, German, Italian, Polish, Portuguese, Russian and Spanish.

More information about his research and publications is available from his homepage: http://corpus.leeds.ac.uk/serge/

Research students

Currently he supervises a number of PhD students working on a range of projects:

  • Marilena di Bari working on the use of linguistic knowledge for automatic sentiment and emotion analysis;
  • Muhamad Alif Haji Sismat working on the application of computer-assisted translation (CAT) tools in teaching translation;
  • Noushin Rezapour Asheghi working on automatic genre identification in web corpora;
  • Valentina Ragni working on the use of reverse subtitling for language learning, in particular using eye-tracking methods;
  • Mika Takewa working on analysis of translation shifts using Systemic Functional Linguistics;
  • Jun Yang working on collaborative translation in a crowd-sourcing environment;
  • David Yu Yuan working on automated quality assessment for trainee translators.

New PhD students are welcome to apply in these and related areas, primarily concerning applications of technologies in computational linguistics for translation and language learning.

Some of the former students

  • Dragos Ciobanu and Svitlana Babych, both working on projects on Computer-Assisted Language Learning for students having knowledge of a cognate foreign language;
  • Badeeah Hassanain working on designing a Controlled Language for Machine Translation on the basis of Systemic Functional Linguistics;
  • Marco Brunello working on the application of document classification and similarity methods for improving the quality of machine translation;
  • Vivian Xu Ran working on terminology preparation for simultaneous interpreters;
  • Alina Secara working on the use of eye-tracking for studying creative spelling in fansubbing.


He is involved in teaching the following modules:

  • MODL5000: Computer-Assisted Translation
  • MODL5001: Methods and Approaches in Translation Studies
  • MODL5005: Translators and the Computer
  • MODL5007: Corpus Linguistics for Translators
  • MODL5009: English for Translators