Jon Dehdari






Universität des Saarlandes




Graduate Seminar on
Statistical Language Modeling

Instructor: Dr. Jon Dehdari

Winter 2014–2015

Statistical language modeling, which assigns probabilities to linguistic utterances, is a vital component of machine translation, automatic speech recognition, information retrieval, and many other language technologies. In this seminar we will discuss different types of language models and how they are used in various applications. The models covered include n-gram models (with various smoothing techniques) as well as skip, class-based, factored, topic-based, and neural-network-based approaches. We will also look at how these models perform on typologically different languages and on training sets of different sizes.

This seminar will be followed by a project seminar where you will work in small groups to identify a shortcoming of an existing language model, make a novel modification to overcome it, compare the experimental results of your new method against the existing baseline, and discuss the results. It'll be fun.

Syllabus

Topics

  1. Overview of Language Models, including n-gram models
  2. Cache and Skip Language Models
  3. Factored Language Models
  4. Sentence Mixture Models
  5. PLSA/Topic-based Language Models: A B
  6. Bilingual Language Models
  7. Feedforward Neural Network Language Models: Derivatives: A B
  8. Recurrent (viz. Elman) Neural Network Language Models
  9. Big Language Models

Assignments

  1. Assignment 1
  2. Assignment 2

External Links