Statistical language modeling, which assigns probabilities to linguistic utterances, is a vital component of machine translation, automatic speech recognition, information retrieval, and many other language technologies. In this seminar we will discuss different types of language models and how they are used in various applications. These include n-gram models (with various smoothing techniques) as well as skip-gram, class-based, factored, topic-based, and neural-network-based approaches. We will also look at how these models perform across different language typologies and at different training-set sizes.
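To make the simplest family on this list concrete, here is a minimal sketch of a bigram model with add-one (Laplace) smoothing, one of the classic smoothing techniques the seminar covers. This is an illustrative toy, not course material; all function names and the tiny corpus are invented for the example.

```python
from collections import Counter

def train_bigram_counts(corpus):
    """Count unigrams and bigrams over a list of tokenized sentences,
    padding each sentence with start/end markers."""
    unigrams, bigrams = Counter(), Counter()
    for sent in corpus:
        tokens = ["<s>"] + sent + ["</s>"]
        unigrams.update(tokens[:-1])          # contexts only
        bigrams.update(zip(tokens[:-1], tokens[1:]))
    return unigrams, bigrams

def bigram_prob(w_prev, w, unigrams, bigrams, vocab_size):
    """Add-one smoothed P(w | w_prev): every bigram count is
    incremented by 1, so unseen pairs get nonzero probability."""
    return (bigrams[(w_prev, w)] + 1) / (unigrams[w_prev] + vocab_size)

corpus = [["the", "cat", "sat"], ["the", "dog", "sat"]]
unigrams, bigrams = train_bigram_counts(corpus)
vocab = {w for sent in corpus for w in sent} | {"</s>"}

p_seen = bigram_prob("the", "cat", unigrams, bigrams, len(vocab))    # 2/7
p_unseen = bigram_prob("cat", "dog", unigrams, bigrams, len(vocab))  # 1/6
```

Without smoothing, the unseen bigram ("cat", "dog") would receive probability zero and zero out the score of any sentence containing it; add-one smoothing trades a little probability mass from seen events to avoid exactly that failure, which is the shortcoming more refined techniques (Good-Turing, Kneser-Ney) address.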
This seminar will be followed by a project seminar in which you will work in small groups to identify a shortcoming of an existing language model, devise a novel modification to overcome it, compare your method experimentally against the existing baseline, and discuss the results. It'll be fun.