This course gives an overview of the Moses machine translation system. Since the audience for this class is both computational linguists and translators, we will cover both the theory of how it works and how to use it in everyday work.
Note that the installation of Moses can be quite difficult, and it may take a while to get the software running on your computer. Be patient, and remove all hammers from the vicinity of your computer!
echo 'alias cyg-get="/cygdrive/c/cygwin64/setup-x86_64.exe -q -P "' >> ~/.bashrc
source ~/.bashrc
Then install the following packages, thus:
cyg-get gcc-core, gcc-g++, git, libboost_thread, libboost_system, libboost-devel, wget, zip, unzip, graphviz, imagemagick, gv, make, cmake, automake, nano
Now, download and install Moses:
cd
wget -c http://www.statmt.org/moses/RELEASE-3.0/binaries/cygwin-64bit/cygwin-64bit.tgz
tar zxvf cygwin-64bit.tgz
mv cygwin-64bit moses
For Mac OS X (Yosemite):
cd
curl -O http://www.statmt.org/moses/RELEASE-3.0/binaries/macosx-yosemite/macosx-yosemite.tgz
tar zxvf macosx-yosemite.tgz
mv macosx-yosemite moses
For Linux (or FreeBSD):
cd
wget -c http://www.statmt.org/moses/RELEASE-3.0/binaries/linux-64bit/linux-64bit.tgz
tar zxvf linux-64bit.tgz
mv linux-64bit moses
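The three download recipes above differ only in the archive name, so the choice can be sketched as a small shell function keyed on the output of uname -s. This is a minimal sketch, not part of Moses: the helper name moses_binary_url is made up for illustration, and it only prints the URL rather than downloading anything.

```shell
# moses_binary_url is a hypothetical helper, not part of Moses.
# It maps an operating system name (as printed by `uname -s`) to the
# RELEASE-3.0 binary archive listed in this section.
moses_binary_url() {
    os="$1"
    base="http://www.statmt.org/moses/RELEASE-3.0/binaries"
    case "$os" in
        CYGWIN*)       echo "$base/cygwin-64bit/cygwin-64bit.tgz" ;;
        Darwin)        echo "$base/macosx-yosemite/macosx-yosemite.tgz" ;;
        Linux|FreeBSD) echo "$base/linux-64bit/linux-64bit.tgz" ;;
        *)             echo "unsupported system: $os" >&2; return 1 ;;
    esac
}

# Show which archive this machine would need; nothing is downloaded here.
url=$(moses_binary_url "$(uname -s)") && echo "Archive for this machine: $url"
```

The printed URL can then be passed to wget -c (or curl -O on Mac OS X) exactly as in the commands above.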
cd moses
wget -c http://www.statmt.org/moses/download/sample-models.tgz
tar zxvf sample-models.tgz
cd sample-models
~/moses/bin/moses -f phrase-model/moses.ini < phrase-model/in | tee out
The above command takes the sentence in the file in, which is »das ist ein kleines haus«, translates it into English as "this is a small house", and writes the result both to the screen and to the file out.
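To confirm that the installation works, the content of out can be compared with the expected translation. The following is only a sketch: check_translation is a hypothetical helper, and it trims surrounding whitespace on the assumption that the decoder may pad its output line.

```shell
# check_translation is a hypothetical helper, not part of Moses.
# It compares the first line of a file with an expected sentence,
# ignoring leading and trailing whitespace.
check_translation() {
    file="$1"
    expected="$2"
    actual=$(head -n 1 "$file" | sed 's/^[[:space:]]*//; s/[[:space:]]*$//')
    if [ "$actual" = "$expected" ]; then
        echo "OK: $actual"
    else
        echo "UNEXPECTED: $actual"
        return 1
    fi
}

# Run the check only if the sample translation has already been produced.
if [ -f out ]; then
    check_translation out "this is a small house"
fi
```

Run it from the sample-models directory, where the file out was written.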
mkdir ~/corpora
cd ~/corpora
wget -c http://www.statmt.org/wmt15/training-parallel-nc-v10.tgz
wget -c http://www.statmt.org/wmt15/dev-v2.tgz
tar zxvf training-parallel-nc-v10.tgz
tar zxvf dev-v2.tgz
~/moses/scripts/tokenizer/tokenizer.perl -l de \
    < news-commentary-v10.de-en.de \
    > news-commentary-v10.de-en.tok.de
~/moses/scripts/tokenizer/tokenizer.perl -l en \
    < news-commentary-v10.de-en.en \
    > news-commentary-v10.de-en.tok.en
~/moses/scripts/recaser/train-truecaser.perl \
    --model truecase-model.de \
    --corpus news-commentary-v10.de-en.tok.de
~/moses/scripts/recaser/train-truecaser.perl \
    --model truecase-model.en \
    --corpus news-commentary-v10.de-en.tok.en
~/moses/scripts/recaser/truecase.perl \
    --model truecase-model.de \
    < news-commentary-v10.de-en.tok.de \
    > news-commentary-v10.de-en.tok.truecase.de
~/moses/scripts/recaser/truecase.perl \
    --model truecase-model.en \
    < news-commentary-v10.de-en.tok.en \
    > news-commentary-v10.de-en.tok.truecase.en
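A parallel corpus is aligned line by line, so the German and English files should still have the same number of lines after tokenizing and truecasing. A quick sanity check might look like the sketch below; same_line_count is a hypothetical helper, not a Moses script, and it only counts lines without inspecting their content.

```shell
# same_line_count is a hypothetical helper, not part of Moses.
# It succeeds only if both files contain the same number of lines.
same_line_count() {
    a=$(wc -l < "$1" | tr -d '[:space:]')
    b=$(wc -l < "$2" | tr -d '[:space:]')
    if [ "$a" -eq "$b" ]; then
        echo "OK: $a lines in both files"
    else
        echo "MISMATCH: $1 has $a lines, $2 has $b lines"
        return 1
    fi
}

# Check the truecased corpus, if it has already been created.
de=news-commentary-v10.de-en.tok.truecase.de
en=news-commentary-v10.de-en.tok.truecase.en
if [ -f "$de" ] && [ -f "$en" ]; then
    same_line_count "$de" "$en"
fi
```

A mismatch usually means that one of the preprocessing steps was interrupted or was run on the wrong input file, so it is worth fixing before training.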
cd ~/moses
wget https://github.com/moses-smt/mgiza/archive/master.zip
mv master.zip mgiza.zip
unzip mgiza.zip
mv mgiza-master mgiza
cd mgiza/mgizapp
cmake .
make -j 4
cp scripts/*.{sh,py,pl} bin/
elected parliaments do not own our liberties .
NULL ({ }) gewählte ({ 1 }) Parlamente ({ 2 }) besitzen ({ 5 }) unsere ({ 6 }) Freiheiten ({ 7 }) nicht ({ 3 4 }) . ({ 8 })
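In the alignment output above, each German word is followed by the positions (counted from 1) of the English words it is linked to, and NULL collects English words with no German counterpart. The sketch below turns such a pair of lines into readable word links. The function name show_links is made up for illustration, and the code assumes the input consists only of alternating sentence and alignment lines, exactly as shown above.

```shell
# show_links is a hypothetical helper, not part of Moses or MGIZA.
# Input: pairs of lines; the first is the plain sentence, the second
# lists each word of the other language followed by "({ positions })".
show_links() {
    awk '
    NR % 2 == 1 { split($0, tgt, " "); next }    # remember the plain sentence
    {
        i = 1
        while (i <= NF) {
            src = $i                              # word on the alignment line
            i += 2                                # skip the word and "({"
            links = ""
            while (i <= NF && $i != "})") {       # collect the linked words
                links = links (links == "" ? "" : " ") tgt[$i]
                i++
            }
            i++                                   # skip the closing "})"
            if (src != "NULL" && links != "") print src " -> " links
        }
    }'
}

show_links <<'EOF'
elected parliaments do not own our liberties .
NULL ({ }) gewählte ({ 1 }) Parlamente ({ 2 }) besitzen ({ 5 }) unsere ({ 6 }) Freiheiten ({ 7 }) nicht ({ 3 4 }) . ({ 8 })
EOF
```

For the example sentence this prints one link per line, such as "gewählte -> elected" and "nicht -> do not", which shows that the single German word nicht is aligned to the two English words do not.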
cd ~/corpora
wget http://jon.dehdari.org/teaching/uds/moses/moses_ems_nc10.conf
nano moses_ems_nc10.conf
~/moses/scripts/ems/experiment.perl -config moses_ems_nc10.conf
nice ~/moses/scripts/ems/experiment.perl -config moses_ems_nc10.conf -exec