Jon Dehdari



A Primer for Localizing Link Grammar



Every so often I get asked "How do I start a Link Grammar project for my language?" This page seeks to answer that question.

Step 1: Read the Wikipedia article on Link Grammar

It's a nice, short introduction. Pay particular attention to the "Syntax" and "Examples" sections, and make sure you understand everything in the first example.

Step 2: Download and install the Link Grammar parser

Download the parser's source release (a .tar.gz file) and install it. If you use Debian or Ubuntu, you can instead install it from the package manager by running sudo apt-get install link-grammar link-grammar-dictionaries-en. If you have trouble with this step, contact your local system administrator. Once it is installed, try it out with the English dictionary: run the command link-grammar (or ./link-parser), then type a short English sentence at the prompt.

Step 3: Copy the English grammar files, and adapt them

If you downloaded the .tar.gz file directly, the English grammar files are in the data/en directory. If you installed link-grammar from your package manager (on Debian or Ubuntu), they are in /usr/share/link-grammar/en/. Copy the en directory to a new directory next to it, named after the language you're working on. You should now be able to run the parser with this new language name: link-grammar your_new_language (or ./link-parser your_new_language).

The most important file you will be working with is 4.0.dict. Rename this existing (English) file to 4.0.dict.english, so that you can refer to it later if you need to, and start a new, empty file with the original name.
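If you prefer to script this step, the Python sketch below does the copying and renaming in one go. It assumes you pass in the directory that contains en (data from the .tar.gz, or /usr/share/link-grammar, which normally needs administrator rights to write to); the function name start_new_language and the language name "xx" are placeholders, not part of link-grammar.

```python
import shutil
from pathlib import Path


def start_new_language(data_dir, lang):
    """Copy <data_dir>/en to <data_dir>/<lang>, keep the English 4.0.dict
    as 4.0.dict.english, and create a new, nearly empty 4.0.dict."""
    data_dir = Path(data_dir)
    new_dir = data_dir / lang
    # copytree raises FileExistsError if new_dir already exists, so an
    # existing localization is never overwritten by accident.
    shutil.copytree(data_dir / "en", new_dir)
    (new_dir / "4.0.dict").rename(new_dir / "4.0.dict.english")
    # '%' starts a comment in link-grammar dictionaries.
    (new_dir / "4.0.dict").write_text(
        f"% Link Grammar dictionary for {lang}\n", encoding="utf-8"
    )
    return new_dir
```

For example, start_new_language("data", "xx") run from the unpacked source directory would create data/xx with the English files preserved as 4.0.dict.english.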

Step 4: Start a tiny grammar file

Begin with just a couple of rules, enough to parse a simple subject-verb sentence. Target a short sentence in your language, such as "horses run".

Remember the syntax section in the Wikipedia article? Now you'll be using that information.

Subject-Verb Languages

If your language has subjects before verbs (such as SVO or SOV), then you can construct a small grammar like this:
"horses.nnp" "cows.nnp" "lawyers.nnp":	% These are plural nouns
S+;

"run.vb" "sneeze.vb" "dance.vb":	% These are (intransitive) verbs
S- & {W-};

LEFT-WALL: W+;				% See below for an explanation
	
This says that nouns look to the right (indicated by the "+") to form a subject (S) link, and that verbs look to the left to form one. The W link connects the LEFT-WALL to the top-most word in the sentence (usually an auxiliary verb or, if there is none, the main verb); the curly braces in {W-} make this link optional.

Now try parsing a sentence:

linkparser> horses run
Found 1 linkage (1 had no P.P. violations)
        Unique linkage, cost vector = (UNUSED=0 DIS=0 AND=0 LEN=1)

     +----S---+
     |        |
horses.nnp run.vb
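To see why the nouns get S+ and the verbs S-, it can help to model connector matching directly. The Python sketch below is a simplification, not the parser's algorithm: a link is possible when the connector on the earlier word ends in +, the one on the later word ends in -, and the names are identical. It only lists candidate links; the real parser also requires every connector in the chosen alternative to be used, links not to cross, and the whole sentence to be connected, and it compares lowercase subscripts, which this sketch ignores. The names SVO_GRAMMAR, connectors_link and candidate_links are made up for illustration.

```python
# The Subject-Verb grammar above, one connector list per word.
# ({W-} is treated as simply present; optionality is ignored here.)
SVO_GRAMMAR = {
    "LEFT-WALL": ["W+"],
    "horses.nnp": ["S+"],
    "run.vb": ["S-", "W-"],
}


def connectors_link(left, right):
    """True if connector `left` (on an earlier word) can link to connector
    `right` (on a later word): same name, pointing toward each other."""
    return left.endswith("+") and right.endswith("-") and left[:-1] == right[:-1]


def candidate_links(words, grammar):
    """List every (earlier word, link name, later word) triple whose
    connectors match. No crossing or connectivity checks are done."""
    links = []
    for i, earlier in enumerate(words):
        for later in words[i + 1:]:
            for a in grammar[earlier]:
                for b in grammar[later]:
                    if connectors_link(a, b):
                        links.append((earlier, a[:-1], later))
    return links


print(candidate_links(["LEFT-WALL", "horses.nnp", "run.vb"], SVO_GRAMMAR))
# [('LEFT-WALL', 'W', 'run.vb'), ('horses.nnp', 'S', 'run.vb')]
print(candidate_links(["LEFT-WALL", "run.vb", "horses.nnp"], SVO_GRAMMAR))
# [('LEFT-WALL', 'W', 'run.vb')]  (no S link: horses.nnp cannot be connected)
```

The second call shows why this grammar rejects "run horses": the noun's S+ points right, but the verb is to its left.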
	

Verb-Subject Languages

If your language has verbs before subjects (such as VSO), then you can construct a small grammar like this:
"horses.nnp" "cows.nnp" "lawyers.nnp":	% These are plural nouns
S-;

"run.vb" "sneeze.vb" "dance.vb":	% These are (intransitive) verbs
S+ & {W-};

LEFT-WALL: W+;				% See below for an explanation
	
This says that nouns look to the left (indicated by the "-") to form a subject (S) link, and that verbs look to the right to form one. As before, the optional W link connects the LEFT-WALL to the top-most word in the sentence (usually an auxiliary verb or, if there is none, the main verb).

Now try parsing a sentence:

linkparser> run horses
Found 1 linkage (1 had no P.P. violations)
    Unique linkage, cost vector = (UNUSED=0 DIS=0 AND=0 LEN=0)

    +----W---+----S---+
    |        |        |
LEFT-WALL run.vb horses.nnp
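The verb entries in both grammars use {W-}, an optional connector. One way to read such an entry is to expand it into the separate alternatives (disjuncts) it allows. The Python sketch below does that for a small subset of the dictionary notation: connectors such as S+ or W-, &, or, parentheses, and {...}. It is an illustration, not the parser's own code; it ignores costs, multi-connectors (@) and macros, and it assumes & binds more tightly than or. All function names are hypothetical.

```python
import re
from itertools import product

# Tokens: braces, parentheses, '&', the keyword 'or', and connectors
# such as S+, W-, or MVa+ (uppercase name, optional lowercase subscript).
TOKEN = re.compile(r"\s*([{}()&]|or\b|[A-Z]+[a-z0-9*]*[+-])")


def tokenize(expr):
    expr = expr.strip().rstrip(";").strip()
    tokens, pos = [], 0
    while pos < len(expr):
        match = TOKEN.match(expr, pos)
        if match is None:
            raise ValueError(f"cannot tokenize {expr[pos:]!r}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse_or(tokens, i):
    """alternatives := conjunction ('or' conjunction)*"""
    result, i = parse_and(tokens, i)
    while i < len(tokens) and tokens[i] == "or":
        more, i = parse_and(tokens, i + 1)
        result = result + more
    return result, i


def parse_and(tokens, i):
    """conjunction := atom ('&' atom)*; every combination is kept."""
    result, i = parse_atom(tokens, i)
    while i < len(tokens) and tokens[i] == "&":
        more, i = parse_atom(tokens, i + 1)
        result = [a + b for a, b in product(result, more)]
    return result, i


def parse_atom(tokens, i):
    if i >= len(tokens):
        raise ValueError("expression ends too early")
    tok = tokens[i]
    if tok in ("(", "{"):
        closer = ")" if tok == "(" else "}"
        inner, i = parse_or(tokens, i + 1)
        if i >= len(tokens) or tokens[i] != closer:
            raise ValueError(f"expected {closer!r}")
        # {X} means "X or nothing": add the empty disjunct.
        return (inner + [()] if tok == "{" else inner), i + 1
    if not tok.endswith(("+", "-")):
        raise ValueError(f"unexpected {tok!r}")
    return [(tok,)], i + 1


def expand(expr):
    """Return the list of disjuncts (tuples of connectors) in expr."""
    tokens = tokenize(expr)
    disjuncts, i = parse_or(tokens, 0)
    if i != len(tokens):
        raise ValueError(f"unexpected {tokens[i]!r}")
    return disjuncts


print(expand("S+ & {W-}"))  # [('S+', 'W-'), ('S+',)]
print(expand("S- & {W-}"))  # [('S-', 'W-'), ('S-',)]
```

So a verb in the Verb-Subject grammar may either link to both the LEFT-WALL and its subject, or to its subject alone.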
	

Step 5: Expand your grammar file

Now you can expand your grammar file by adding links for objects, determiners, and so on. I recommend putting all your function words (also called stop words, such as "the", "you", and "and") in the 4.0.dict file, and all your content words (nouns, verbs, adjectives, and adverbs) in separate files.
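In the English dictionary you copied, you should find content words kept in separate word-list files under en/words/, which 4.0.dict refers to by path instead of listing the words. Assuming your localization follows the same convention (check the exact path syntax against your copy of the English 4.0.dict), a small script can generate those files from one table of content words. The function names, the words.&lt;tag&gt; file names, the one-word-per-line format, and the /&lt;lang&gt;/words/ reference form are all assumptions for illustration.

```python
from pathlib import Path


def plan_word_files(lang, words_by_tag):
    """Return ({relative file path: contents}, [4.0.dict lines]).

    words_by_tag maps a part-of-speech tag to (words, expression). Each tag
    gets its own word-list file (one tagged word per line, an assumed
    format) and one dict line that points at that file.
    """
    files, dict_lines = {}, []
    for tag, (words, expression) in sorted(words_by_tag.items()):
        relative = f"words/words.{tag}"
        files[relative] = "".join(f"{word}.{tag}\n" for word in sorted(words))
        # Assumed reference syntax, modeled on the English dictionary.
        dict_lines.append(f"/{lang}/{relative}: {expression};")
    return files, dict_lines


def write_files(lang_dir, files):
    """Write the planned files under lang_dir, creating words/ if needed."""
    for relative, contents in files.items():
        path = Path(lang_dir) / relative
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(contents, encoding="utf-8")
```

Keeping the planning step separate from the writing step makes it easy to inspect the generated dict lines before anything touches your grammar directory.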

Dealing with Morphology

The link-grammar parser has only a very basic mechanism for dealing with inflectional morphemes, so you might need to preprocess your input. One way to handle a morphologically rich language is to first run a real morphological analyzer that splits off inflectional morphemes so that they look like separate words, and then have your link-grammar localization link the morphemes together just as it would link separate words.
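A real morphological analyzer is the right tool here, but a toy version shows the shape of the preprocessing step. The Python sketch below naively strips the longest matching suffix from each word and emits it as a separate token prefixed with "-", so that the grammar can give suffixes their own entries. It is a stand-in, not a morphological analyzer; the suffix list, the "-" marker, and the min_stem threshold are hypothetical choices.

```python
def split_suffixes(sentence, suffixes, min_stem=2):
    """Split each word into stem + suffix token, using the longest suffix
    that leaves a stem of at least min_stem characters. Words with no
    matching suffix are kept whole. At most one suffix per word is removed."""
    by_length = sorted((s for s in suffixes if s), key=len, reverse=True)
    tokens = []
    for word in sentence.split():
        for suffix in by_length:
            if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
                # "-" marks the token as a suffix (hypothetical convention).
                tokens.extend([word[:-len(suffix)], "-" + suffix])
                break
        else:
            tokens.append(word)
    return " ".join(tokens)


print(split_suffixes("horses run", ["s"]))  # horse -s run
print(split_suffixes("bus stops", ["s"]))   # bu -s stop -s
```

The bogus "bu -s" in the second example is exactly the kind of error a real analyzer avoids, which is why the toy version should only be used to prototype the grammar side.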

Other Recommendations

I recommend that you always surround the entry for each word with double quotes, and always add a part-of-speech tag as well. Thus the entry for horse would be "horse.nn".
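If you generate entries from word lists, a tiny helper keeps the quoting and tagging consistent. This Python sketch reproduces the entry layout used in Step 4 (quoted, tagged words, an optional % comment, then the expression on its own line); the function name dict_entry is hypothetical.

```python
def dict_entry(words, tag, expression, comment=None):
    """Format one 4.0.dict entry: every word double-quoted and tagged,
    then the expression on its own line, ending with a semicolon."""
    for word in words:
        if '"' in word:
            raise ValueError(f"word should not contain quotes: {word!r}")
    head = " ".join(f'"{word}.{tag}"' for word in words) + ":"
    if comment:
        head += f"\t% {comment}"  # '%' starts a comment
    return f"{head}\n{expression};"


print(dict_entry(["horses", "cows", "lawyers"], "nnp", "S+", "These are plural nouns"))
```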

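Use the square brackets [...] very sparingly. They add a cost to the expression they enclose, so linkages that use it are ranked below linkages that don't.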

Also keep a running list of every new link you create at the top of the grammar file, with a short description and perhaps a short example sentence or phrase. You'll eventually forget what some of the links are for.

You can create a file containing all the sentences that you have already parsed successfully, one sentence per line. Then, whenever you add a new link, re-run this file to make sure the new link doesn't break other constructions. This is similar to regression testing in software engineering. The parser reads standard input, so you can use ordinary shell redirection: link-grammar your_new_language < test_sentences.txt
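The same idea can be automated a little further by saving the parser's output once and comparing later runs against it (a baseline or "golden file" test). The Python sketch below assumes, as the command above does, that the parser reads sentences from standard input; the file name baseline.txt and the function names are hypothetical. Any difference is reported for you to review: it may be a regression, or an improvement you accept by deleting the old baseline.

```python
import difflib
import subprocess
import sys
from pathlib import Path


def regression_check(command, sentences_file, baseline_file):
    """Feed sentences_file to `command` on standard input and diff its
    output against baseline_file. Returns the diff lines (an empty list
    when nothing changed). If no baseline exists yet, the current output
    is saved as the baseline and an empty list is returned."""
    sentences = Path(sentences_file).read_text(encoding="utf-8")
    output = subprocess.run(command, input=sentences, capture_output=True,
                            encoding="utf-8").stdout
    baseline = Path(baseline_file)
    if not baseline.exists():
        baseline.write_text(output, encoding="utf-8")
        return []
    previous = baseline.read_text(encoding="utf-8")
    return list(difflib.unified_diff(previous.splitlines(), output.splitlines(),
                                     "baseline", "current", lineterm=""))


def main(argv=None):
    """Usage: python regress.py your_new_language"""
    args = sys.argv[1:] if argv is None else argv
    diff = regression_check(["link-grammar", args[0]],
                            "test_sentences.txt", "baseline.txt")
    print("\n".join(diff) if diff else "No changes against the baseline.")
    return 1 if diff else 0
```

To use it as a script, add `if __name__ == "__main__": sys.exit(main())` at the bottom; swap in ["./link-parser", lang] if that is how you run the parser.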

A general piece of advice: back up your 4.0.dict file often, and occasionally to a different physical location.
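If you want to make backups a habit, a small script can take a timestamped copy each time you run it. This is a minimal Python sketch; the function name and the backup directory are placeholders. Pointing backup_dir at another disk or a synced folder covers the "different physical location" part.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_dict(dict_path, backup_dir):
    """Copy dict_path into backup_dir with a timestamp in the file name.
    Two backups made within the same second overwrite each other."""
    src = Path(dict_path)
    dest_dir = Path(backup_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    dest = dest_dir / f"{src.name}.{stamp}"
    shutil.copy2(src, dest)  # copy2 also preserves the modification time
    return dest
```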

The developers of the Link Grammar parser have generously released the parser under a Free/open source license. Consider doing the same, so that other speakers of your language can benefit.

While this doesn't cover everything, it should get you started. Happy hacking!