Class Tagger

java.lang.Object
  extended by Tagger

public class Tagger
extends java.lang.Object

This class starts the external tagger and tokenizes its output. The word tags are the basic information for all further processing: KoelnDic, Analyse and the Generator use these tags and depend on them. The tagger used here is the Stuttgart TreeTagger. For more information see: TODO: LINKS TO THE TAGGER. The tagset it uses is STTS (Stuttgart-Tübingen TagSet). For more information see: TODO: LINKS TO THE TAGSET.

Author:
Arthur Laub Arthur.Laub@urz.uni-hd.de Project: Kölschifier.

Field Summary
private static java.lang.String inputtoken
           
private static java.lang.String lemmatoken
           
private static int tag
           
 
Constructor Summary
Tagger()
           
 
Method Summary
private static int getTag(java.lang.String s)
          Converts the tag string produced by the tagger into an integer tag constant.
static java.lang.String start(java.lang.String data)
          This method starts the external TreeTagger program.
static java.util.ArrayList tokenizer(java.lang.String tokenst)
          This method tokenizes the output string of the external program TreeTagger.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

inputtoken

private static java.lang.String inputtoken

tag

private static int tag

lemmatoken

private static java.lang.String lemmatoken
Constructor Detail

Tagger

public Tagger()
Method Detail

start

public static java.lang.String start(java.lang.String data)
This method starts the external TreeTagger program. The tagger is started by a shell script whose path is set in the String TAGGER_START_FILE. If the execution of the external tagger fails, a warning is printed.

Parameters:
data - the input text received from the internet
Returns:
the output of the tagger as a String
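A minimal sketch of how start might be implemented, assuming the start script reads the input text from standard input and writes the tagger output to standard output. The class name TaggerStartSketch and the two-argument signature (which takes the script path explicitly instead of reading TAGGER_START_FILE) are hypothetical. The input is passed through a temporary file so that feeding the script and reading its output cannot block each other on full pipe buffers.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class TaggerStartSketch {

    /**
     * Hypothetical sketch: runs the start script with the input text on its
     * standard input and returns whatever it writes to standard output.
     * On failure a warning is printed and an empty string is returned.
     */
    static String start(String startFile, String data) {
        Path input = null;
        try {
            // Write the input to a temporary file used as the script's stdin.
            input = Files.createTempFile("tagger-input", ".txt");
            Files.write(input, data.getBytes(StandardCharsets.UTF_8));

            ProcessBuilder pb = new ProcessBuilder(startFile);
            pb.redirectInput(input.toFile());
            pb.redirectError(ProcessBuilder.Redirect.INHERIT); // show tagger errors
            Process p = pb.start();

            // Read the complete tagger output before waiting for the exit code.
            String output;
            try (InputStream out = p.getInputStream()) {
                output = new String(out.readAllBytes(), StandardCharsets.UTF_8);
            }
            int exit = p.waitFor();
            if (exit != 0) {
                System.err.println("Warning: tagger exited with status " + exit);
            }
            return output;
        } catch (IOException e) {
            System.err.println("Warning: could not run tagger: " + e.getMessage());
            return "";
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            System.err.println("Warning: interrupted while waiting for the tagger");
            return "";
        } finally {
            // Remove the temporary input file in every case.
            if (input != null) {
                try {
                    Files.deleteIfExists(input);
                } catch (IOException ignored) {
                    // nothing sensible to do here
                }
            }
        }
    }

    public static void main(String[] args) {
        // Hypothetical script path; in the project this is TAGGER_START_FILE.
        String out = start("/path/to/tagger-start.sh", "Er lacht .");
        System.out.print(out);
    }
}
```

Returning an empty string on failure mirrors the documented behaviour of printing a warning rather than aborting; a caller that must distinguish "no output" from "tagger failed" would need an exception or a status value instead.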

tokenizer

public static java.util.ArrayList tokenizer(java.lang.String tokenst)
This method tokenizes the output string of the external program TreeTagger, as returned by the method start. Each token in the TreeTagger output is split into the input token, the tag token and the lemma token. The tag token is converted into an integer constant by getTag. For each token, a Word is created with Word(inputtoken, tag, lemmatoken, i) and added to the returned ArrayList.

Parameters:
tokenst - the output string of the TreeTagger, as returned by start(String)
Returns:
an ArrayList of Word objects
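The following sketch shows one way tokenizer could split the TreeTagger output, assuming the common one-token-per-line format with tab-separated token, tag and lemma columns. The Word class, the tag values in the TAGS map and the sample output are hypothetical stand-ins for the project's own classes and constants.

```java
import java.util.ArrayList;
import java.util.Map;

class TokenizerSketch {

    /** Hypothetical stand-in for the project's Word class. */
    static final class Word {
        final String input;
        final int tag;
        final String lemma;
        final int index;

        Word(String input, int tag, String lemma, int index) {
            this.input = input;
            this.tag = tag;
            this.lemma = lemma;
            this.index = index;
        }

        @Override
        public String toString() {
            return index + ":" + input + "/" + tag + "/" + lemma;
        }
    }

    // Hypothetical tag constants; unknown tags map to -1.
    private static final Map<String, Integer> TAGS =
            Map.of("PPER", 1, "VVFIN", 2, "$.", 3);

    private static int getTag(String s) {
        return TAGS.getOrDefault(s, -1);
    }

    /** Splits "token TAB tag TAB lemma" lines into Word objects. */
    static ArrayList<Word> tokenizer(String tokenst) {
        ArrayList<Word> words = new ArrayList<>();
        int i = 0;
        for (String line : tokenst.split("\\R")) {
            if (line.isBlank()) {
                continue; // skip empty lines
            }
            String[] cols = line.split("\t");
            if (cols.length < 3) {
                continue; // not a token/tag/lemma triple
            }
            String inputtoken = cols[0];
            int tag = getTag(cols[1]);
            String lemmatoken = cols[2];
            words.add(new Word(inputtoken, tag, lemmatoken, i++));
        }
        return words;
    }

    public static void main(String[] args) {
        // Hypothetical TreeTagger output for the sentence "Er lacht ."
        String out = "Er\tPPER\ter\nlacht\tVVFIN\tlachen\n.\t$.\t.\n";
        for (Word w : tokenizer(out)) {
            System.out.println(w);
        }
    }
}
```

The running index i is only incremented for lines that actually yield a Word, so malformed lines do not leave gaps in the numbering.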

getTag

private static int getTag(java.lang.String s)
Converts the tag string produced by the tagger into an integer tag constant.

Parameters:
s - the tag information as a String
Returns:
the corresponding tag constant
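A possible shape for getTag, sketched as a switch over a few STTS tags. The constant names and numeric values are hypothetical, since the project's actual constants are not listed on this page; a complete implementation would cover the whole STTS tagset.

```java
class GetTagSketch {

    // Hypothetical tag constants; the project's names and values may differ.
    static final int UNKNOWN = -1;
    static final int NN = 0;       // common noun
    static final int NE = 1;       // proper noun
    static final int ADJA = 2;     // attributive adjective
    static final int ADJD = 3;     // adverbial or predicative adjective
    static final int ART = 4;      // definite or indefinite article
    static final int VVFIN = 5;    // finite full verb
    static final int VAFIN = 6;    // finite auxiliary verb
    static final int PPER = 7;     // personal pronoun
    static final int SENT_END = 8; // "$." sentence-final punctuation

    /** Converts an STTS tag string into a tag constant, or UNKNOWN. */
    static int getTag(String s) {
        if (s == null) {
            return UNKNOWN;
        }
        switch (s.trim()) {
            case "NN":    return NN;
            case "NE":    return NE;
            case "ADJA":  return ADJA;
            case "ADJD":  return ADJD;
            case "ART":   return ART;
            case "VVFIN": return VVFIN;
            case "VAFIN": return VAFIN;
            case "PPER":  return PPER;
            case "$.":    return SENT_END;
            default:      return UNKNOWN;
        }
    }

    public static void main(String[] args) {
        System.out.println(getTag("NN"));
        System.out.println(getTag("XY?"));
    }
}
```

Mapping unrecognised strings to a dedicated UNKNOWN value keeps later stages (KoelnDic, Analyse, Generator) from mistaking an unexpected tagger output for a real tag.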