Class Tagger
java.lang.Object
Tagger
- public class Tagger
- extends java.lang.Object
This class starts the external tagger and tokenizes its output.
The word tags are the basic information for all further processing.
KoelnDic, Analyse and the Generator all depend on these tags.
The tagger used here is the Stuttgart TreeTagger.
For more information see: TODO: links to the tagger.
The tagset it uses is the STTS tagset.
For more information see: TODO: links to the tagset.
- Author:
- Arthur Laub
Arthur.Laub@urz.uni-hd.de
Project: Kölschifier.
Field Summary |
private static java.lang.String |
inputtoken
|
private static java.lang.String |
lemmatoken
|
private static int |
tag
|
Method Summary |
private static int |
getTag(java.lang.String s)
Converts the tag string delivered by the tagger into a tag constant. |
static java.lang.String |
start(java.lang.String data)
This method starts the external program treetagger.
|
static java.util.ArrayList |
tokenizer(java.lang.String tokenst)
This method tokenizes the output string produced by the external program TreeTagger.
|
Methods inherited from class java.lang.Object |
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
inputtoken
private static java.lang.String inputtoken
tag
private static int tag
lemmatoken
private static java.lang.String lemmatoken
Tagger
public Tagger()
start
public static java.lang.String start(java.lang.String data)
- This method starts the external program TreeTagger.
The tagger is started through a shell script.
The path of the shell script is set in the String TAGGER_START_FILE.
If the execution of the external tagger fails, a warning is printed.
- Parameters:
data
- the input data (text retrieved from the internet)
- Returns:
- the tagger output as a String
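The description of start can be illustrated with a minimal sketch. It is not the project's implementation: the class name TaggerStartSketch, the helper run, the script path "./start-tagger.sh" and the assumption that the script reads its text from standard input are all hypothetical. Only the idea of launching a shell script named by TAGGER_START_FILE and printing a warning on failure comes from the description above.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

class TaggerStartSketch {
    // Hypothetical placeholder; the real value of TAGGER_START_FILE is not documented here.
    static final String TAGGER_START_FILE = "./start-tagger.sh";

    /** Runs the command with the given text on stdin and returns what it writes to stdout. */
    static String run(List<String> command, String input) {
        Path tmp = null;
        try {
            // Feeding stdin from a temporary file avoids blocking on full pipe buffers.
            tmp = Files.createTempFile("tagger-input", ".txt");
            Files.write(tmp, input.getBytes(StandardCharsets.UTF_8));

            ProcessBuilder pb = new ProcessBuilder(command);
            pb.redirectInput(tmp.toFile());
            pb.redirectError(ProcessBuilder.Redirect.INHERIT);
            Process process = pb.start();

            String output = new String(process.getInputStream().readAllBytes(),
                                       StandardCharsets.UTF_8);
            int exitCode = process.waitFor();
            if (exitCode != 0) {
                System.err.println("Warning: tagger exited with status " + exitCode);
            }
            return output;
        } catch (IOException e) {
            System.err.println("Warning: could not run the tagger: " + e.getMessage());
            return "";
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            System.err.println("Warning: interrupted while waiting for the tagger");
            return "";
        } finally {
            if (tmp != null) {
                try {
                    Files.deleteIfExists(tmp);
                } catch (IOException ignored) {
                    // Best-effort cleanup of the temporary input file.
                }
            }
        }
    }

    public static void main(String[] args) {
        // Assumes the script is executable by sh and reads its text from stdin.
        System.out.print(run(List.of("sh", TAGGER_START_FILE), "Das ist ein Test."));
    }
}
```

Returning an empty string after a warning is one possible design choice; the documentation above does not say what start returns when the tagger fails.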
tokenizer
public static java.util.ArrayList tokenizer(java.lang.String tokenst)
- This method tokenizes the output string produced by the external program TreeTagger.
The string is the one returned by the method start.
The output string of the TreeTagger is split into the inputtoken, the tagtoken and the lemmatoken.
The tagtoken is converted into a tag constant by getTag.
For each token, a Word is created with Word(inputtoken, tag, lemmatoken, i) and added to an ArrayList.
- Parameters:
tokenst
- the output string of the TreeTagger
- Returns:
- an ArrayList of Word objects
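The tokenizing step can be sketched as follows. The sketch assumes that the tagger output has one token per line with the columns word, tag and lemma separated by tabs, and that i is the running index of the token; neither is stated in this documentation. The nested Word class and the toy getTag are hypothetical stand-ins for the project's own classes, and the class name TokenizerSketch is invented.

```java
import java.util.ArrayList;
import java.util.List;

class TokenizerSketch {
    /** Hypothetical stand-in for the project's Word class. */
    static final class Word {
        final String inputtoken;
        final int tag;
        final String lemmatoken;
        final int index;

        Word(String inputtoken, int tag, String lemmatoken, int index) {
            this.inputtoken = inputtoken;
            this.tag = tag;
            this.lemmatoken = lemmatoken;
            this.index = index;
        }
    }

    /** Toy replacement for getTag: 1 for tags starting with "N", 0 otherwise. */
    static int getTag(String s) {
        return s.startsWith("N") ? 1 : 0;
    }

    /** Splits tab-separated tagger output into Word objects, one per input line. */
    static List<Word> tokenizer(String tokenst) {
        List<Word> words = new ArrayList<>();
        int i = 0;
        for (String line : tokenst.split("\\R")) {
            String[] columns = line.split("\t");
            if (columns.length < 3) {
                continue; // skip blank or malformed lines
            }
            String inputtoken = columns[0];
            int tag = getTag(columns[1]);
            String lemmatoken = columns[2];
            words.add(new Word(inputtoken, tag, lemmatoken, i++));
        }
        return words;
    }

    public static void main(String[] args) {
        for (Word w : tokenizer("Das\tART\tdie\nHaus\tNN\tHaus\n")) {
            System.out.println(w.index + " " + w.inputtoken + " " + w.tag + " " + w.lemmatoken);
        }
    }
}
```

Skipping lines with fewer than three columns keeps the sketch robust against empty lines; whether the real tokenizer does the same is not documented.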
getTag
private static int getTag(java.lang.String s)
- Converts the tag string delivered by the tagger into a tag constant.
- Parameters:
s
- the tag information as a String
- Returns:
- a Tag Constant
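One way to convert a tag string into an int constant is a lookup table, sketched below. The constant names and values, the UNKNOWN fallback and the class name TagConstantsSketch are hypothetical; the project's actual constants are not listed on this page. The tag strings used (NN, NE, ART, ADJA, VVFIN) are taken from the STTS tagset.

```java
import java.util.HashMap;
import java.util.Map;

class TagConstantsSketch {
    // Hypothetical constants; the real values used by the project are not documented here.
    static final int UNKNOWN = 0;
    static final int NN = 1;     // common noun
    static final int NE = 2;     // proper noun
    static final int ART = 3;    // article
    static final int ADJA = 4;   // attributive adjective
    static final int VVFIN = 5;  // finite full verb

    private static final Map<String, Integer> TAGS = new HashMap<>();

    static {
        TAGS.put("NN", NN);
        TAGS.put("NE", NE);
        TAGS.put("ART", ART);
        TAGS.put("ADJA", ADJA);
        TAGS.put("VVFIN", VVFIN);
    }

    /** Returns the constant for a tag string, or UNKNOWN if the tag is not in the table. */
    static int getTag(String s) {
        return TAGS.getOrDefault(s, UNKNOWN);
    }

    public static void main(String[] args) {
        System.out.println(getTag("NN"));
        System.out.println(getTag("XYZ"));
    }
}
```

A table keeps the mapping in one place and makes unknown tags explicit instead of failing silently.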