
DetLemma Class Reference
[LeJa]

putting the pieces together for determining the lemma

#include <DetLemma.h>


Public Member Functions

 DetLemma ()
 default constructor

 ~DetLemma ()
 destructor

bool fillFlex (char *)
 initializes the levels from XML file

bool detLemma (string &, JXDictionary *, list< result_lemma > &, FILE *)
 this function creates HypoLemma objects for the possible segmentations of the hiragana ending

bool confirmLemma (string &, string &, JXDictionary *, result_lemma &)
 searches a hypothetical lemma in the dictionary

bool findInLevel (string &, UINT)
 searches a hiragana string in a specific level

bool insertFlextoLevel (FlexLevel *, int)
 inserts a single Flexlevel into vector m_flexlevel

void HypoPrint ()
 prints all hypotheses to stdout

bool insertPossibleHypos (JVerbalFlexion *jvf, UMString hirastr, int j)
 this function adds to the possible reduction hypotheses


Private Types

typedef vector< FlexLevel * >::iterator IT
typedef vector< HypoLemma >::iterator HIT

Private Member Functions

int getHighestLevel (char *)
 finds the highest level in grammar


Private Attributes

vector< FlexLevel * > m_flexlevel
vector< HypoLemma > m_hypothesen
int m_number_of_levels
int m_number_of_entries
int m_flexmaxlength
int m_level_counter [MAXFLEX]
list< result_lemma > m_result_list
string m_cur_reqform
string m_cur_reqflexart
JVerbalFlexion * hypo


Detailed Description

putting the pieces together for determining the lemma

Author:
Iris Vogel iris@urz.uni-heidelberg.de
Date:
Aug 2003

Version:
0.8/15
See also:
DetLemma.h


Member Typedef Documentation

typedef vector<HypoLemma>::iterator DetLemma::HIT [private]
 

iterator for the hypothesis vector

typedef vector<FlexLevel*>::iterator DetLemma::IT [private]
 

iterator for the inflection levels


Constructor & Destructor Documentation

DetLemma::DetLemma ()
 

default constructor

DetLemma::~DetLemma ()
 

destructor

iterates through the vector of flex levels (m_flexlevel) and deletes each one
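
The cleanup pattern described here, a container of raw owning pointers released in the destructor, can be sketched as follows. This is a minimal illustration only: `CountedLevel` and `LevelOwner` are hypothetical stand-ins, not the real FlexLevel or DetLemma classes, and the instance counter exists only to make the effect observable.

```cpp
#include <vector>

// Hypothetical stand-in for FlexLevel; counts live instances.
struct CountedLevel {
    static int live;
    CountedLevel() { ++live; }
    ~CountedLevel() { --live; }
};
int CountedLevel::live = 0;

// Hypothetical owner that mirrors the "vector of raw pointers" layout.
class LevelOwner {
public:
    LevelOwner() = default;
    // Copying would lead to double deletion, so it is disabled in this sketch.
    LevelOwner(const LevelOwner&) = delete;
    LevelOwner& operator=(const LevelOwner&) = delete;

    // Walk the vector and delete every level this object owns.
    ~LevelOwner() {
        for (CountedLevel* level : levels_) {
            delete level;
        }
    }

    // Takes ownership of `level`.
    void add(CountedLevel* level) { levels_.push_back(level); }

private:
    std::vector<CountedLevel*> levels_;
};
```

In newer code a `std::vector<std::unique_ptr<...>>` would make the explicit loop unnecessary; whether that fits the LeJa code base is not something this page states.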


Member Function Documentation

bool DetLemma::confirmLemma (string & lemma,
string & flexart,
JXDictionary * jxdic,
result_lemma & rl)
 

searches a hypothetical lemma in the dictionary

Checking not only the lemma string but also the kind of inflection makes the results more reliable. However, there can be more than one result for the same lookup, especially when the initial string is an all-hiragana string.

Parameters:
lemma lemma in question
flexart kind of inflection
jxdic pointer to the JXDictionary object used for lookup
rl variable for returning the result if the lookup succeeded
Returns:
true if successful

false if unsuccessful
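
The idea of confirming a lemma by both its string and its kind of inflection, while allowing several hits, can be illustrated with a toy dictionary. Everything in this sketch is an assumption made for illustration: `DictEntry`, `ToyDictionary` and `lookupConfirmed` are hypothetical names, the real JXDictionary interface is not shown on this page, and romanized keys stand in for hiragana strings.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical dictionary entry: inflection kind plus a short gloss.
struct DictEntry {
    std::string flexart;
    std::string gloss;
};

// Hypothetical dictionary: one key may map to several entries.
using ToyDictionary = std::multimap<std::string, DictEntry>;

// Return every entry whose key equals `lemma` AND whose inflection kind
// equals `flexart`. An empty result means the hypothesis is not confirmed.
std::vector<DictEntry> lookupConfirmed(const ToyDictionary& dic,
                                       const std::string& lemma,
                                       const std::string& flexart) {
    std::vector<DictEntry> hits;
    auto range = dic.equal_range(lemma);
    for (auto it = range.first; it != range.second; ++it) {
        if (it->second.flexart == flexart) {
            hits.push_back(it->second);
        }
    }
    return hits;
}
```

With three entries under the key "kaeru" (two godan verbs and one ichidan verb), a lookup for the godan kind returns two hits. This is the all-hiragana ambiguity mentioned above.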

bool DetLemma::detLemma (string & sstring,
JXDictionary * jxdic,
list< result_lemma > & result_list,
FILE * logfile)
 

this function creates HypoLemma objects for the possible segmentations of the hiragana ending

This is the heart of lemmatization. It starts with the hiragana string: every possible way to split it yields a new hypothesis about stem and ending. For every hypothesis, the ending is searched for in the grammar, starting at the highest level. Once an ending is found, it is cut off and the rest of the hiragana string is treated again as above. Once all inflection levels and hypotheses have been checked, the dictionary-form ending of the verb is selected from the grammar, attached to the stem, and checked against the dictionary. The kind of verb associated with the grammatical rules is also checked, so that similar forms such as passive and potential are kept apart where possible.

Parameters:
sstring full verbal string to be lemmatized
jxdic pointer to the dictionary object used for lookup
result_list list (passed by reference) that receives the results of lemmatization and lookup
Returns:
true if the lookup of possible lemmas succeeded

false if the lookup of possible lemmas failed

Todo:
it would be good (it would save a lot of .GetStringUC() calls) to overload this function with a Unicode string or wchar_t* instead of the multibyte string
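
The level-by-level stripping described above can be sketched in a few lines. This is a simplified, greedy illustration under stated assumptions, not the actual DetLemma algorithm: it takes the longest matching ending per level instead of tracking every hypothesis, it assumes that higher levels hold the outermost endings, and `PeelResult`, `endsWith` and `peelEndings` are hypothetical names. It uses `std::u32string` so that positions count code points rather than bytes, in the spirit of the Todo above.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical result type: remaining stem plus the stripped endings.
struct PeelResult {
    std::u32string stem;
    std::vector<std::u32string> endings;  // outermost ending first
};

// True if `s` ends with `suffix`.
static bool endsWith(const std::u32string& s, const std::u32string& suffix) {
    return suffix.size() <= s.size() &&
           s.compare(s.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Walk the levels from the highest index down to 0. On each level, strip the
// longest ending that matches the current tail of the word (greedy choice).
PeelResult peelEndings(std::u32string word,
                       const std::vector<std::vector<std::u32string>>& levels) {
    PeelResult result;
    for (std::size_t lvl = levels.size(); lvl-- > 0;) {
        const std::u32string* best = nullptr;
        for (const std::u32string& ending : levels[lvl]) {
            if (!ending.empty() && endsWith(word, ending) &&
                (best == nullptr || ending.size() > best->size())) {
                best = &ending;
            }
        }
        if (best != nullptr) {
            result.endings.push_back(*best);
            word.erase(word.size() - best->size());  // cut the ending off
        }
    }
    result.stem = word;
    return result;
}
```

For the five-character string たべました with まし on level 0 and た on level 1, the sketch strips た first, then まし, and leaves the stem たべ. Appending a dictionary-form ending and confirming the candidate against the dictionary would be the next step, which is outside this sketch.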

bool DetLemma::fillFlex (char * fname)
 

initializes the levels from XML file

This function is responsible for reading the grammar. The grammatical verb forms are hardcoded, which is not a problem as long as you stick to Japanese, because their number and kinds are essentially fixed. It prints a list of the loaded flex levels to stdout.

Parameters:
fname name of the grammar file
Returns:
true if successful

bool DetLemma::findInLevel (string & end,
UINT level)
 

searches a hiragana string in a specific level

Parameters:
end hiragana ending in question
level level that is being searched
Returns:
true if successful

false if unsuccessful
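
A lookup restricted to one level can be pictured as below. The data layout is an assumption made for this sketch: each level is modelled as a set of endings, whereas the real FlexLevel class is not documented on this page. `EndingSet` and `findEndingInLevel` are hypothetical names.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical model of one level: the set of endings it accepts.
using EndingSet = std::set<std::u32string>;

// True only if `level` exists and contains exactly `ending`.
bool findEndingInLevel(const std::vector<EndingSet>& levels,
                       const std::u32string& ending,
                       std::size_t level) {
    if (level >= levels.size()) {
        return false;  // unknown level: treat as "not found"
    }
    return levels[level].count(ending) > 0;
}
```

Returning false for an out-of-range level keeps the caller's loop simple; the real function may handle that case differently.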

int DetLemma::getHighestLevel (char *) [private]
 

finds the highest level in grammar

internal function to determine how many levels the grammar contains

void DetLemma::HypoPrint ()
 

prints all hypotheses to stdout

bool DetLemma::insertFlextoLevel (FlexLevel * fl,
int pos)
 

inserts a single Flexlevel into vector m_flexlevel

Parameters:
fl : pointer to the FlexLevel to insert
pos : name for
Returns:
true if successful

bool DetLemma::insertPossibleHypos (JVerbalFlexion * jvf,
UMString hirastr,
int j)
 

this function adds to the possible reduction hypotheses

This function is called when the hiragana string has changed, in particular when an ending has been recognized and removed from the string. The new, shorter string is split into parts again, and a hypothesis is added for each new possibility.

Parameters:
jvf inflection class of the hypothesis in question
hirastr hiragana string that is going to be split at the point specified by j
j splitting point for the hiragana string
Returns:
true if successful

false if unsuccessful
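
The re-splitting step can be pictured as enumerating every stem/ending split of the shortened string. The sketch below is an illustration under assumptions: `SplitHypothesis` and `enumerateSplits` are hypothetical names rather than the real HypoLemma type or member function, the length cap is only analogous to m_flexmaxlength, and allowing an empty stem is a choice made here for simplicity.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for one stem/ending hypothesis.
struct SplitHypothesis {
    std::u32string stem;
    std::u32string ending;
};

// Produce one hypothesis per possible ending length, shortest ending first.
// Endings are non-empty and at most `max_ending_len` code points long.
std::vector<SplitHypothesis> enumerateSplits(const std::u32string& hira,
                                             std::size_t max_ending_len) {
    std::vector<SplitHypothesis> hypotheses;
    for (std::size_t len = 1; len <= hira.size() && len <= max_ending_len; ++len) {
        const std::size_t cut = hira.size() - len;  // split point
        hypotheses.push_back({hira.substr(0, cut), hira.substr(cut)});
    }
    return hypotheses;
}
```

For the three-character string ました and a cap of 2, this yields the splits まし + た and ま + した. Working on `std::u32string` keeps each split on a character boundary, which a byte-indexed multibyte string would not guarantee.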


Member Data Documentation

JVerbalFlexion* DetLemma::hypo [private]
 

pointer for processing current hypothesis

string DetLemma::m_cur_reqflexart [private]
 

variable for processing current hypothesis

string DetLemma::m_cur_reqform [private]
 

variable for processing current hypothesis

vector<FlexLevel*> DetLemma::m_flexlevel [private]
 

the inflection levels of the grammar file are stored here

int DetLemma::m_flexmaxlength [private]
 

variable that defines the maximum length of a grammatical unit of hiragana

vector<HypoLemma> DetLemma::m_hypothesen [private]
 

hypotheses for lemma resolution are stored here

int DetLemma::m_level_counter[MAXFLEX] [private]
 

array storing the numbers corresponding to the flexlevels

int DetLemma::m_number_of_entries [private]
 

variable that defines the number of entries

int DetLemma::m_number_of_levels [private]
 

variable that stores the number of levels found in the grammar file

list<result_lemma> DetLemma::m_result_list [private]
 

since sometimes there is more than one resolution, results are stored in a list


Generated on Mon Aug 18 19:27:10 2003 for LeJa by doxygen 1.3.3