DetLemma Class Reference
[LeJa]

putting the pieces together for determining the lemma

#include <DetLemma.h>

Public Member Functions

DetLemma ()

default constructor

~DetLemma ()

destructor

bool fillFlex (char *)

initializes the levels from an XML file

bool detLemma (string &, JXDictionary *, list< result_lemma > &, FILE *)

this function creates HypoLemma objects for possible segmentation of the hiragana ending

bool confirmLemma (string &, string &, JXDictionary *, result_lemma &)

searches a hypothetical lemma in the dictionary

bool findInLevel (string &, UINT)

searches a hiragana string in a specific level

bool insertFlextoLevel (FlexLevel *, int)

inserts a single FlexLevel into vector m_flexlevel

void HypoPrint ()

prints all hypotheses to stdout

bool insertPossibleHypos (JVerbalFlexion *jvf, UMString hirastr, int j)

this function adds to the possible reduction hypotheses

Private Types

typedef vector< FlexLevel * >::iterator IT

typedef vector< HypoLemma >::iterator HIT

Private Member Functions

int getHighestLevel (char *)

finds the highest level in grammar

Private Attributes

vector< FlexLevel * > m_flexlevel

vector< HypoLemma > m_hypothesen

int m_number_of_levels

int m_number_of_entries

int m_flexmaxlength

int m_level_counter [MAXFLEX]

list< result_lemma > m_result_list

string m_cur_reqform

string m_cur_reqflexart

JVerbalFlexion * hypo

Detailed Description

putting the pieces together for determining the lemma

Author: Iris Vogel iris@urz.uni-heidelberg.de

Date: Aug 2003

Version: 0.8/15

See also: DetLemma.h

Member Typedef Documentation

typedef vector<HypoLemma>::iterator DetLemma::HIT [private]

iterator for the hypothesis vector

typedef vector<FlexLevel*>::iterator DetLemma::IT [private]

iterator for the inflection levels

Constructor & Destructor Documentation

DetLemma::DetLemma ( )

default constructor

DetLemma::~DetLemma ( )

destructor
iterates through the vector of FlexLevel pointers (m_flexlevel) and deletes each one
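
The cleanup described for the destructor can be sketched as follows. This is only an illustration under assumptions: FlexLevel is replaced by a stand-in struct with an instance counter, and LevelOwner is a hypothetical class that mirrors nothing but the m_flexlevel member and the IT typedef.

```cpp
#include <vector>

// Stand-in for the real FlexLevel class (assumption); counts live instances.
struct FlexLevel {
    static int live;
    FlexLevel() { ++live; }
    ~FlexLevel() { --live; }
};
int FlexLevel::live = 0;

// Hypothetical owner that mirrors only the m_flexlevel member of DetLemma.
class LevelOwner {
public:
    LevelOwner() = default;
    // Copying would lead to a double delete, so it is disabled in this sketch.
    LevelOwner(const LevelOwner&) = delete;
    LevelOwner& operator=(const LevelOwner&) = delete;

    void add(FlexLevel* fl) { m_flexlevel.push_back(fl); }

    ~LevelOwner() {
        typedef std::vector<FlexLevel*>::iterator IT;
        for (IT it = m_flexlevel.begin(); it != m_flexlevel.end(); ++it) {
            delete *it;  // release each level the vector points to
        }
        m_flexlevel.clear();
    }

private:
    std::vector<FlexLevel*> m_flexlevel;
};
```

Because the vector holds raw pointers, the owner has to delete every element itself. Storing std::unique_ptr<FlexLevel> would make the loop unnecessary, but that would be a change to the documented interface.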

Member Function Documentation

bool DetLemma::confirmLemma ( string & lemma,

string & flexart,

JXDictionary * jxdic,

result_lemma & rl

)

searches a hypothetical lemma in the dictionary
The lookup checks both the lemma string and the kind of inflection, which makes the results reliable.
However, there can still be more than one result for the same lookup, especially when the initial
string is written entirely in hiragana.

Parameters:

lemma lemma in question

flexart kind of inflection

jxdic pointer to the JXDictionary instance used for lookup

rl receives the result if the lookup succeeded

Returns:
true if the lookup succeeded
false if it failed
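
A minimal sketch of a lookup that matches both the lemma string and the kind of inflection. Everything here is an assumption made for illustration: Entry, ToyDictionary and lookupLemma are hypothetical names, the real JXDictionary and result_lemma interfaces are not shown in this reference, the keys are romanized, and the class labels ("v1", "v5u", "v5r") are placeholders.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical dictionary entry; the real result_lemma layout is not shown here.
struct Entry {
    std::string flexart;  // kind of inflection (placeholder labels)
    std::string gloss;    // short translation
};

// Hypothetical stand-in for JXDictionary: several entries may share a headword.
typedef std::multimap<std::string, Entry> ToyDictionary;

// Collect every entry whose headword AND kind of inflection both match.
std::vector<Entry> lookupLemma(const ToyDictionary& dic,
                               const std::string& lemma,
                               const std::string& flexart) {
    std::vector<Entry> hits;
    auto range = dic.equal_range(lemma);
    for (auto it = range.first; it != range.second; ++it) {
        if (it->second.flexart == flexart) {
            hits.push_back(it->second);
        }
    }
    return hits;
}
```

With an all-hiragana input such as "kau", two verbs of the same inflection class (to buy, to keep an animal) share one spelling, so the lookup returns two hits. This is why the class keeps its results in a list.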

bool DetLemma::detLemma ( string & sstring,

JXDictionary * jxdic,

list< result_lemma > & result_list,

FILE * logfile

)

this function creates HypoLemma objects for possible segmentation of the hiragana ending
This is the heart of lemmatization. It starts from the hiragana string:
every possible way to split it results in a new hypothesis about stem and ending.
For every hypothesis the ending is searched in the grammar, starting at the highest level.
Once an ending is found, it is cut off and the rest of the hiragana string is
treated again as above. Once all inflection levels and hypotheses have been checked,
the dictionary-form ending of the verb is selected from the grammar, appended to the stem and checked
against the dictionary. The kind of verb associated with the grammatical rules is also checked
to make sure that similar forms such as passive and potential are kept apart where possible.

Parameters:

sstring full verbal string to be lemmatized

jxdic dictionary class pointer for lookup

result_list list that receives the results of lemmatization and lookup

Returns:
true if the lookup of possible lemmas succeeded
false if the lookup of possible lemmas failed

Todo:
it would be good (it would save a lot of .GetStringUC() calls) to overload this
function with a Unicode string or wchar_t*
instead of the multibyte string
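
A minimal sketch of the reduction described for detLemma, under simplifying assumptions: endings and stems are romanized ASCII strings instead of hiragana, the grammar is a plain vector of ending lists per level, and the dictionary is a std::set. The names Levels, endsWith, reduce and lemmatize are hypothetical and not part of LeJa, and the verb-kind check is left out.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Toy grammar (assumption): levels[0] is the lowest level, levels.back() the highest.
typedef std::vector<std::vector<std::string> > Levels;

static bool endsWith(const std::string& s, const std::string& suffix) {
    return s.size() >= suffix.size() &&
           s.compare(s.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Strip endings from the levels below `top`, highest first, and try the
// dictionary form of every intermediate stem.
static void reduce(const std::string& rest, std::size_t top,
                   const Levels& levels,
                   const std::vector<std::string>& dictEndings,
                   const std::set<std::string>& dictionary,
                   std::set<std::string>& lemmas) {
    // Hypothesis 1: `rest` is already the bare stem.
    for (std::size_t i = 0; i < dictEndings.size(); ++i) {
        const std::string candidate = rest + dictEndings[i];
        if (dictionary.count(candidate) > 0) {
            lemmas.insert(candidate);
        }
    }
    // Hypothesis 2: `rest` still carries an ending from a lower level.
    for (std::size_t lv = top; lv-- > 0;) {
        for (std::size_t i = 0; i < levels[lv].size(); ++i) {
            const std::string& ending = levels[lv][i];
            if (rest.size() > ending.size() && endsWith(rest, ending)) {
                reduce(rest.substr(0, rest.size() - ending.size()), lv,
                       levels, dictEndings, dictionary, lemmas);
            }
        }
    }
}

// Entry point of the sketch: returns every lemma confirmed by the dictionary.
std::set<std::string> lemmatize(const std::string& form,
                                const Levels& levels,
                                const std::vector<std::string>& dictEndings,
                                const std::set<std::string>& dictionary) {
    std::set<std::string> lemmas;
    reduce(form, levels.size(), levels, dictEndings, dictionary, lemmas);
    return lemmas;
}
```

With the levels {"sase"}, {"rare"}, {"ta"} (lowest to highest), the dictionary ending "ru" and a dictionary containing "taberu", the form "tabesaserareta" is reduced step by step to "tabe" and confirmed as "taberu". The level index only ever decreases, so the recursion terminates.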

bool DetLemma::fillFlex ( char * fname )

initializes the levels from an XML file
This function is responsible for reading the grammar.
The grammatical verb forms are hardcoded, which is not a problem as long as the grammar is
Japanese, because their number and kinds are essentially fixed.
It prints a list of the loaded FlexLevels to stdout.

Parameters:

fname name of the grammar file

Returns:
true if successful

bool DetLemma::findInLevel ( string & end,

UINT level

)

searches a hiragana string in a specific level

Parameters:

end hiragana ending in question

level level that is being searched

Returns:
true if successful
false if unsuccessful

int DetLemma::getHighestLevel ( char * ) [private]

finds the highest level in grammar
internal function to determine how many levels there are

void DetLemma::HypoPrint ( )

prints all hypotheses to stdout

bool DetLemma::insertFlextoLevel ( FlexLevel * fl,

int pos

)

inserts a single FlexLevel into vector m_flexlevel

Parameters:

fl : pointer to the FlexLevel to insert

pos : position in m_flexlevel at which the level is inserted

Returns:
true if successful

bool DetLemma::insertPossibleHypos ( JVerbalFlexion * jvf,

UMString hirastr,

int j

)

this function adds to the possible reduction hypotheses
This function is called when the hiragana string has changed,
in particular when an ending has been recognized and deleted from the
string. The new, shorter string is split into parts again,
and a hypothesis is added for each new possibility.

Parameters:

jvf inflection class of the hypothesis in question

hirastr hiragana string that is going to be split at the point specified by j

j splitting point for Hiragana string

Returns:
true if successful
false if unsuccessful
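
The re-splitting step can be sketched as an enumeration of split points. This is an illustration under assumptions: std::u32string stands in for UMString so that each element is one code point, Split and enumerateSplits are hypothetical names, and maxEnding is assumed to play the role of m_flexmaxlength.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// One hypothesis: a candidate stem and the candidate ending that follows it.
typedef std::pair<std::u32string, std::u32string> Split;

// Enumerate every split of `hira` into a stem (possibly empty) and a
// non-empty ending. Endings longer than `maxEnding` code points are skipped.
std::vector<Split> enumerateSplits(const std::u32string& hira,
                                   std::size_t maxEnding) {
    std::vector<Split> out;
    for (std::size_t j = 0; j < hira.size(); ++j) {
        const std::size_t endingLen = hira.size() - j;
        if (endingLen > maxEnding) {
            continue;  // ending too long to be a single grammatical unit
        }
        out.push_back(Split(hira.substr(0, j), hira.substr(j)));
    }
    return out;
}
```

Splitting by code point rather than by byte matters here, because a hiragana character occupies several bytes in a multibyte string and a byte-wise split could cut a character in half.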

Member Data Documentation

JVerbalFlexion* DetLemma::hypo [private]

pointer for processing current hypothesis

string DetLemma::m_cur_reqflexart [private]

variable for processing current hypothesis

string DetLemma::m_cur_reqform [private]

variable for processing current hypothesis

vector<FlexLevel*> DetLemma::m_flexlevel [private]

the inflection levels of the grammar file are stored here

int DetLemma::m_flexmaxlength [private]

variable that defines the maximum length of a grammatical unit of hiragana

vector<HypoLemma> DetLemma::m_hypothesen [private]

hypotheses for lemma resolution are stored here

int DetLemma::m_level_counter[MAXFLEX] [private]

array storing the numbers corresponding to the flexlevels

int DetLemma::m_number_of_entries [private]

variable that defines the number of entries

int DetLemma::m_number_of_levels [private]

variable that stores the number of levels found in the grammar file

list<result_lemma> DetLemma::m_result_list [private]

since sometimes there is more than one resolution, results are stored in a list

Generated on Mon Aug 18 19:27:10 2003 for LeJa by Doxygen 1.3.3

