UMString Class Reference
[LeJa]

String class for handling multibyte and Unicode strings.

#include <UMString.h>

Public Member Functions

 UMString ()
 Default Constructor for the UMString-Class.

 UMString (const UMString &tmp_umstring)
 Constructor for the UMString-Class which takes a UMString object as init-value.

 UMString (const char *tmp_mb_string)
 UMString Class Constructor for a Multibyte string passed at creation of an UMString Object.

 UMString (const wstring tmp_uc_string)
 UMString Class Constructor for Unicode string (wstring) passed at creation.

 UMString (const wchar_t *tmp_uc_string)
 UMString Class Constructor for Unicode string passed at creation.

 UMString (const string tmp_stlstring)
 UMString Class Constructor for STL-String passed at creation.

 ~UMString ()
 Destructor for UMString Class.

UINT GetLengthUC () const
 Reads out the number of characters in the stored Unicode string.

UINT GetLengthMB () const
 Reads out the number of characters in the stored Multibyte string.

wchar_t * PushString (const wchar_t *tmp_uc_string)
 Loads a Unicode string into the UMString object.

char * PushString (const char *tmp_mb_string)
 Loads a Multibyte string into the UMString object.

wchar_t * GetStringUC () const
 Returns the Unicode string to the calling function.

char * GetStringMB () const
 Returns the Multibyte string to the calling function.

void Clear ()
 Clears all stored string data in the object.

bool ReplaceSubstring (const UINT First, const UINT Last, const UMString NewString)
 Replaces the substring beginning at character "First" up to character "Last".

bool SI_Set (const UINT x)
 Sets the "sliding index" by passing an absolute position.

UINT SI_Get () const
 Gets the "sliding index"-position (0-based).

UINT SI_Forward ()
 Sets the sliding index to next character.

UINT SI_Backward ()
 Sets the sliding index to previous character.

bool SI_NextHira ()
 Sets the "sliding index" to the next hiragana-character in the string after the current "sliding index"-position.

bool SI_NextKata ()
 Sets the "sliding index" to the next katakana-character in the string after the current "sliding index"-position.

bool SI_NextKanji ()
 Sets the "sliding index" to the next kanji-character in the string after the current "sliding index"-position.

bool SI_NextDifferent ()
 Sets the "sliding index" to the next character, which is of different CHARTYPE compared to the current.

bool SI_PrevHira ()
 Sets the "sliding index" to the previous hiragana-character in the string before the current "sliding index"-position.

bool SI_PrevKata ()
 Sets the "sliding index" to the previous katakana-character in the string before the current "sliding index"-position.

bool SI_PrevKanji ()
 Sets the "sliding index" to the previous kanji-character in the string before the current "sliding index"-position.

bool SI_PrevDifferent ()
 Sets the "sliding index" to the nearest preceding character whose CHARTYPE differs from that of the current character.

bool SI_TokenStart ()
 Sets the sliding index pointer to the beginning of the current token.

bool SI_TokenEnd ()
 Sets the sliding index pointer to the end of the current token.

const UMString SI_GetToken ()
 Gets the next token consisting of characters like the one the sliding index points to.

const UMString SI_GetTokenPosAfter ()
 Gets the next token and positions the sliding index after it.

const UMString SI_GetRemaining () const
 Gets the string remaining after the "sliding index" position.

const UMString SI_GetSizedString (const UINT first, const UINT last) const
 Gets a slice of the string.

const wchar_t SI_GetChar () const
 Returns the Unicode-Character at the sliding index position.

const wchar_t SI_GetCharNext () const
 Returns the Unicode-Character after the sliding index position.

void SI_SetChar (const wchar_t tmp_uc_char)
 Replaces the Unicode-Character at the sliding index position.

int SI_GetPosChar (wchar_t &ucchar, UINT offset=0)
 Searches for a character in the string, starting at an offset position.

bool SI_SetPosChar (wchar_t &ucchar)
 Sets the sliding index to the next occurrence of ucchar.

bool SI_SplitUMString (UMString &, UMString &, const UINT)
 Splits the UMString into two at position x.

const UMString & operator= (const UMString &right)
 Assignment operator for the UMString Class.

bool operator== (const UMString &right)
 Comparison operator for UMString class.

const UMString operator+ (const UMString &right)
 Concatenation operator for UMString class.

 operator const char * () const
 Casts to const char *.

 operator const wchar_t * () const
 Casts to const wchar_t *.

 operator const string () const
 Casts to const string.

 operator char * ()
 Casts to char *.

 operator wchar_t * ()
 Casts to wchar_t *.

 operator string ()
 Casts to string.


Private Member Functions

void MB2UC ()
 Synchronizes the MultiByte and the Unicode string, taking the MultiByte string as the source.

void UC2MB ()
 Synchronizes the MultiByte and the Unicode string, taking the Unicode string as the source.


Private Attributes

wchar_t * uc_string
char * mb_string
UINT si


Detailed Description

String class for handling multibyte and Unicode strings.

Version:
1.0
Date:
August 2002-2003
Author:
Torben Pastuch <superbaer@t-online.de>, (slightly) modified by Iris Vogel <iris@urz.uni-heidelberg.de>
Version:
0.8/15
Warning:
Some of the functions are Windows-specific.
See also:
UMString.h

CPP-File for the UMString library. This library provides a new class "UMString". The UMString type can be used to store Unicode and Multibyte strings. All Multibyte strings loaded into a UMString are automatically available as Unicode strings and vice versa. This class also provides various functions for Unicode and Multibyte string manipulation, like easy concatenation, searching, regular expressions, ...


Constructor & Destructor Documentation

UMString::UMString ()

Default Constructor for the UMString-Class.

Unicode and Multibyte char arrays are initialized with an array size of 1 char/wchar_t and a '\0' in element [0].

UMString::UMString (const UMString &tmp_umstring)

Constructor for the UMString-Class which takes a UMString object as init-value.

UMString::UMString (const char *tmp_mb_string)

UMString Class Constructor for a Multibyte string passed at creation of an UMString Object.

Parameters:
tmp_mb_string -> Initializing Multibyte string

UMString::UMString (const wstring tmp_uc_string)

UMString Class Constructor for Unicode string (wstring) passed at creation.

Parameters:
tmp_uc_string -> Initializing String

UMString::UMString (const wchar_t *tmp_uc_string)

UMString Class Constructor for Unicode string passed at creation.

Parameters:
tmp_uc_string -> Initializing String

UMString::UMString (const string tmp_stlstring)

UMString Class Constructor for STL-String passed at creation.

Parameters:
tmp_stlstring -> Initializing String

UMString::~UMString ()

Destructor for UMString Class.


Member Function Documentation

void UMString::Clear ()

Clears all stored string data in the object.

UINT UMString::GetLengthMB () const

Reads out the number of characters in the stored Multibyte string.

Returns:
(UINT) number of characters in the Multibyte string

UINT UMString::GetLengthUC () const

Reads out the number of characters in the stored Unicode string.

Returns:
(UINT) number of characters in the Unicode string

char * UMString::GetStringMB () const

Returns the Multibyte string to the calling function.

Returns:
(char*) the Multibyte string stored in the object

wchar_t * UMString::GetStringUC () const

Returns the Unicode string to the calling function.

Returns:
(wchar_t*) the Unicode string stored in the object

void UMString::MB2UC () [private]

Synchronizes the MultiByte and the Unicode string.

Synchronizes the MultiByte and the Unicode string in the UMString-object by taking the MultiByte string as the original pattern. This routine is only used internally and is private to this class.

Warning:
This function still uses Windows-specific functions. These should be replaced once everything else works.
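
As a rough idea of what a portable replacement could look like, the sketch below converts a multibyte string to a wide string with the standard C function std::mbstowcs. The helper name mb_to_wide is hypothetical and not part of UMString, and the code works on std::string/std::wstring rather than on the class's private char arrays. The conversion depends on the current C locale, so a program handling non-ASCII text would normally call std::setlocale(LC_ALL, "") first.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical helper (not part of UMString): converts a multibyte string
// to a wide string according to the current C locale.
std::wstring mb_to_wide(const std::string& mb) {
    // A multibyte string never yields more wide characters than it has bytes,
    // so size() + 1 elements are enough, including the terminator.
    std::vector<wchar_t> buf(mb.size() + 1);
    const std::size_t n = std::mbstowcs(buf.data(), mb.c_str(), buf.size());
    if (n == static_cast<std::size_t>(-1)) {
        return std::wstring();  // invalid multibyte sequence for this locale
    }
    return std::wstring(buf.data(), n);
}
```

Returning an empty string on a conversion error is only one possible policy; how UMString itself reacts to invalid input is not documented here.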

UMString::operator char * () [inline]

Casts to char *.

UMString::operator const char * () const [inline]

Casts to const char *.

UMString::operator const string () const [inline]

Casts to const string.

UMString::operator const wchar_t * () const [inline]

Casts to const wchar_t *.

UMString::operator string () [inline]

Casts to string.

UMString::operator wchar_t * () [inline]

Casts to wchar_t *.

const UMString UMString::operator+ (const UMString &right)

Concatenation operator for UMString class.

Parameters:
right -> Reference to the right-hand string of the "+"-operation.
Returns:
(UMString) The concatenated UMString

const UMString & UMString::operator= (const UMString &right)

Assignment operator for the UMString Class.

Parameters:
right -> Reference to the right-hand string of the "="-operation.
Returns:
(UMString&) A reference to the left-hand object for further processing.

bool UMString::operator== (const UMString &right)

Comparison operator for UMString class.

Parameters:
right -> Reference to the right-hand string of the "=="-operation.
Returns:
(bool) true if the Unicode strings are equal, false if they are not equal

char * UMString::PushString (const char *tmp_mb_string)

Loads a Multibyte string into the UMString object.

Parameters:
tmp_mb_string Multibyte string to store
Returns:
(char*) the string that was loaded is also returned

wchar_t * UMString::PushString (const wchar_t *tmp_uc_string)

Loads a Unicode string into the UMString object.

Parameters:
tmp_uc_string Unicode string to store
Returns:
(wchar_t*) the string that was loaded is also returned

bool UMString::ReplaceSubstring (const UINT First, const UINT Last, const UMString NewString)

Replaces the substring beginning at character "First" up to character "Last".

Parameters:
First The first character to be replaced
Last End of substring to be replaced
NewString The replacing string
Returns:
true = Operation successful

false = Operation unsuccessful
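
The replacement can be pictured with std::wstring as in the following sketch. The function replace_range is a hypothetical stand-in, not the UMString implementation. It assumes that both positions are 0-based, that "Last" is included in the replaced range, and that an out-of-range request fails without changing the string; the documentation above does not state any of these details.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for ReplaceSubstring, operating on std::wstring.
// Assumption: first and last are 0-based and last is inclusive.
bool replace_range(std::wstring& s, std::size_t first, std::size_t last,
                   const std::wstring& replacement) {
    if (first > last || last >= s.size()) {
        return false;  // invalid range: leave the string untouched
    }
    s.replace(first, last - first + 1, replacement);
    return true;
}
```

The replacement may be shorter or longer than the range it replaces, so the string length can change.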

UINT UMString::SI_Backward ()

Sets the sliding index to previous character.

Returns:
(UINT) : new si-position (0-based)

UINT UMString::SI_Forward ()

Sets the sliding index to next character.

Returns:
(UINT) : new si-position (0-based)

UINT UMString::SI_Get () const

Gets the "sliding index"-position (0-based).

Returns:
(UINT) the index position

const wchar_t UMString::SI_GetChar () const

Returns the Unicode-Character at the sliding index position.

Returns:
(wchar_t) The character at the sliding index position

const wchar_t UMString::SI_GetCharNext () const

Returns the Unicode-Character after the sliding index position.

Returns:
(wchar_t) The character after the sliding index position

int UMString::SI_GetPosChar (wchar_t &ucchar, UINT offset=0)

Searches for a character in the string, starting at an offset position.

Parameters:
ucchar wchar_t character to look for
offset UINT offset for search
Returns:
(int) the position of the character

(int) -1 if the character was not found

Note:
added by Iris
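
The search contract described above (position on success, -1 on failure) can be sketched on top of std::wstring::find. The function find_char is hypothetical, and the sketch assumes that the returned position is absolute, that is, counted from the start of the string and not from the offset; the documentation does not say which of the two applies.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for SI_GetPosChar, operating on std::wstring.
// Assumption: the result is an absolute 0-based position.
int find_char(const std::wstring& s, wchar_t c, std::size_t offset = 0) {
    const std::size_t pos = s.find(c, offset);
    return pos == std::wstring::npos ? -1 : static_cast<int>(pos);
}
```

An offset beyond the end of the string simply yields -1, because std::wstring::find reports "not found" in that case.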

const UMString UMString::SI_GetRemaining () const

Gets the string remaining after the "sliding index" position.

Returns:
(UMString) the remaining string in a UMString-object

const UMString UMString::SI_GetSizedString (const UINT first, const UINT last) const

Gets a slice of the string.

Returns a new string object which is copied out of the original string, beginning at position first (0-based) and stretching to position last (also 0-based).

Parameters:
first first character to be copied out
last last character to be copied out
Returns:
(UMString) : The new string object which contains the string indicated by "first" & "last"
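
Because both positions name characters that are copied out, the slice is inclusive at both ends, which differs from the (position, count) convention of std::wstring::substr. The sketch below shows the index arithmetic with a hypothetical function sized_string; clamping "last" to the end of the string and returning an empty string for an invalid range are assumptions, not documented UMString behaviour.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for SI_GetSizedString, operating on std::wstring.
// Both first and last are 0-based and inclusive.
std::wstring sized_string(const std::wstring& s, std::size_t first,
                          std::size_t last) {
    if (first > last || first >= s.size()) {
        return std::wstring();      // assumption: invalid range gives ""
    }
    if (last >= s.size()) {
        last = s.size() - 1;        // assumption: clamp to the last character
    }
    return s.substr(first, last - first + 1);
}
```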

const UMString UMString::SI_GetToken ()

Gets the next token consisting of characters like the one the sliding index points to.

Returns:
(UMString) the token in a UMString-object

const UMString UMString::SI_GetTokenPosAfter ()

Gets the next token and positions the sliding index after it.

Gets the next token consisting of characters like the one the sliding index points to. After that, the sliding index is positioned after the token that was read.

Returns:
(UMString) the token in a UMString-object
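
A token in this sense is a run of characters of the same kind. The sketch below illustrates the idea with a hypothetical function take_token that reads such a run starting at the index and then moves the index past it. The classifier char_class is a simplified placeholder (digit, letter, other) and not UMString's CHARTYPE, and it is an assumption that the token starts exactly at the sliding index rather than at the beginning of the surrounding run.

```cpp
#include <cstddef>
#include <cwctype>
#include <string>

// Placeholder classifier (assumption): 0 = digit, 1 = letter, 2 = other.
int char_class(wchar_t c) {
    if (std::iswdigit(c)) return 0;
    if (std::iswalpha(c)) return 1;
    return 2;
}

// Hypothetical stand-in for SI_GetTokenPosAfter: returns the run of
// same-class characters starting at si and moves si just past that run.
std::wstring take_token(const std::wstring& s, std::size_t& si) {
    if (si >= s.size()) {
        return std::wstring();  // nothing left to read
    }
    const int cls = char_class(s[si]);
    std::size_t end = si;
    while (end < s.size() && char_class(s[end]) == cls) {
        ++end;
    }
    const std::wstring token = s.substr(si, end - si);
    si = end;
    return token;
}
```

Calling take_token repeatedly walks through a string one run at a time, which is the typical way such a sliding index is used for simple tokenizing.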

bool UMString::SI_NextDifferent ()

Sets the "sliding index" to the next character, which is of different CHARTYPE compared to the current.

Returns:
(bool) : true = a different character was found and the si was set

(bool) : false = a different character was not found and the si was not changed
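
The CHARTYPE values are not listed on this page. As an illustration only, the sketch below classifies characters by their Unicode block (Hiragana U+3040 to U+309F, Katakana U+30A0 to U+30FF, CJK Unified Ideographs U+4E00 to U+9FFF) and scans forward for the first character of a different type. The enum CharType and the functions classify and next_different are hypothetical; the real class may distinguish more types or use other ranges.

```cpp
#include <cstddef>
#include <string>

// Hypothetical character types; UMString's real CHARTYPE may differ.
enum class CharType { Hiragana, Katakana, Kanji, Other };

// Classifies a character by Unicode block (basic blocks only).
CharType classify(wchar_t c) {
    if (c >= 0x3040 && c <= 0x309F) return CharType::Hiragana;
    if (c >= 0x30A0 && c <= 0x30FF) return CharType::Katakana;
    if (c >= 0x4E00 && c <= 0x9FFF) return CharType::Kanji;
    return CharType::Other;
}

// Moves si to the next character whose type differs from the one at si.
// Returns false and leaves si unchanged if no such character exists.
bool next_different(const std::wstring& s, std::size_t& si) {
    if (si >= s.size()) {
        return false;
    }
    const CharType current = classify(s[si]);
    for (std::size_t i = si + 1; i < s.size(); ++i) {
        if (classify(s[i]) != current) {
            si = i;
            return true;
        }
    }
    return false;
}
```

The same loop, with the comparison changed to a fixed target type, would give the behaviour described for the hiragana, katakana and kanji variants.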

bool UMString::SI_NextHira ()

Sets the "sliding index" to the next hiragana-character in the string after the current "sliding index"-position.

Returns:
(bool) : true = a hiragana-character was found and the si was set

(bool) : false = no hiragana-character was found and the si was not changed

bool UMString::SI_NextKanji ()

Sets the "sliding index" to the next kanji-character in the string after the current "sliding index"-position.

Returns:
(bool) : true = a kanji-character was found and the si was set

(bool) : false = no kanji-character was found and the si was not changed

bool UMString::SI_NextKata ()

Sets the "sliding index" to the next katakana-character in the string after the current "sliding index"-position.

Returns:
(bool) : true = a katakana-character was found and the si was set

(bool) : false = no katakana-character was found and the si was not changed

bool UMString::SI_PrevDifferent ()

Sets the "sliding index" to the nearest preceding character whose CHARTYPE differs from that of the current character.

Returns:
(bool) : true = a different character was found and the si was set

(bool) : false = a different character was not found and the si was not changed

bool UMString::SI_PrevHira ()

Sets the "sliding index" to the previous hiragana-character in the string before the current "sliding index"-position.

Returns:
(bool) : true = a hiragana-character was found and the si was set

(bool) : false = no hiragana-character was found and the si was not changed

bool UMString::SI_PrevKanji ()

Sets the "sliding index" to the previous kanji-character in the string before the current "sliding index"-position.

Returns:
(bool) : true = a kanji-character was found and the si was set

(bool) : false = no kanji-character was found and the si was not changed

bool UMString::SI_PrevKata ()

Sets the "sliding index" to the previous katakana-character in the string before the current "sliding index"-position.

Returns:
(bool) : true = a katakana-character was found and the si was set

(bool) : false = no katakana-character was found and the si was not changed

bool UMString::SI_Set (const UINT x)

Sets the "sliding index" by passing an absolute position.

Parameters:
x -> the absolute position (0-based)

void UMString::SI_SetChar (const wchar_t tmp_uc_char)

Replaces the Unicode-Character at the sliding index position.

Parameters:
tmp_uc_char : Replacement character

bool UMString::SI_SetPosChar (wchar_t &ucchar)

Sets the sliding index to the next occurrence of ucchar.

Parameters:
ucchar : character to look for
Returns:
(bool) : true = the character was found

(bool) : false = the character was not found

Note:
added by Iris

bool UMString::SI_SplitUMString (UMString &str1, UMString &str2, const UINT x)

Splits the UMString into two at position x.

Parameters:
str1 : part of the UMString from position 0 to x-1
str2 : part of the UMString from position x to the end of the string
x : point to split
Returns:
bool : true if found

false : if not found

Note:
added by Iris
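
The split rule from the parameter list (first part 0 to x-1, second part x to the end) can be sketched with std::wstring as follows. The function split_at is a hypothetical stand-in; treating a position beyond the end of the string as a failure that leaves both outputs untouched is an assumption.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for SI_SplitUMString, operating on std::wstring.
// left receives positions 0..x-1, right receives positions x..end.
bool split_at(const std::wstring& s, std::size_t x,
              std::wstring& left, std::wstring& right) {
    if (x > s.size()) {
        return false;  // assumption: out-of-range split point is an error
    }
    left = s.substr(0, x);
    right = s.substr(x);
    return true;
}
```

With this rule, x = 0 gives an empty first part and x equal to the string length gives an empty second part.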

bool UMString::SI_TokenEnd ()

Sets the sliding index pointer to the end of the current token.

Returns:
(bool)

bool UMString::SI_TokenStart ()

Sets the sliding index pointer to the beginning of the current token.

void UMString::UC2MB () [private]

Synchronizes the MultiByte and the Unicode string.

Synchronizes the MultiByte and the Unicode string in the UMString-object by taking the Unicode string as the original pattern. This routine is only used internally and is private to this class.

Warning:
This function still uses Windows-specific functions. These should be replaced once everything else works.
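
For the wide-to-multibyte direction, a portable sketch can be built on the standard C function std::wcstombs. The helper wide_to_mb is hypothetical and not part of UMString, and it uses std::wstring/std::string in place of the class's private char arrays. The result depends on the current C locale; characters that the locale cannot represent make the conversion fail, which this sketch reports as an empty string.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical helper (not part of UMString): converts a wide string to a
// multibyte string according to the current C locale.
std::string wide_to_mb(const std::wstring& uc) {
    // MB_CUR_MAX is the largest number of bytes a single character can
    // occupy in the current locale; one extra byte holds the terminator.
    std::vector<char> buf(uc.size() * MB_CUR_MAX + 1);
    const std::size_t n = std::wcstombs(buf.data(), uc.c_str(), buf.size());
    if (n == static_cast<std::size_t>(-1)) {
        return std::string();  // character not representable in this locale
    }
    return std::string(buf.data(), n);
}
```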


Member Data Documentation

char* UMString::mb_string [private]
 

The stored MultiByte-String

UINT UMString::si [private]
 

The "sliding index"-Pointer

wchar_t* UMString::uc_string [private]
 

The stored UNICODE-String


The documentation for this class was generated from the following files:
Generated on Mon Aug 18 19:27:10 2003 for LeJa by doxygen 1.3.3