Copyright ©1995 by NeXT Computer, Inc. All Rights Reserved.
IXLexemeExtraction
Adopted By: no NEXTSTEP classes
Declared In: indexing/IXAttributeReader.h
Protocol Description
IXLexemeExtraction declares methods implemented by readers, which are objects that lexically analyze a stream of text for consumption by a parser such as an IXAttributeParser. IXAttributeReader subclasses that conform to this protocol are called custom readers, because they implement it to customize certain aspects of lexical analysis.
Method Types
Lexing a stream | getLexeme:inLength:fromStream:
Manipulating a word/lexeme | foldCase:inLength:
Instance Methods
foldCase:inLength:
- (unsigned int)foldCase:(char *)aString inLength:(unsigned int)aLength
Converts all characters in aString to lowercase, according to the rules of the language being read. aLength is the size of the buffer that holds aString, not the length of the string itself, which is null-terminated. Returns the length of the converted string.
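The contract above can be illustrated with a minimal plain-C sketch. The function name fold_case is hypothetical, and the sketch assumes single-byte text folded with the C library's tolower() in the default "C" locale; a real custom reader would apply the rules of the language it reads.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical plain-C analogue of foldCase:inLength:.
 * Assumption: single-byte characters, folded with tolower().
 * aLength is the size of the buffer, not the length of the string. */
unsigned int fold_case(char *aString, unsigned int aLength)
{
    unsigned int i = 0;

    if (aString == NULL || aLength == 0)
        return 0;

    /* Stop at the null terminator, and never read past the buffer. */
    while (i < aLength && aString[i] != '\0') {
        aString[i] = (char)tolower((unsigned char)aString[i]);
        i++;
    }
    return i; /* length of the folded string */
}
```

Bounding the loop by aLength as well as by the terminator keeps the sketch safe if the caller passes a buffer that was never terminated.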
getLexeme:inLength:fromStream:
- (unsigned int)getLexeme:(char *)aString inLength:(unsigned int)aLength fromStream:(NXStream *)stream
Extracts a lexeme from stream, putting it into aString. aLength is the length of the string buffer into which the receiver may place the lexeme. This method should return the actual length of the string put into the buffer.
Subclasses of IXAttributeReader may implement this method when they need more control over lexeme recognition than IXAttributeReader's simple delimiter map strategy provides. Such readers include those that recognize phrases or idioms (like "joie de vivre") and those that handle text in non-phonetic alphabets or in streams containing special escape sequences. For example, the IXJapaneseReader class developed by Canon uses this method to override the default lexeme recognition so that it can detect embedded escape sequences that denote shifts among the three different Kanji character encodings.
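To make the escape-sequence case concrete, here is a rough plain-C sketch of a lexeme extractor. Everything in it is an assumption made for illustration: struct byte_stream is a hypothetical in-memory stand-in for NXStream, get_lexeme is a hypothetical name, and the rule that an escape sequence is ESC followed by exactly two bytes is a simplification. It does not describe how IXJapaneseReader works.

```c
#include <stddef.h>

/* Hypothetical stand-in for NXStream: a read-only in-memory byte stream. */
struct byte_stream {
    const char *data;
    size_t len;
    size_t pos;
};

#define ESC 0x1B

/* Return the next byte, or -1 at end of stream. */
static int stream_getc(struct byte_stream *s)
{
    return s->pos < s->len ? (unsigned char)s->data[s->pos++] : -1;
}

static int is_delimiter(int c)
{
    return c == ' ' || c == '\t' || c == '\n' || c == '\r';
}

/* Hypothetical analogue of getLexeme:inLength:fromStream:.
 * Assumption: an escape sequence is ESC plus two more bytes, and it
 * always ends the current lexeme without becoming part of it. */
unsigned int get_lexeme(char *aString, unsigned int aLength,
                        struct byte_stream *stream)
{
    unsigned int n = 0;
    int c;

    if (aString == NULL || aLength == 0)
        return 0;

    /* Skip leading delimiters and escape sequences. */
    for (;;) {
        c = stream_getc(stream);
        if (c == ESC) {
            (void)stream_getc(stream); /* discard the two bytes after ESC */
            (void)stream_getc(stream);
            continue;
        }
        if (c == -1 || !is_delimiter(c))
            break;
    }

    /* Copy bytes until a delimiter, an escape, end of stream, or a full buffer. */
    while (c != -1 && !is_delimiter(c)) {
        if (c == ESC || n + 1 >= aLength) {
            stream->pos--; /* push the byte back for the next call */
            break;
        }
        aString[n++] = (char)c;
        c = stream_getc(stream);
    }
    aString[n] = '\0';
    return n; /* actual length placed in the buffer */
}
```

Repeated calls walk the stream one lexeme at a time, and a return value of 0 signals that no lexeme could be extracted (here, end of stream or a buffer too small to hold any character). When the buffer fills, the sketch leaves the remaining bytes in the stream, so the next call returns the rest of the word as a separate lexeme; a real reader might prefer to discard the overflow instead.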