Copyright ©1995 by NeXT Computer, Inc. All Rights Reserved.
IXWeightingDomain |
Inherits From: | Object | |
Declared In: | indexing/IXWeightingDomain.h |
Class Description |
An IXWeightingDomain represents word count, rank, and frequency information for a body of text. It can be used to convert word counts between several different formats, and to discover information about specific words, or tokens, in the body of text. An IXWeightingDomain doesn't store the body of text whose statistics it represents, and doesn't maintain any sort of record of what the body of text is. It is simply a summary of the word frequency information, to be used as needed.
IXAttributeParser uses IXWeightingDomain to compute word peculiarities when parsing text. The peculiarity of a word in a text sample is its frequency in the sample divided by its frequency in the IXWeightingDomain (in this case called the reference domain), normalized by taking the square root. The result is a measure of the frequency of the word in the sample relative to the reference domain. Words that are common in the reference domain receive lesser significance than they would have had, and words that are rare in the reference domain receive greater significance. The effect is to bias the weights with a filter that reduces domain-specific "noise words."
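As a worked illustration of the definition above, the peculiarity can be written as a formula. The symbols f_sample and f_ref are notation introduced here, not names from the Indexing Kit, and the numbers are made up for the example.

```latex
\[
  \mathrm{peculiarity}(w) \;=\; \sqrt{\frac{f_{\mathrm{sample}}(w)}{f_{\mathrm{ref}}(w)}}
\]
% Two words that both make up 1% of the sample (f_sample = 0.01):
\[
  f_{\mathrm{ref}} = 0.0001 \;\Rightarrow\; \sqrt{0.01 / 0.0001} = \sqrt{100} = 10,
  \qquad
  f_{\mathrm{ref}} = 0.04 \;\Rightarrow\; \sqrt{0.01 / 0.04} = \sqrt{0.25} = 0.5
\]
```

The word that is rare in the reference domain scores 10, while the word that is common there scores 0.5, which is the noise-word filtering described above.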
Instance Variables |
unsigned int beenRanked;
unsigned int totalTokens;
unsigned int uniqueTokens;
unsigned int indexCount;
unsigned int totalLength;
void *tokenArray;
unsigned int *tokenIndex;
beenRanked | YES if tokens have been ranked. | |
totalTokens | The number of tokens in the sample. | |
uniqueTokens | The number of unique tokens in the sample. | |
indexCount | The number of entries in the token index. | |
totalLength | The total of all the token lengths. | |
tokenArray | Array of tokens with rank and count. | |
tokenIndex | Array of offsets into tokenArray. |
Method Types |
Initializing instances | initFromDomain:, initFromHistogram:, initFromWFTable: |
Saving domain information | writeDomain:, writeHistogram:, writeWFTable: |
Counting tokens | totalTokens, uniqueTokens |
Retrieving information about tokens | countForToken:ofLength:, rankForToken:ofLength:, frequencyOfToken:ofLength:, peculiarityOfToken:ofLength:andFrequency: |
Instance Methods |
countForToken:ofLength: |
(unsigned int)countForToken:(void *)aToken ofLength:(unsigned int)aLength |
Returns the number of times aToken occurs in the body of text represented by the IXWeightingDomain. aLength must be the length, in bytes, of aToken.
See also: rankForToken:ofLength:, frequencyOfToken:ofLength:, peculiarityOfToken:ofLength:andFrequency: |
frequencyOfToken:ofLength: |
(float)frequencyOfToken:(void *)aToken ofLength:(unsigned int)aLength |
Returns the frequency of occurrence for aToken in the body of text represented by the IXWeightingDomain. aLength must be the length, in bytes, of aToken. The frequency is equal to the number of times aToken occurs divided by the total number of tokens in the IXWeightingDomain.
See also: peculiarityOfToken:ofLength:andFrequency:, countForToken:ofLength:, rankForToken:ofLength: |
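The count and frequency lookups described above can be sketched in plain C. This is a minimal standalone illustration, not the Indexing Kit implementation: the TokenEntry type and the functions count_for_token, total_tokens, and frequency_of_token are hypothetical names, and the linear search and the zero result for an empty domain are assumptions.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one entry of a weighting domain:
   a token given as raw bytes plus a byte length, and its count. */
typedef struct {
    const void  *bytes;
    unsigned int length;   /* length of the token in bytes */
    unsigned int count;    /* occurrences in the body of text */
} TokenEntry;

/* Return the count for a token, matching on length and bytes.
   Returns 0 when the token is not present. */
unsigned int count_for_token(const TokenEntry *entries, size_t n,
                             const void *token, unsigned int length)
{
    for (size_t i = 0; i < n; i++) {
        if (entries[i].length == length &&
            memcmp(entries[i].bytes, token, length) == 0)
            return entries[i].count;
    }
    return 0;
}

/* Sum the counts of all unique tokens. */
unsigned int total_tokens(const TokenEntry *entries, size_t n)
{
    unsigned int total = 0;
    for (size_t i = 0; i < n; i++)
        total += entries[i].count;
    return total;
}

/* frequency = count of the token / total number of tokens.
   Returning 0 for an empty domain is an assumption of this sketch. */
float frequency_of_token(const TokenEntry *entries, size_t n,
                         const void *token, unsigned int length)
{
    unsigned int total = total_tokens(entries, n);
    if (total == 0)
        return 0.0f;
    return (float)count_for_token(entries, n, token, length) / (float)total;
}
```

With three entries whose counts are 6, 3, and 3, a token counted 3 times has a frequency of 3/12. Because tokens are compared as bytes with an explicit length, they need not be null-terminated strings.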
initFromDomain: |
initFromDomain:(NXStream *)stream |
Initializes a newly allocated IXWeightingDomain from stream, which should contain data in domain format as created by the writeDomain: method.
See also: initFromHistogram:, initFromWFTable:, writeDomain: |
initFromHistogram: |
initFromHistogram:(NXStream *)stream |
Initializes the IXWeightingDomain from stream, which should contain data in histogram format as created by the writeHistogram: method.
See also: initFromDomain:, initFromWFTable:, writeHistogram: |
initFromWFTable: |
initFromWFTable:(NXStream *)stream |
Initializes the IXWeightingDomain from stream, which should contain data in the NEXTSTEP Release 2 WFTable format.
See also: initFromDomain:, initFromHistogram:, writeWFTable: |
peculiarityOfToken:ofLength:andFrequency: |
(float)peculiarityOfToken:(void *)aToken |
ofLength:(unsigned int)aLength andFrequency:(float)aFrequency |
Returns the peculiarity of aToken, which occurs in some sample with frequency aFrequency, relative to the body of text represented by the receiving IXWeightingDomain (the reference domain). aLength must be the length, in bytes, of aToken. The peculiarity is the square root of the ratio of aFrequency to the token's frequency within the reference domain.
See also: frequencyOfToken:ofLength:, countForToken:ofLength:, rankForToken:ofLength: |
rankForToken:ofLength: |
(unsigned int)rankForToken:(void *)aToken ofLength:(unsigned int)aLength |
Returns the rank of aToken in the IXWeightingDomain; the rank is the token's position in an ordering of the set of unique tokens by count. aLength must be the length, in bytes, of aToken. The token with the highest count has a rank of 1; the token with the lowest count has a rank equal to the number of unique tokens.
See also: countForToken:ofLength:, frequencyOfToken:ofLength:, peculiarityOfToken:ofLength:andFrequency: |
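The ranking rule above can be sketched as follows. RankedToken and assign_ranks are hypothetical names for illustration. The source does not say how tokens with equal counts are ordered, so breaking ties by array position is an assumption.

```c
#include <stddef.h>

/* Hypothetical record for one unique token. */
typedef struct {
    const char  *text;   /* the token */
    unsigned int count;  /* occurrences in the body of text */
    unsigned int rank;   /* filled in by assign_ranks() */
} RankedToken;

/* Assign ranks 1..n by descending count: the highest count gets
   rank 1 and the lowest gets rank n. Ties are broken by array
   position (an assumption), so every rank is used exactly once. */
void assign_ranks(RankedToken *tokens, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        unsigned int rank = 1;
        for (size_t j = 0; j < n; j++) {
            /* Every token that sorts ahead of token i pushes it down. */
            if (tokens[j].count > tokens[i].count ||
                (tokens[j].count == tokens[i].count && j < i))
                rank++;
        }
        tokens[i].rank = rank;
    }
}
```

The double loop is quadratic in the number of unique tokens, which keeps the sketch short. Sorting by count would be the more practical choice for a large vocabulary.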
totalTokens |
(unsigned int)totalTokens |
Returns the total number of tokens in the IXWeightingDomain; that is, the sum of the number of occurrences of each token, over the set of unique tokens.
See also: uniqueTokens |
uniqueTokens |
(unsigned int)uniqueTokens |
Returns the number of unique tokens in the IXWeightingDomain.
See also: totalTokens |
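The difference between the two counts is easiest to see by tallying a short token sequence. The sketch below is a standalone illustration with hypothetical names (Tally, tally_add, MAX_UNIQUE). It assumes null-terminated tokens and a small fixed-size table, neither of which comes from the Indexing Kit.

```c
#include <stddef.h>
#include <string.h>

#define MAX_UNIQUE 64   /* arbitrary capacity for this sketch */

/* Hypothetical tally of tokens seen so far. */
typedef struct {
    const char  *words[MAX_UNIQUE];
    unsigned int counts[MAX_UNIQUE];
    unsigned int unique;  /* number of distinct tokens */
    unsigned int total;   /* sum of all counts */
} Tally;

/* Record one occurrence of word. Returns 1 on success, or 0 if the
   word is new and the table is already full. */
int tally_add(Tally *t, const char *word)
{
    for (unsigned int i = 0; i < t->unique; i++) {
        if (strcmp(t->words[i], word) == 0) {
            t->counts[i]++;   /* seen before: only total grows */
            t->total++;
            return 1;
        }
    }
    if (t->unique == MAX_UNIQUE)
        return 0;
    t->words[t->unique] = word;   /* first occurrence: both grow */
    t->counts[t->unique] = 1;
    t->unique++;
    t->total++;
    return 1;
}
```

Starting from a zeroed Tally and adding the six tokens "to", "be", "or", "not", "to", "be" leaves total at 6 and unique at 4.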
writeDomain: |
writeDomain:(NXStream *)stream |
Writes the IXWeightingDomain to stream in domain format.
See also: writeHistogram:, writeWFTable:, initFromDomain: |
writeHistogram: |
writeHistogram:(NXStream *)stream |
Writes the IXWeightingDomain to stream in histogram format.
See also: writeDomain:, writeWFTable:, initFromHistogram: |
writeWFTable: |
writeWFTable:(NXStream *)stream |
Writes the IXWeightingDomain to stream in NEXTSTEP Release 2 WFTable format.
See also: writeDomain:, writeHistogram:, initFromWFTable: |