class DawgLexicon
DawgLexicon
class supports efficient lookup operations for words and prefixes.
As an example of the use of the DawgLexicon
class, the
following program lists all the two-letter words in the lexicon
stored in EnglishWords.dat
:
int main() { DawgLexicon english("EnglishWords.dat"); for (string word : english) { if (word.length() == 2) { cout << word << endl; } } return 0; }
NOTE: Unlike the standard Lexicon
implementation, DawgLexicon
is implemented internally using a directed acyclic word graph ("DAWG"), which allows it to have an efficient binary file representation.
The performance of this kind of lexicon is severely degraded if words are added to it individually.
If an entire binary DAWG file of words is added using the addWordsFromFile
method, the words are represented efficiently as a directed acyclic word graph (DAWG).
But if words are added one at a time, or if the words come from a plain text input file, these are placed into a secondary internal backing Set
which has much lower runtime performance, particularly for prefix-related operations like containsPrefix
.
Also, because of its representation, this form of lexicon does not support certain operations such as remove
.
If you want to be able to add words one-at-a-time or read from plain-text input files, or if you want to be able to remove words from your lexicon, consider using the Lexicon
class rather than DawgLexicon
.
Available since: 2014/02/01 version of C++ library
Constructor | ||
O(1) | Initializes a new empty lexicon. | |
DawgLexicon(istream) |
O(N log N) | Initializes a new lexicon that reads words from the given file or stream. |
Methods | ||
O(log N) | Adds the specified word to the lexicon. | |
addWordsFromFile(istream) |
O(N log N) | Reads the given file/stream and adds all of its words to the lexicon. |
O(N) | Removes all words from the lexicon. | |
O(log N) | Returns true if word is contained in the lexicon. |
|
O(N log N) | Returns true if the two lexicons contain the same words. |
|
O(log N) | Returns true if any words in the lexicon begin with prefix . |
|
O(1) | Returns true if the lexicon contains no words. |
|
O(N) | Calls the specified function on each word in the lexicon. | |
O(1) | Returns the number of words contained in the lexicon. | |
O(N) | Converts the lexicon to a printable string representation. | |
Operators | ||
O(N log N) | Returns true if lex1 and lex2 contain the same elements. |
|
O(N log N) | Returns true if lex1 and lex2 are different. |
|
O(N) | Outputs the contents of the lexicon to the given output stream. | |
O(N log N) | Reads the contents of the given input stream into the lexicon. |
DawgLexicon(); DawgLexicon(istream& input); DawgLexicon(string filename);
EnglishWords.dat
containing a list of words in English. The standard code pattern
to initialize that lexicon looks like this:
DawgLexicon english("EnglishWords.dat");
Usage:
DawgLexicon lex; DawgLexicon lex(filename);
void add(string word);
Usage:
lex.add(word);
void addWordsFromFile(istream& input); void addWordsFromFile(string filename);
Usage:
lex.addWordsFromFile(filename);
void clear();
Usage:
lex.clear();
bool contains(string word) const;
true
if word
is contained in the
lexicon. In the DawgLexicon
class, the case of letters is
ignored, so "Zoo" is the same as "ZOO" or "zoo".
Usage:
if (lex.contains(word)) ...
bool containsPrefix(string prefix) const;
prefix
.
Like containsWord
, this method ignores the case of letters
so that "MO" is a prefix of "monkey" or "Monday".
Usage:
if (lex.containsPrefix(prefix)) ...
bool equals(const DawgLexicon& lex) const;
true
if the two lexicons contain exactly the same set of words.
Identical in behavior to the ==
operator.
Usage:
if (lex.equals(lex2)) ...
bool isEmpty() const;
true
if the lexicon contains no words.
Usage:
if (lex.isEmpty()) ...
void mapAll(void (*fn)(string)) const; void mapAll(void (*fn)(const string &)) const; void mapAll(FunctorType fn) const;
Usage:
lexicon.mapAll(fn);
int size() const;
Usage:
int n = lex.size();
string toString() const;
"{word1, word2, word3}"
.
Usage:
string str = lexicon.toString();