#include "dawglexicon.h"

class DawgLexicon

This class is used to represent a lexicon, or word list. The main difference between a lexicon and a dictionary is that a lexicon does not provide any mechanism for storing definitions; the lexicon contains only words, with no associated information. It is therefore similar to a set of strings, but with a more space-efficient internal representation. The DawgLexicon class supports efficient lookup operations for words and prefixes.

As an example of the use of the DawgLexicon class, the following program lists all the two-letter words in the lexicon stored in EnglishWords.dat:

   int main() {
      DawgLexicon english("EnglishWords.dat");
      for (string word : english) {
         if (word.length() == 2) {
            cout << word << endl;
         }
      }
      return 0;
   }

NOTE: Unlike the standard Lexicon implementation, DawgLexicon is implemented internally using a directed acyclic word graph ("DAWG"), which allows it to have an efficient binary file representation. The performance of this kind of lexicon is severely degraded if words are added to it individually. If an entire binary DAWG file of words is added using the addWordsFromFile method, the words are represented efficiently as a directed acyclic word graph (DAWG). But if words are added one at a time, or if the words come from a plain text input file, these are placed into a secondary internal backing Set which has much lower runtime performance, particularly for prefix-related operations like containsPrefix. Also, because of its representation, this form of lexicon does not support certain operations such as remove. If you want to be able to add words one-at-a-time or read from plain-text input files, or if you want to be able to remove words from your lexicon, consider using the Lexicon class rather than DawgLexicon.

Available since: 2014/02/01 version of C++ library

Constructor
DawgLexicon() O(1) Initializes a new empty lexicon.
DawgLexicon(filename) 
DawgLexicon(istream)
O(N log N) Initializes a new lexicon that reads words from the given file or stream.
Methods
add(word)  O(log N) Adds the specified word to the lexicon.
addWordsFromFile(filename) 
addWordsFromFile(istream)
O(N log N) Reads the given file/stream and adds all of its words to the lexicon.
clear()  O(N) Removes all words from the lexicon.
contains(word)  O(log N) Returns true if word is contained in the lexicon.
equals(lex)  O(N log N) Returns true if the two lexicons contain the same words.
containsPrefix(prefix)  O(log N) Returns true if any words in the lexicon begin with prefix.
isEmpty()  O(1) Returns true if the lexicon contains no words.
mapAll(fn)  O(N) Calls the specified function on each word in the lexicon.
size()  O(1) Returns the number of words contained in the lexicon.
toString()  O(N) Converts the lexicon to a printable string representation.
Operators
lex1 == lex1  O(N log N) Returns true if lex1 and lex2 contain the same elements.
lex1 != lex2  O(N log N) Returns true if lex1 and lex2 are different.
ostream << lex O(N) Outputs the contents of the lexicon to the given output stream.
istream >> lex O(N log N) Reads the contents of the given input stream into the lexicon.

Constructor detail


DawgLexicon();
DawgLexicon(istream& input);
DawgLexicon(string filename);
Initializes a new lexicon. The default constructor creates an empty lexicon. The second form reads in the contents of the lexicon from the specified data file or stream. The data file must be in one of two formats: (1) a space-efficient precompiled binary format or (2) a text file containing one word per line. The Stanford library distribution includes a binary lexicon file named EnglishWords.dat containing a list of words in English. The standard code pattern to initialize that lexicon looks like this:
   DawgLexicon english("EnglishWords.dat");

Usage:

DawgLexicon lex;
DawgLexicon lex(filename);

Method detail


void add(string word);
Adds the specified word to the lexicon.

Usage:

lex.add(word);

void addWordsFromFile(istream& input);
void addWordsFromFile(string filename);
Reads the given file/stream and adds all of its words to the lexicon.

Usage:

lex.addWordsFromFile(filename);

void clear();
Removes all words from the lexicon.

Usage:

lex.clear();

bool contains(string word) const;
Returns true if word is contained in the lexicon. In the DawgLexicon class, the case of letters is ignored, so "Zoo" is the same as "ZOO" or "zoo".

Usage:

if (lex.contains(word)) ...

bool containsPrefix(string prefix) const;
Returns true if any words in the lexicon begin with prefix. Like containsWord, this method ignores the case of letters so that "MO" is a prefix of "monkey" or "Monday".

Usage:

if (lex.containsPrefix(prefix)) ...

bool equals(const DawgLexicon& lex) const;
Returns true if the two lexicons contain exactly the same set of words. Identical in behavior to the == operator.

Usage:

if (lex.equals(lex2)) ...

bool isEmpty() const;
Returns true if the lexicon contains no words.

Usage:

if (lex.isEmpty()) ...

void mapAll(void (*fn)(string)) const;
void mapAll(void (*fn)(const string &)) const;
void mapAll(FunctorType fn) const;
Calls the specified function on each word in the lexicon.

Usage:

lexicon.mapAll(fn);

int size() const;
Returns the number of words contained in the lexicon.

Usage:

int n = lex.size();

string toString() const;
Converts the map to a printable string representation, such as "{word1, word2, word3}".

Usage:

string str = lexicon.toString();