(See libcp.htm for information about the header files and namespaces used below.)
A container is actually a handle that manages a potentially complicated C-style data structure such as an AVL tree, hash table, or doubly linked list (see the Handle-Body idiom discussed in chapter 4). Fortunately, implementing these data structures is largely an academic exercise these days (but a good one). This is because the standard C++ library now comes with a collection of container templates called the Standard Template Library.
Containers can be roughly divided into two types: sets and sequences. Recall from mathematics that a sequence is an ordered set. Thus, the sequence (a, b, c) is different from the sequence (c, b, a), while the set {a, b, c} is the same as the set {c, b, a}. Recall also that multiple occurrences of an element are disregarded in a set, but not a sequence. So the sequence (a, a, b, c) is different from the sequence (a, b, c), while the set {a, a, b, c} is the same as the set {a, b, c}.
Here is a list of the STL containers:
Sequences
vector<Storable> (random access store)
string (a sequence of characters, much like vector<char>)
list<Storable> (sequential access store)
deque<Storable> (default underlying store for stacks and queues)
pair<Key, Storable> (elements of a map)
Sequence Adaptors (temporal access stores)
stack<Storable> (LIFO store)
queue<Storable> (FIFO store)
priority_queue<Storable> (uses Storable::operator<())
Sets
set<Storable>
multiset<Key> (multiple occurrences allowed)
map<Key, Storable> (a set of pairs)
multimap<Key, Storable> (a multiset of pairs)
Here are some of the header files you need to include if you want to use STL containers:
#include <string>
#include <vector>
#include <list>
#include <deque>
#include <stack>
#include <queue>
#include <map>
using namespace std;
A C string is simply a static or dynamic array of characters:
typedef char String80[80]; // static arrays
typedef char* String; // pointers to dynamic arrays
A C string literal is a sequence of characters bracketed by double quotes (backslash is the escape character):
String path = "c:\\data\\info.doc";
String80 prompt = "Type \"q\" to quit\n";
C strings are always terminated by the NUL character (i.e., ASCII code 0). The standard C library provides functions for manipulating C strings. These are declared in <cstring>:
int strcmp( const char* string1, const char* string2 );
size_t strlen( const char* string );
char* strcpy( char* dest, const char* source );
char* strcat( char* dest, const char* source );
// etc.
Because C strings don't check for out-of-range indices, and because they don't allocate and deallocate memory for themselves, programmers should use C++ strings instead. C++ strings are instances of the string class declared in <string>, which is part of the std namespace:
#include <string>
using namespace std;
We can use C strings to initialize C++ strings:
string s1("California"), s2 = "Nevada", s3;
The third C++ string, s3, is currently empty:
cout << boolalpha << s3.empty() << endl; // prints "true"
C++ strings can be read from an input stream using the extraction operator and written to an output stream using the insertion operator.
cout << "Enter 3 strings separated by white space: ";
cin >> s1 >> s2 >> s3;
cout << "You entered: " << s1 << ' ' << s2
<< ' ' << s3 << endl;
The size() and length() member functions both return the number of characters in a C++ string:
cout << s1.length() + s2.size() << endl; // prints 16
There are several ways to access the characters in a C++ string. One way is to use the subscript operator:
for(int i = 0; i < s1.size(); i++) cout << s1[i] << ' ';
Unfortunately, the subscript operator doesn't check for index range errors. For this we must use the at() member function, which throws an out_of_range exception when the index is too large:
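try
{
   cout << s1.at(500) << endl; // oops, index too big!
}
catch(const out_of_range& e) // catch by reference to avoid copying the exception
{
   cerr << e.what() << endl;
   // prints an implementation-defined message such as "invalid string position"
}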
Assigning s2 to s3:
s3 = s2;
automatically copies s2 to s3. We can verify this by modifying the first character of s3:
s3[0] = 'n'; // change 'N' to 'n'
then printing s2 and noticing that it is unchanged:
cout << s2 << endl; // prints "Nevada"
One of the best features of C++ strings is that they automatically grow to accommodate extra characters:
s3 = "
We don't need to bother with the awkward C library string manipulation functions. C++ strings can be compared and concatenated using infix operators:
s1 == s2; // = false
s1 != s2; // = true
s1 <= s2; // = true
s3 = s1 + ' ' + "is next to" + ' ' + s2;
// etc.
There are also many member functions that can be used to find, extract, erase, and replace substrings:
int pos = s1.find("for"); // pos = 4
s3 = s1.substr(pos, 3); // s3 = "for"
s3 = s1.substr(pos, string::npos); // s3 = "fornia"
s1.replace(pos, 3, "XX"); // s1 = "CaliXXnia"
s1.erase(pos, 2); // s1 = "Calinia"
The c_str() member function returns an equivalent C string which can be used by C library functions:
printf("%s\n", s1.c_str());
There are many more C++ string member functions. The reader should consult online documentation or [STR] for a complete list.
Iterators are similar to the smart pointers discussed in Chapter 4, only an iterator is an object that "points" to an element in a container, string, or stream. Like pointers, iterators can be incremented, decremented, compared, or dereferenced.
For example, we can traverse an array using index numbers:
char msg[] = "Hello World";
for (int i = 0; i < strlen(msg); i++)
cout << msg[i] << ' ';
but we could just as easily use a pointer to traverse the array:
for(char* p = msg; p != msg + strlen(msg); p++)
cout << *p << ' ';
We can traverse a C++ string using index numbers, too:
string msg = "Hello World";
for (int i = 0; i < msg.length(); i++)
cout << msg[i] << ' ';
But we can't traverse a C++ string using an ordinary pointer, because a string is only a handle that encapsulates a C string. Instead, we must use an iterator to "point" at elements inside of a string.
for(string::iterator p = msg.begin(); p != msg.end(); p++)
cout << *p << ' ';
Iterator declarations must be qualified by the appropriate container class:
string::iterator p;
This is because the iterator class for a container class is declared inside the container class declaration. For example:
class string
{
public:
class iterator { ... };
class const_iterator { ... };
iterator begin();
iterator end();
// etc.
};
This is done because iterators for strings might be quite different from iterators for lists, maps, and deques.
All container classes provide begin() and end() functions. The begin() function returns an iterator that points to the first element in the container. The end() function returns an iterator that points to a location just beyond the last element of the container.
C++ iterators can be regarded as an instantiation of the more general iterator design pattern:
Iterator [Go4]
Other Names
Cursor
Problem
Sharing a data structure such as a linked list or file is difficult, because apparently non destructive operations, such as printing the list, can actually be destructive if they move the list's internal cursor (i.e., the list's pointer to its current element).
Solution
An iterator is an external cursor owned by a client. Modifying an iterator has no effect on iterators owned by other clients.
Assume c is an STL container, x is a potential container element, and p is an iterator. Here are some of the most common operations:
c.begin() = iterator "pointing" to first element of c
c.end() = iterator "pointing" to one past last element
c.front() = first element of c
c.back() = last element of c
c.size() = number of elements in c
c.empty() = true if container is empty
c.push_back(x) inserts x at end of c
c.push_front(x) inserts x at beginning of c
c.insert(p, x) inserts x in c before *p
c.pop_front() removes first element of c
c.pop_back() removes last element of c
c.erase(p) removes *p
STL vectors encapsulate dynamic arrays (see chapter 4). They automatically grow to accommodate any number of elements. The elements in a vector can be accessed and modified using iterators or indices.
To begin, let's provide a template for printing vectors; this will make a nice addition to our utility library. We use a constant iterator:
template <typename Storable>
ostream& operator<<(ostream& os, const vector<Storable>& v)
{
   os << "( ";
   // typename is required because const_iterator depends on Storable
   typename vector<Storable>::const_iterator p;
   for( p = v.begin(); p != v.end(); p++)
      os << *p << ' ';
   os << ')';
   return os;
}
Here's how we declare an integer vector:
vector<int> vec;
It's easy to add elements to the back of a vector:
vec.push_back(7);
vec.push_back(42);
vec.push_back(19);
vec.push_back(100);
vec.push_back(-13);
cout << vec << endl; // prints (7 42 19 100 -13)
It's also easy to remove elements from the back of a vector:
vec.pop_back(); // removes -13
Of course we can use the subscript operator to access and modify vector elements:
for(int i = 0; i < vec.size(); i++)
   vec[i] = 2 * vec[i]; // double each element
The subscript operator is unsafe: it doesn't check for out-of-range indices. As with strings, we must use the vec.at(i) member function for safe access.
To insert or remove an item from any position in a vector other than the last, we must first obtain an iterator that will point to the insertion or removal position:
vector<int>::iterator p = vec.begin();
Next, we position p:
p = p + 2;
Finally, we insert the item using the insert() member function:
vec.insert(p, 777);
cout << vec << endl; // prints (14 84 777 38 200)
We follow the same procedure to remove an element:
p = vec.begin() + 1;
vec.erase(p);
cout << vec << endl; // prints (14 777 38 200)
STL lists encapsulate and manage linked lists. As before, we begin with a useful list writer. Only two changes are needed from the vector writer:
template <typename Storable>
ostream& operator<<(ostream& os, const list<Storable>& v)
{
   os << "( ";
   // typename is required because const_iterator depends on Storable
   typename list<Storable>::const_iterator p;
   for( p = v.begin(); p != v.end(); p++)
      os << *p << ' ';
   os << ")";
   return os;
}
Here is how we declare two lists of strings:
list<string> vals1, vals2;
We can add items to the front or rear of a list:
vals1.push_back("golf");
vals1.push_back("judo");
vals1.push_back("pool");
vals1.push_back("boxing");
cout << vals1 << endl; // prints (golf judo pool boxing)
vals2.push_front("car");
vals2.push_front("boat");
vals2.push_front("plane");
vals2.push_front("horse");
cout << vals2 << endl; // prints (horse plane boat car)
As with vectors, to insert an item into a list we first need an iterator pointing to the point of insertion:
list<string>::iterator p = vals1.begin();
p++; p++; // p + 2 not allowed!
vals1.insert(p, "yoga");
cout << vals1 << endl; // prints (golf judo yoga pool boxing)
Inserting an item into a list is more efficient than inserting an item into a vector. This is because logically consecutive elements of a dynamic array are also physically consecutive, while linked lists decouple logical and physical order. Therefore, inserting an item into a vector involves moving all elements above the insertion point to make space, while inserting an item into a list merely involves adjusting a few links.
On the other hand, pointing a vector iterator at an insertion point can be done by a single operation:
p = vec.begin() + n;
While pointing a list iterator at an insertion point involves multiple operations:
p = vals1.begin();
for(int i = 0; i < n; i++) p++;
In general, C++ iterators belong to categories. While not classes, iterator categories are organized into a specialization hierarchy that can be described by the following class diagram:
List iterators are bidirectional. They inherit the ability to be incremented (by 1), compared, and dereferenced. In addition, they can be decremented (by 1). Vector iterators are random access. They inherit all of the functionality of bidirectional iterators, but they can be incremented or decremented by arbitrary amounts.
Removing an item from a linked list can also be done by pointing an iterator at the item to be removed:
p = vals2.begin();
p++;
vals2.erase(p);
cout << vals2 << endl; // prints (horse boat car)
But we can also remove every occurrence of an item in the list by simply specifying the item to be removed:
vals1.remove("golf");
The list<> template provides member functions for various useful list operations:
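vals2.reverse(); // vals2 = (car boat horse)
vals1.sort(); // vals1 = (boxing judo pool yoga)
vals2.sort(); // merge() requires both lists to be sorted
vals1.merge(vals2); // moves every element of vals2 into vals1
cout << vals1 << endl;
// prints (boat boxing car horse judo pool yoga)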
Consult your online documentation or [STR] for a complete list.
By themselves, deques (rhymes with "wrecks") are relatively useless. They are optimized for inserting and removing elements at either end, hence are ideal adaptees for stacks and queues. See the discussion on adapters in chapter 4 for details.
One of the most common and useful data structures is a table. For example, we used save and load tables in our persistence framework, and we used a prototype table in our implementation of the Prototype pattern. (See chapter 5 for details.)
A table is simply a list of pairs (i.e., the rows of the table) of the form:
(KEY, VALUE)
KEY and VALUE can be any type, but no two pairs in a table have the same key. This requirement allows us to associatively search tables for a pair with a given key.
For example, a concordance is a table that alphabetically lists every word that appears in a document, together with the number of occurrences of that word. The Harvard Concordance to Shakespeare, for instance, lists every word ever written by the Bard. It is 2.5 inches thick and lists about 29,000 different words! (The King James Bible uses only 6000 different words.) About 1/12 of the words occur only once, and, contrary to popular belief, the word "gadzooks" never occurs!
Tables can be represented in C++ programs by STL maps. A map is a set of pairs ordered by key, where pair is an STL type that simply binds two elements together:
template <typename KEY, class VALUE>
struct pair
{
KEY first;
VALUE second;
};
For example, here is a concordance for a document containing the single phrase, "to be or not to be":
(to, 2)
(be, 2)
(or, 1)
(not, 1)
Here is one way we could build this concordance in C++:
map<string, int> concordance;
concordance["to"] = 2;
concordance["be"] = 2;
concordance["or"] = 1;
concordance["not"] = 1;
Notice that we can use the array subscript operator to insert entries. In a way, a map is like an array indexed by an arbitrary type. In fact, if the index isn't a key in the table, a new pair is created with this key associated to the default or null value of the corresponding value type, and the new pair is inserted into the table. Since the "null" value for the integer type is 0, we can use this fact to create our concordance in a different way:
map<string, int> concordance;
concordance["to"]++;
concordance["be"]++;
concordance["or"]++;
concordance["not"]++;
concordance["to"]++;
concordance["be"]++;
Here are useful template functions for writing pairs and maps:
template <typename Key, typename Value>
ostream& operator<<(ostream& os, const pair<Key, Value>& p)
{
   os << '(' << p.first << ", " << p.second << ')';
   return os;
}
template <typename Key, typename Value>
ostream& operator<<(ostream& os, const map<Key, Value>& m)
{
   // typename is required because const_iterator depends on Key and Value
   typename map<Key, Value>::const_iterator p;
   for( p = m.begin(); p != m.end(); p++)
      os << *p << endl;
   return os;
}
For example,
cout << concordance;
prints:
(be, 2)
(not, 1)
(or, 1)
(to, 2)
Putting this together, here's how we can implement a concordance generator that reads a document from standard input and writes the concordance to standard output (we can use file redirection to read and write to files):
int main()
{
string s;
map<string, int> concordance;
while (cin >> s)
concordance[s]++;
cout << concordance;
return 0;
}
Unfortunately, searching a table is a little awkward using the map<> member functions. If m is a map and k is a key, then m.find(k) returns an iterator pointing to the first pair in m having key == k, otherwise m.end() is returned. We can make this job easier with the following template function that can be used to find the value associated with a particular key. The function returns false if the search fails:
template <typename Key, typename Value>
bool find(Key k, Value& v, const map<Key, Value>& m)
{
   // m is const, so m.find() returns a const_iterator
   typename map<Key, Value>::const_iterator p;
   p = m.find(k);
   if (p == m.end())
      return false;
   else
   {
      v = (*p).second;
      return true;
   }
}
Here's how we could learn the word count for a string in our concordance:
int count = 0;
string word = "be";
if (find(word, count, concordance))
   cout << word << " count = " << count << endl;
else
   cout << word << " not found\n";