StringMap

The smallest possible attribute system needed a fastest possible lookup. Functionally, a std::map<std::string, int> or std::map<const char*, int, Pred> might have done the job. The performance wasn’t there however. I did a bit of research and discovered ternary search trees by Bentley and Sedgewick; wrapped them up with a template to turn what was effectively a set into a map, timed it, and provide the result here. Note that real production code would test for error conditions, and deal with them in a fashion appropriate to your application.

This code is intended to get you thinking about what kind of container might be appropriate to the job you have at hand, and also to encourage you to not limit yourself to what the library vendor provided you or what dogma might dictate.

Timing

For a timing test, I read in a dictionary, added it to a std::map and a StringMap in alphabetical order, shuffled the dictionary and added all the words out of order, then shuffled the dictionary again, and searched for all the words in the map and the StringMap. In general, the results below can be multiplied by two to know how fast StringMap runs in debug, and multiplied by six to ten to know how fast std::map runs in debug. Timings are reported in CPU cycles, and measured under MSVC 2008 with the out of the box STL, optimized build, multi-threaded release libraries, RTTI and exceptions turned on.

Update. I have averaged many more samples. I have retested with all combinations of RTTI and exceptions off and on. Contrary to conventional wisdom, I was not able to measure a statistically significant difference between the performance of all four variations.

Test std::map StringMap
insert, in order 540M 55M
insert, shuffled 540M 85M
find all 260M 85M

Observations

A further benefit of StringMap is that very few heap allocations are involved. The insert time of std::map could be improved with a custom predicator, and a custom allocator, however the find time can’t be improved greatly. One of the big differences between a map of strings and a StringMap is that during find, the std::map needs to do a strcmp for every node of the tree it visits, whereas the StringMap advances a character at a time through the search string as it advances through the nodes. This alone makes a huge difference.

The last merit I want to point out is the sheer tiny elegance of Bentley and Sedgewick’s code – insert is a mind boggling 50 lines, and find is a mere 22 lines long. (Ignore my clumsy InOrder function, it’s not the elegant part!)

If you want to use this code as a set, remove the Data from the Tnode, and all references to it – this will bring the code back in line with the original.

For more information and formal performance analysis on ternary search trees, I refer you to Bentley & Sedgewick’s informative article at Dr. Dobb’s, and their scholarly research.

Exception Safety

The code as presented here is exception safe. I have replaced the malloc/free in the original posting with new[]/delete[], and added a throw of bad::alloc in the case of filling up all the pools. It is permissible for assignment of data to throw an exception, it is up to the user of the StringMap to catch and respond accordingly – if data cannot be assigned, it is not up to StringMap to do something about it.

 

#ifndef STRINGMAP_H
#define STRINGMAP_H

#include <stdlib.h> // for malloc/free
#include <vector>    // for the InOrder function.

/*
This code is described in "Ternary Search Trees" by Jon
Bentley and Robert Sedgewick in the April, 1998, Dr. Dobb's Journal.

http://www.ddj.com/windows/184410528

Other algorithms and analysis are available here:

http://www.cs.princeton.edu/~rs/strings/

Ternary search trees offer substantial advantages over both
binary search trees and digital search tries. We feel that they are superior
to hashing in many applications for the following reasons:

Efficient and easy to implement
No extra overhead for insertion or successful searches
Usually substantially faster than hashing for unsuccessful searches
Gracefully grow and shrink; hash tables need to be rebuilt after large size changes
Support advanced searches, such as partial-match and near-neighbor search.
Traversal to report items in sorted order. 

-------------------------------------------------

I've added the StringMap template wrapper to encapsulate the glboal variables
in Bentley & Sedgewick's original code. Otherwise, this implementation is
substantially identical.

StringMap needs a simple and efficient iterator in order to get rid
of the InOrder method

I haven't bothered with adding an erase function, or STL compatibility,
because my goal is an as-light-as-possible symbol table look up. The extra
complexity doesn't help that goal.

*/

#ifdef HAVE_EXCEPTIONS
#  define STRINGMAP_ERROR throw std::bad_alloc();
#else
#  include <assert.h>
#  define STRINGMAP_ERROR assert(false);
#endif

template <class Data>
class StringMap
{
public:
    static const int buffPoolGrowSize = 1000;
    static const int maxPools = 1000;

    struct Tnode {
        char    splitchar;
        Data    data;
        Tnode*    lokid;
        Tnode*    eqkid;    // contains middle kid if splitchar != 0, else char* pointer to full string (node is a leaf)
        Tnode*    hikid;
    };

    StringMap() : buf(0), bufn(0), freen(0), size(0), root(0)
    {
        memset(freearr, 0, sizeof(freearr));
    }

    ~StringMap()
    {
        for (int i = 0; i < freen; i++)
            delete[](freearr[i]);
    }

    // A major cost of insert functions is calling malloc to create each node.
    // This function uses the ancient C technique of allocating a buffer of nodes and
    // dealing them out as needed. Profiling shows that this eliminates most of the
    // time spent in storage allocation. 

    void insert(char const* s, const Data& data)
    {
        char const* instr = s;
        Tnode* pp;
        Tnode** p;
        p = &root;
        while (pp = *p)
        {
            // optimization: save a difference in a comparison
            int d;
            if ((d = *s - pp->splitchar) == 0)
            {
                if (*s++ == 0)
                    return;
                p = &(pp->eqkid);
            }
            else if (d < 0)
                p = &(pp->lokid);
            else
                p = &(pp->hikid);
        }
        for (;;)
        {
            // *p = (Tptr) malloc(sizeof(Tnode));    reference code; this is what the pools replace
            if (bufn-- == 0)
            {
                if (freen == maxPools - 1)
                {
                    STRINGMAP_ERROR
                }
                buf = new Tnode[buffPoolGrowSize];
                freearr[freen++] = buf;
                bufn = buffPoolGrowSize - 1;
            }
            *p = buf++;
            pp = *p;
            pp->splitchar = *s;
            pp->lokid = pp->eqkid = pp->hikid = 0;
            if (*s++ == 0)
            {
                // store a pointer to every string in the tree; this data will be used by later search
                // exploit the fact that a node with a null splitchar cannot have an eqkid
                pp->eqkid = (Tnode*) instr;
                pp->data = data;
                ++size;    // count it
                return;
            }
            p = &(pp->eqkid);
        }
    }

    // If the search character is less, go to lokid;
    // if it is greater, go to hikid;
    // if it is equal, go to the next character and eqkid. 

    bool find(char const* s, Data* data) const
    {
        Tnode* p;
        p = root;
        while (p)
        {
            if (*s < p->splitchar)
                p = p->lokid;
            else if (*s == p->splitchar)
            {
                if (*s++ == 0)
                {
                    *data = p->data;
                    return true;
                }

                p = p->eqkid;
            }
            else
                p = p->hikid;
        }
        return false;
    }

    void inOrderR(std::vector<std::pair<char const*, Data*> >& nodes, Tnode* p) const
    {
        if (!p)
            return;

        inOrderR(nodes, p->lokid);

        if (p->splitchar)
            inOrderR(nodes, p->eqkid);
        else
            nodes.push_back(std::pair<char const*, Data*>((char const*)p->eqkid, &p->data));

        inOrderR(nodes, p->hikid);
    }

    void inOrder(std::vector<std::pair<char const*, Data*> >& nodes) const
    {
        inOrderR(nodes, root);
    }

    // print all members of the tree in order

    void traverse(Tnode* p)
    {
        if (!p)
            return;
        traverse(p->lokid);
        if (p->splitchar)
            traverse(p->eqkid);
        else
            printf("%s\n", (char *) p->eqkid);
        traverse(p->hikid);
    }

    Tnode* buf;                    // next available Tnode
    int bufn;                    // number of Tnodes available in current pool
    int freen;                    // number of pools to be freed
    Tnode* freearr[maxPools];    // all the allocated pools
    int size;                    // number of nodes

    Tnode* root;                // root of the tree
};

#endif

Post to Twitter Post to Delicious Post to Facebook

  • admin
    In the era Bentley and Sedgewick wrote this code, it was traditional for the programmer to write a recursion to iterate the tree; see the example print-in-order function in the supplied code. The notion of iterators came a bit later, and is perhaps not ideally mapped to this structure. In my own usage of this code, I use it strictly for fast lookup, and write a recursion when I need to iterate over all the items.

    Perhaps a reader has a suggestion on how to write a better iterator than the one I knocked together for this tutorial, or better, has a way to update the StringMap for efficient interoperability with the STL?
  • gogo
    How could be the code to write a StringMap iterator, pls?
    Currently you're just copying the elements to a std::vector which consumes a lot of memory. An iterator could be better.

    thx.
  • admin
  • admin
    Ok, I posted new exception safe code, thanks for all the feedback!

    I have retimed the code as well, just to be sure.
  • admin
    (admin is Nick)

    The fix to exception safety is actually pretty simple now that I think about it. We can replace malloc/free with new[] and delete[], since they will throw if the allocation fails. Whether or not the assignment of Data can throw doesn't matter; one would expect the user to need to deal with a failure of assignment in the map, it's not the place of the map to deal with it.

    I've written a new post to help explain some of the characteristics of game code that non-game coders find hard to fathom. I don't think it's really fair to say our code is horrible or hacky!

    http://meshula.net/wordpress/?p=185

    Beauty is in the eye of the beholder, and what seem to some practitioners to be the most hideous code possible, to someone else can be an elegant rose. I must point out that a wood carver would never allow his knives to cut another material, yet for carving wood, his knives are the perfect instrument. Game code by and large is craft informed by engineering.
  • Pseudonym
    Thanks, Mike. Wrapping working code in a class may be ugly, but yeah, it works.

    Incidentally, it wasn't just the upper limit on capacity that concerned me with StringMap.

    The exception-safety aspect did concern me. The fact that Mike claims that it "can’t throw exceptions" just goes to show how subtle exceptions can be, if a good C++ programmer can't spot situations where it might happen. There's the inOrder stuff that manipulates STL containers (which is safe in this case), there's always the possibility that Data::operator=() might throw (which, again, is safe in this case, but it only takes a small refactor and instantly it's not).

    Dealing with error conditions is the biggie. Admittedly, it's the original code's fault that the return value of malloc() isn't checked, but still, it's a fault.

    Alexey wrote: "It is common in game programming to use limited containers for performance reasons."

    I don't think I'll ever understand game programmers. They do the most horrible hacks that anyone else in any other area of high-performance computing wouldn't even begin to consider.

    I suspect it's more often for deadline reasons than performance reasons, but don't quote me on that.
  • admin
    Alexey, I can't get the bufn-- to display properly. I put a space in, it's ugly but hopefully people will get it.

    Here's the dictionary.txt file I used:

    http://java.sun.com/docs/books/tutorial/collections/interfaces/examples/dictionary.txt


    Here's the timing code. I haven't filled in a gcc version of the cycle timer. I would appreciate your timings with STLPort; I might imagine it to be much faster than the Dinkumware implementation Microsoft ships. Please also mention which compiler and settings you used.

    Sorry for the screwball formatting, I need some kind of nice code display plugin...


    class Cycles
    {
    public:
    Cycles()
    {
    Reset();
    }

    void Reset()
    {
    Sample();
    start = cycles;
    }

    #ifdef _MSC_VER
    void Sample()
    {
    int dwLow, dwHigh;
    __asm
    {
    rdtsc
    mov dwLow, eax
    mov dwHigh, edx
    }
    cycles = ((unsigned long long)dwHigh << 32) |
    (unsigned long long) dwLow;
    }
    #else
    void Sample() { cycles = 0; }
    #endif

    unsigned long long Total() const { return cycles - start; }

    unsigned long long start;
    unsigned long long cycles;
    };


    std::vector<std::string> dictionary;
    std::ifstream dict;
    dict.open("d:\\dictionary.txt");
    while (!dict.eof())
    {
    char buff[256];
    dict.getline(buff, 256);
    dictionary.push_back(buff);
    }

    Base::Cycles ct;
    // test one, map, insert dictionary, alphabetical order
    {
    // 565M, 3248M in debug
    ct.Reset();
    std::map</std::string><std::string> mapTest;
    for (size_t i = 0; i < dictionary.size(); ++i)
    {
    mapTest.insert(std::pair</std::string><std::string>(dictionary[i], i));
    }
    ct.Sample();
    printf("map, alpha, %15llu\n", ct.Total()/1000000);
    }
    // test two, StringMap, insert dictionary, alphabetical order
    {
    // 62M, 131M in debug
    ct.Reset();
    StringMap<int> mapTest;
    for (size_t i = 0; i < dictionary.size(); ++i)
    {
    mapTest.insert(dictionary[i].c_str(), i);
    }
    ct.Sample();
    printf("Strmap, alpha, %15llu\n", ct.Total()/1000000);
    }
    std::random_shuffle(dictionary.begin(), dictionary.end());
    // following tests not measured in debug, std::map is so very much slower (more than 5x)
    // StringMap is approximately 2x slower
    // test one, map, insert dictionary, alphabetical order
    {
    // 551M
    ct.Reset();
    std::map</int></std::string><std::string> mapTest;
    for (size_t i = 0; i < dictionary.size(); ++i)
    {
    mapTest.insert(std::pair</std::string><std::string>(dictionary[i], i));
    }
    ct.Sample();
    printf("map, shuffle, %15llu\n", ct.Total()/1000000L);
    }
    // test two, StringMap, insert dictionary, alphabetical order
    {
    // 86M
    ct.Reset();
    StringMap<int> mapTest;
    for (size_t i = 0; i < dictionary.size(); ++i)
    {
    mapTest.insert(dictionary[i].c_str(), i);
    }
    ct.Sample();
    printf("Strmap, shuffle, %15llu\n", ct.Total()/1000000L);
    }
    // Random find of everything
    {
    StringMap</int><int> mapTest;
    for (size_t i = 0; i < dictionary.size(); ++i)
    {
    mapTest.insert(dictionary[i].c_str(), i);
    }
    std::map</int></std::string><std::string> mapTest2;
    for (size_t i = 0; i < dictionary.size(); ++i)
    {
    mapTest2.insert(std::pair</std::string><std::string>(dictionary[i].c_str(), i));
    }
    std::random_shuffle(dictionary.begin(), dictionary.end());
    {
    // 87M
    ct.Reset();
    for (size_t i = 0; i < dictionary.size(); ++i)
    {
    int data;
    mapTest.find(dictionary[i].c_str(), &data;);
    }
    ct.Sample();
    printf("Strmap, find, %15llu\n", ct.Total()/1000000L);

    // 273M
    ct.Reset();
    for (size_t i = 0; i < dictionary.size(); ++i)
    {
    int data;
    std::map</std::string><std::string>::iterator j = mapTest2.find(dictionary[i]);
    }
    ct.Sample();
    printf("map, find, %15llu\n", ct.Total()/1000000L);
    }
    }</std::string>
  • admin
    Pseudonym,

    My goal in wrapping Bentley & Sedgewick's code was to use it almost verbatim. It is well known code that works well and has a large body of documentation to go with it, and I didn't really want to modify it for stylistic reasons. The only reason to wrap it was that as implemented you could have only one global tree, which is kind of useless. But wrapped into a class it became immediately useful.

    In general, I would agree with you that simply slapping a class on C code is ugly, but I felt I was justified just this once :)

    I'll refer to this nice paper on the boost site on exception safety:

    http://www.boost.org/community/exception_safety.html

    Since the code I've posted can't throw exceptions, I'll assume you are referring to the notion of dealing reasonably with error conditions (which this code doesn't, input checking would be an obvious improvement at little runtime burden), that resources aren't leaked (it doesn't, so that's one positive), and that errors are reported to the user (they aren't).

    I'll certainly agree with you that this is not production quality code; it's meant to be illustrative of how a StringMap could take the place of a std::map with a string key.

    I'm afraid I have to agree with you that the freearr mechanism they proposed is not the clearest thing ever; I had to read it slowly for the ah-hah! light to go on. The sheer lightness of it did impress me though, so I figured why mess with it? It works.

    As Alexey says it is common to use limited custom containers for performance reasons. So while I would definitely adopt remedies to the points you mention in production code, I would also not go and add all the extra bits (like a remove method) that would make the code larger and less focussed on doing one thing very well.
  • Alexey
    Would you please provide us with your test code? My quick and dirty test does not show such dramatical difference between StringMap
    and STLport's std::map (at least not for find())...
  • Alexey
    Pseudonym,
    It is common in game programming to use limited containers for performance reasons. It is quite scary for a game programmer to use *dynamic* container of pools to fight memory allocations :)
    But in this case I bet Nick's goal was to make it as simple as possible, so I think it's ok.

    PS. I seems the publishing system likes to eat some symbols, I'm pretty sure "if (bufn– == 0)" should really be "if (bufn–- == 0)"!
  • Pseudonym
    I have some real problems with C-style programming in C++, not the least of which is that this code isn't exception-safe.

    Most other high-performance C++ application areas wouldn't stand for that. Is this common in games programming?

    Anyway, some suggestions on how to make the C-style pool allocation work in C++.

    First, I'd replace each pool with a lightweight object that owns the pool. Perhaps something like this:

    struct Pool
    {
    Tnode* buf;
    Pool() : buf(new Tnode[buffPoolGrowSize]) {}
    ~Pool() { delete[] buf; }
    };

    Another option is boost::scoped_array.

    2. Replace freearr with a dynamic container of pools. Probably std::deque is the best option here, because it doesn't do any copying.
blog comments powered by Disqus

Bad Behavior has blocked 877 access attempts in the last 7 days.