09.03.08

What is Life?

Posted in Science at 12:27 am by admin

Life is a spontaneously auto-catalytic process or system that decreases local entropy.

- Nick Porcino, August 2008

Corollary: A machine may be spontaneously auto-catalytic, but is only a machine if it increases local entropy.

Introduction

I had several days with hours at a stretch to think, and the definition for life above was what I got out of it. I like this definition, it’s clean, concise, and doesn’t have any of the usual sort of wobbly pitfalls of the usual sort of definition. It also doesn’t reach for any arm-wavey consumption of intangibles such as negentropy, and doesn’t need a table of exceptions or justifications about why one thing can be considered life and another can’t. It might also help us make a call when aliens show up, or when a computer gets a bit too smart.

 

Overturning Convention

There is a nice article at Wikipedia describing the conventional definition of life, and lists some counter-examples that break convention. Life is often defined as an ad hoc melange of rules:

  • homeostasis (regulation of internal state, something robots can do)
  • organization (limits life to cellular organisms - virii are not life by convention)
  • metabolism (consumption of energy to create cellular components)
  • growth (maintaining a higher rate of synthesis than catabolism)
  • adaptation (some simple organisms have remained largely the same for millions of years - are they therefore not alive?)
  • response to stimuli
  • reproduction (are worker bees not alive?)

One can fiddle away on these rules trying to refine them and catch border cases, but it never really pans out.

The systems theory definition of life states that live things self-organize and self-reproduce. This is closer to what I am after, but not strong enough to make predictions, or to address faults and ambiguities within the ad hoc conventional definition.

Other definitions are organized around agency, replication, or enzymes. All of these are narrowly focused around particular interpretations specific to particular fields of study.

Erwin Schrödinger wrestled with the question and came close to what I think is the right definition when he suggested that living systems import and store negative entropy or negentropy. Where this fails for me is that negentropy is an information theory statistic used as a measure of local normality; in my definition I am using entropy as a measure of the ability of a given chunk of space-time to do work. So, just as quantum mechanics describes physical processes without revealing the reality of the physicality, the storage of negative entropy describes what life is getting up to, but it does not tell what is going on. To put it formally, statistics and chance are purely epistemological concepts without ontological import.

 

Reading my new Definition

Spontaneously auto-catalytic system or process

Life happens on its own; it bootstraps itself; when the time is right, it picks up and goes. A seed in the presence of moisture begins to manufacture a plant. A quickened ova divides as factories in the cell’s nucleus begin to run. This is not sufficient for a definition of life. it encompasses things that we unambiguously do not consider alive such as fire, a wound-up wall clock, or reaction-diffusion systems.

Decreasing local entropy

There are a number of ways to look at this; the easiest is to imagine winding a clock-spring tighter. When that spring is wound, the potential is there to drive the clock. One can imagine that life winds the spring constantly through some kind of process of conversion of environmental energy to potential in the spring. To pick a life-form excluded by convention, virii wind their spring by organizing matter into self-replicating space-time geometries. Fire is excluded from being considered alive, because it unwinds the springs; entropy is increased by fire. By itself, the clock is not alive, it can only wind the spring down and is therefore a machine.

I suggest the words “decreasing local entropy” to help with conceptualization of the definition; one can imagine points of life distorting an entropic landscape similarly to the visualizations of black holes warping space-time.

 

Surprising Inductees to the Hall of Life

My definition welcomes back the worker bee, and the virus. My definition also allows bacteria that can only survive in host cells. It will make it easier to classify alien life should it present itself some place like Mars in some difficult to conventionally pigeonhole fashion.

Some things that we describe in terms of living and dying are reminiscent of life, but are not life - companies, cities, and nations do not auto-catalyze although it sometimes appears that they do. I’m including this note because it is interesting to debate.

It would be possible for robots or computers, or even the Internet to be alive, although present day versions are not spontaneously auto-catalytic, and do not inherently decrease local entropy since they must be manufactured, consume energy, and do not effectively wind springs in return.

08.23.08

Progress in Contouring

Posted in Rendering, Volume Rendering at 10:47 pm by admin

Vertices lead to Edges, Edges lead to Faces… Faces lead to Rendering

- Voxel Master Yoda

Introduction

Contouring is the process of converting implicit surfaces, such as volume data or voxels to a polygonal representation. This sort of operation is very useful for rendering blobs, finding the surface of water in a fluid simulation, performing mesh simplification, extracting surfaces from CAT scans, or even creating navigation graphs for AI characters to path plan against as described in Ju, Schaefer and Warren [2], and Schaefer and Warren [5]. Furthermore, if a spatial hierarchy like an octree is used to construct the contours and the hierarchy is retained with links to the tessellate faces, collision detection against the mesh can be done with extreme efficiency. This article traces current progress through a series of papers at Joe Warren’s site, chosen because I’ve enjoyed his papers a great deal over the years.

 

Figure 1, a navigation mesh formed from the negative volume of a game level [2].

 

Marching Cubes

Dual contouring traces its lineage through the marching cubes algorithm presented first at Siggraph 87 by Lorensen and Cline [1]. In the simplest version of marching cubes, a volume is grid sampled, and all corners of cubes in the grid are classified as inside, outside, or intersecting the surface. Faces on the cube can then be classified according to the combinations of intersections identified where an edge between corners changes from inside to outside. The classification becomes an index into a table of triangles that are then emitted to generate the final surface. There’s one extra twist which is the case of a diagonal stripe across the face; the strip could be either filled or empty - in this case sampling at the center of the face can be used to make the choice, see Figure 2.

Figure 2, classification in marching cubes. The red line is the edge to be checked, and the green indicates the two classification possibilities (from [2]).

 

Convex Contouring

Convex contouring is described in [2] as a method of improving the performance of marching cubes. One of the specific problems addressed were that marching cubes can generate inconsistent topologies if the choice indicated in Figure 1 is not evaluated correctly. Another issue addressed was that the tables marching cubes uses are hand created, not algorithmically derived from the topology and geometry of the individual polyhedrons. Where marching cubes generates triangles based on face intersections, convex contouring generates polygonal shapes that enclose the convex negative space within each cell.

Although the boundaries of the generated polyhedra are unique for each sign configuration of the sample cube corners, the polygonization is not. Imagine four points, two triangles could be generated with the shared edge spanning either of two pairs of vertices. A decision tree is introduced with possible triangulations at the leaves of the tree. The correct triangulation to pick is the one that results in all triangles facing the negative space within the cell convexly. The table construction process is described in [2], as is the algorithm for deriving a triangulation.

 

Dual Contouring

Marching Cubes results in a tessellation where vertices lie on the edges of the sampling grid, as does the convex contouring method. Dual contouring was introduced at Siggraph 2002 [4]. The data was represented as a signed octree for a significant reduction in memory requirements. The data was augmented with hermite data for exact intersection points and normals allowing reproduction of sharp edges and corners. Quadratic error functions were stored at each cube in the octree to allow the resultant mesh to be simplified versus an error metric.

Dual contouring methods produce vertices on the surface interior to each cube in the grid, and polygons dual to the edges of the grid. Since vertices are placed in the interior of the cubes and not on the edges of the cubes, there is more freedom as to where the vertex can be placed. This results in a tessellation that with more uniformly sized polygons, doesn’t generate the thin slivers introduced by marching cubes when sampling cubes are near the surface, and eliminates the gridding effect of marching cubes. Figure 3 shows a comparison between the results of marching cubes and convex contouring. Various extensions are described in [3] for quickly updating a tessellation and rapidly updates for CSG operations.

Figure 3, marching cubes on the left, dual contouring on the right (from [3]).

 

Dual Marching Cubes

Finally we come to Dual Marching Cubes by Schaefer and Warren [5]. This method aligns vertices in the tessellation with features of the implicit function. The tessellation itself occurs on a grid that is the dual of the structured sampling grid. The result is that thin features missed, or requiring a lot of subdivision are preserved with a much sparser polygonization than that resulting from a structured grid. The method proceeds by creating an octree describing the volume to be contoured, then a dual grid of the octree is created by linking vertices at the centers of each grid cell to its topological neighbours as shown in Figure 4.

Figure 4. The dual grid constructed on a quadtree (from [5]).

The surface is then extracted using a simple extension of marching cubes to dual grids.  Since each cell in the grid is topologically equivalent to a cube, the standard marching cubes tables can be used to generate the surface interior to the cell. Since the underlying representation of the data is an octree, the resulting tessellation is much sparser than Dual Contouring or Marching Cubes. Intriguingly, the results are similar to adaptive Marching Cubes since resolution is added appropriately by the creation of the octree. Figure 5 shows a comparison of Dual Marching Cubes with Marching Cubes and Dual Contouring. Further explanation of the algorithm can be found in this presentation.

Figure5. The rocket in the upper left is the original CSG model; upper right is the mesh extracted by Dual Marching Cubes, Marching Cubes is in the lower left, and Dual Contouring is in the lower right (from [5]).

 

In Closing

It appears that recent research is more concerned with generation of tetrahedral meshes, so as far as I can tell, this sampling represents the state of the art. An excellent collection of references can be found here.

 

References

[1] W. E. Lorensen and H. E. Cline. Marching cubes: A high resolution 3d surface construction algorithm. In Computer Graphics (Proceedings of SIGGRAPH 87), volume 21, pages 163–169, Anaheim, California, July 1987. pdf

[2] T. Ju, S. Schaefer, and J. Warren. Convex contouring of volumetric data. The Visual Computer, 19(7–8):513–525, 2003. pdf

[3] S. Schaefer and J. Warren. Dual contouring: The secret sauce. Technical Report 02-408, Department of Computer Science, Rice University, 2002. pdf

[4] T. Ju, SF. Losasso, S. Schaefer, and J. Warren. Dual contouring of hermite data. In Proceedings of the 29th annual conference on Computer graphics and interactive techniques, ACM Press, 339–346. pdf

[5] S. Schaefer and J. Warren, Dual Marching Cubes: Primal Contouring of Dual Grids. pdf

Equations in Wordpress

Posted in Code at 3:48 pm by admin

Found a nice LaTeX rendering service for embedding equations in a blog. It’s located at yourequations.com. The first thing I looked at was installing LaTeX on my server, and running it locally. I decided that the hassle was a bit much versus the benefit. I tried yourequations because I like the idea that the rendering is done elsewhere on a server maintained by someone who’s an expert on that sort of thing.

Test sample:

\int_{0}^{1}\frac{x^{4}\left(1-x\right)^{4}}{1+x^{2}}dx
=\frac{22}{7}-\pi

Unfortunately, it looks like RSS readers like Google’s don’t pick up on the tag, so maybe inline jpgs are the best answer after all. There are a few servers online that will generate a bitmap, such as this one, giving me the thing I was after, LaTeX without the install.

08.10.08

Will Super Smart Robots take over the World?

Posted in Rambling, Robotics, Science at 12:02 am by admin

 

 

Will super smart robots take over the world? I know this is probably something keeping you at night. I did some back of napkin calculations to gain some insight to this burning question.

Power Consumption vs. Clock Speed vs. Cores

In order to displace us, super smart robots are going to need to optimize their efficiency. They’ll need to store energy, move around, and think with an efficiency at least as great as ours. We are well optimized for our environment with quite a practical balance between computational power sufficient for survival, energy storage, and an ability to modify the environment to our needs.

Starting with computational efficiency, clock speed versus power consumption varies from nearly linear to cubic in the more advanced designs. To put it another way, power consumption goes up or down with the cube of the processor speed  - in a modern CPU, running at half the clock can run at as little as one eighth the power.

Power consumption per CPU core is of course linear; if you have two processors running the same program, they will consume twice the power of one of those processors running the program. So, if you can structure your code to utilize available cores effectively, you should be able to achieve greater performance for less power. For example, a dual core Pentium might be able to perform the same processing at half clock and one quarter power as a single Pentium running at full clock.

If you follow this logic to the extreme conclusion, the ideal solution is clearly to have as many processors as physically possible  – even billions - running as slow as possible yet generating results quickly enough to still be useful. This will maximize your computation per watt, which is clearly what you want to do if you want your robot to have a reasonable operational capacity. 

How Fast is Fast Enough?

From the point of view of the graphics community, quickly enough might be defined as 24 Hz for film, or 30 or 60 Hz for games; these numbers correspond to the human flicker fusion response rate. The significance of the flicker fusion rate from a survival perspective is clear, given the round trip reaction time of seeing something, and initiating a motor response. A startle eye-blink occurs at around 100 to 200mS after a stimulus, with the minimum measurable reaction time being 10mS. The flicker fusion rate just faster than the speed at which we process visual data at a survival-sufficient rate. Completing computations at around 100Hz seems like a sufficient rate for human like performance; so we can argue that performing at that speed would be good for our super robot too; faster will consume more power; slower, and the robot won’t be able to keep up with us and won’t be super.

http://hypertextbook.com/facts/2006/reactiontime.shtml

What is Good Energy Density?

The next question is what is a reasonable energy density? A super robot should be able to carry around as much energy per unit weight as we do. If they have a lower energy density, they won’t be able to run as long as we do without refueling. If their energy density is better, they’d be in good shape. The best batteries hold between 0.5 and 1 MJ/kg.

http://en.wikipedia.org/wiki/Lithium_ion_battery

Fat, which is where we store our energy, holds roughly 38 MJ/kg, somewhere between 50 and 100 times more.

http://hypertextbook.com/facts/2004/PingZhang.shtml

http://en.wikipedia.org/wiki/Energy_density

Gasoline and similar fuels have a similar energy density as fat, but have got obvious issues of production and supply that limit availability over the long term. So let’s go with fat as a current, plausible, easily manufactured by the individual robot, ideal energy source for conversion of energy to computation. Fat is manufactured through a very simple process, and makes efficient use of the most common elements found in the galaxy, Hydrogen, Oxygen, and Carbon.

http://earthguide.ucsd.edu/virtualmuseum/ita/03_2.shtml

How Many Cores?

For reference, note that the neuron is basic computational node in the brain, and that the brain has on the order of 100 billion neurons. The typical power consumption of the human brain is roughly 20W, about 20% of the total energy consumption of the human body. Operations per second is not a reasonable thing to work out for the human brain since a neuronal computation is not comparable to what a CPU does, but I will point out that the computation is super-parallel, and that as we cram more slower CPU cores into the super robot brain (more is better, until no more can be packed in) , the nature of the computation performed will change, and will probably come to approximate what the human brain does.

http://hypertextbook.com/facts/2001/JacquelineLing.shtml

Nature, having run a long term optimization project on us, has probably settled on the configuration of the human as the best-to-date tradeoff between computational power, energy consumption, waste heat, and power delivery and storage, and we can probably use that as a benchmark for an ideal robot.

The Cylons are Us

Super robots would face all the survival issues that we do - energy creation, storage, and consumption. They would face the same issues of infrastructure we do in terms of occupying space for our physical existence and for mechanisms of energy production such as farms. They would probably have to be just like us, since we’ve already evolved to optimally fill our niche on the Earth. If they were smarter than us, they would need to move out of the optimal state; they’d have to trade off computation, mobility, or numbers, all of which would leave us our niche.

The only variation that get a just-like-us robot an edge would be to over-clock for short periods of time at the expense of markedly increased energy consumption. So in conclusion, I think we don’t really have to worry about it, at least until batteries can store more energy than fat.

Useful Conclusions

  1. More parallelism and less clock is excellent for optimizing how we use generated power. This doesn’t immediately lead to burning less coal, but it most likely does. It also means that computing can become even more pervasive in an efficient way.
  2. Super smart robots that can exceed human capabilities can exist, but not in sufficient numbers to displace us. They can either approach human capabilities and power consumption and thus match our numbers, or they can exceed our capabilities and decrease their numbers by a cubic proportion, and thus not pose a threat to our dominance.

 

Author’s note: This article is really about why multi-core processing is an excellent and natural thing, and what the limits of multi-core processing are. It’s presented in a humorous-to-me context.

08.07.08

Cement from CO2

Posted in Science at 4:11 pm by admin

This isn’t turning into a global warming blog, but this technology is so cool I have to pass it along. A company called Calera is proposing to bubble the output of the Moss Landing natural gas fired power plant through water in order to reclaim the CO2 as cement. It works by turning the CO2 into carbonate, and combining it with the calcium and magnesium in sea water. Read more at Scientific American. I first came across this concept in Marshall Savage’s The Millenial Project, Colonizing the Galaxy in Eight Easy Steps where he proposed the construction of cities floating at sea, constructed from seacrete. It’s cool to see it being proposed as a method of carbon sequestration; in total natural gas powerplants output a serious amount of CO2 every day.

08.03.08

My first instructable

Posted in Science at 6:56 pm by admin

This post at stuntrabbit convinced me that I had better get off my butt and invent, and then share my results at instructables.

I decided making a carnivorous plant tray out of an old pallet would be worthwhile, so here’s the result!

07.27.08

Coal, Glitchy Update Loops, and Waiting

Posted in Code, Games, Rendering at 5:52 pm by admin

Image licensed under CC Share Alike, found here

I’ve been fighting a glitchy update loop on Windows that I don’t see on the equivalent update loop running on Linux. After a fair bit of digging, I narrowed it down to a call to Sleep in the Windows main loop. Eliminating the Sleep made the frame update buttery smooth like it is on Linux, but it shot my CPU usage to 100%, and the computer’s fans roar like a vacuum cleaner.

Some research into this issue got me a good answer: a smooth framerate, a cool CPU, and quiet fans. Before the solution is revealed though, I wanted to bring some game programmer dark wisdom into the light where it may shrivel to a crisp and blow away.

Game Programmer Dark Side Wisdom

The most common advice people seem to give on the web is that it’s fine to run the CPU at 100%, let the fans blow, what other use is your computer any way, especially when you have a game going? A cowed minority mentions that they’re on a laptop, and grudgingly it’s admitted that maybe an option for laptop users to run at less than 100% would be an admissible but sad concession.

CPU at 100% is not fine

Forget that! Anyone who thinks running at max power is a good idea should re-watch The Day After Tomorrow, and the compare and contrast the weather report on your evening news. News flash, it’s very likely that the energy you consume to run your game at 100% came from burning something.

Game Programmer Light Side Wisdom

Luckily, the right answer does exist, and thanks to Christian Weis for working it out. The code debugged is posted here for posterity. This version is way stripped down from the original post, but accomplishes what I want; CPU loading is very light, the fans aren’t going, and I’ve got no stuttering. I’ve calculated the wait delay I want in my main loop based on the time remaining to achieve my desired framerate (in my case, 60Hz).

// due to: http://www.gamedev.net/community/forums/topic.asp?topic_id=445787&whichpage=2&#2960603
// allows sleep at fine granularity with no stuttering, and not pegging CPU to 100%
void MySleep( float Seconds )
{
    static HANDLE Timer = CreateWaitableTimer( NULL, FALSE, NULL );

    // Determine time to wait.
    LARGE_INTEGER WaitTime;
    WaitTime.QuadPart = (LONGLONG)(Seconds * -10000000);
    if ( WaitTime.QuadPart >= 0 )
        return;

    // Give up the rest of the frame.
    if ( !SetWaitableTimer( Timer, &WaitTime, 0, NULL, NULL, FALSE ) )
        return;

    DWORD Result = MsgWaitForMultipleObjects
    (
        1,
        &Timer,
        FALSE,
        INFINITE,
        QS_ALLINPUT
    );
}

Consequences!

Please consider adding this or something like it to your main loop. Minimally your fans are going to run less, and your workspace will be a little quieter.

I ask you all to consider carefully the consequences of the choices you make every day, even while you code.

  • That shader is really gorgeous, and I’m personally first in line for the latest and greatest, but could you do it in a lighter way?
  • Do you really need a full rebuild?
  • Would a better search algorithm burn less coal?
  • It’s cool that some ancient clunker still runs and can serve files, but if you turned it off could you buy a newer-faster-bigger server with the power you saved in one year?
  • Did you turn off your monitor before you went home?

Seriously, we’re all in this together.

07.26.08

Previsualization for Clone Wars

Posted in Convergence, Film, Games at 1:13 am by admin

Cool article on the previsualization of Clone Wars over on the Official Star Wars Blog.

What we’re working on is an extension of what George has been working on since the ’70s,” says Director Dave Filoni. “It’s really how do you cinematically tell the story. For George, telling a story starts and ends in editorial. … George likes to edit and cut on motion and action, across head movements and hand movements. The editing of A New Hope is very fast paced. That kept the momentum moving in the story. He wanted to develop a way to get that final editing, which he could only do with live action coverage, really early on in the process.”

Filoni explained how ILM’s Dennis Muren cracked this challenge in using videomatic techniques for the speeder bike chase in Return of the Jedi. Muren used tabletop models and action figures to cut together a rough version of the chase to give Lucas footage he could use in editorial. So before computers, they actually used these little models, and they got approximate camera angles.  So you see the progress from static images to moving images.

We’ve used similar techniques in the past to mock up game levels and game play, so those techniques are useful when computers are around. Half the fun was to be discovered by unsuspecting hikers as they come around a bend in the marsh and find you and your team in the mud with action figures making pwew-pwew sounds!

07.19.08

Siggraph

Posted in Conferences, Siggraph at 4:28 pm by admin

In preparation for Siggraph 2008, this post gathers my notes from previous Siggraph conferences.

Summaries:

Individual Articles:

07.13.08

Science, Invention, Engineering

Posted in Rambling, Science at 4:56 pm by admin

I was thinking a little while ago about how the creative process of invention is related to science and engineering. It occurred to me that one of the most useful aspects of science for the inventor is that you can use good science to make predictions about how things will behave. Engineering is how the inventor can turn an idea into a practical object. The invention process feeds back to science by providing observations and imaginative frameworks. Invention feeds back to engineering by providing synthesis of techniques and applications.

07.12.08

What is it about game coding?

Posted in Code, Rambling at 5:22 pm by admin

Beauty is in the eye of the beholder, and what seem to some practitioners to be the most hideous code possible, to someone else can be an elegant rose. A wood carver would never allow his knives to cut another material, yet for carving wood, his knives are the perfect instrument. Game code by and large is craft informed by engineering. If we were to talk about drivers, or resuable engine components, that is a different matter, you had better be engineering.

The obvious

Game code runs in a closed sandbox. Good game code has all known paths executed, and has been optimized for cycle efficiency. You don’t have to worry about some things that other programmers have to worry about because the execution of certain functions is strictly deterministic, and not subject to console to console or user to user variation.

Poor game code does not exercise all possible paths, and is not thoroughly tested for reasonable outcomes (such as out of memory or disk not in drive). That’s the difference between a game that can sit flawlessly in a kiosk in retail for weeks without restarting, and a game that won’t even allow for a complete play through in one sitting.

The compromise

Game code is optimized within a cycle of its life. APIs and code need to only survive the known condition of the particular game with the particular data on the particular console with a particular kernel. The fastest code is the code that doesn’t execute, so if we can observe that certain lines of code only support safety against a non-thorough programmer, we tend to eliminate it and work hard to make sure our code does not execute illegally. On the other hand, code that will be exercised under conditions that cannot be strictly enforced, or written by people who don’t have the necessary understanding to deal with such a raw exposure, should be thoroughly return tested, and have error recovery.

Lex parsimoniae

Unlike many other high performance applications, the performance of game code is driven by artistic and aesthetic loading. The faster you make the game code, the more the artist will add data to bring the performance back down. Naturally your goal is the best possible look and experience, so you need to maintain the safety of the code, while minimizing the cycles executed. This means we don’t optimize or write code for reusability, we write code and optimize it for the product we are going to ship. If our primary goal was reliability, or accuracy, then there are a completely different set of tradeoffs we would engage. Our guiding rule is Occam’s Razor - “entia non sunt multiplicanda praeter necessitatem”, no more than is necessary.

You get one turn at bat

In general, there is no version 2. We’ve got to have it right the first time the first user touches the game. It’s not a primary goal that the code can be turned into another product (although we do try to refactor, triage, and fix to make the next game better technically).

Unlike other high performance realtime applications, such as aircraft control, where it does have to work perfectly the first time, we don’t have control over our target hardware. If I was building a controller for a space shuttle, I would ensure that the hardware was adequate to support the functional goals and the characteristics of the program that I need to run. On a game console, there is one unchangeable box out there, and you need (and want) to squeeze every drop out of it.

Competitive advantage

As a game programmer every cycle you can squeeze out of the machine, and still support the end user experience, is potentially a competitive advantage if it betters the interactive and aesthetic experience.

And that’s where all the weird code comes from!

07.07.08

Smallest Possible Attribute System, Part 3

Posted in Code at 12:48 am by admin

Okay, barring suggestions from you, I’m done with SPAS. It does what I wanted it to do, and there’s enough substance there for SPAS to be useful as a tutorial. The final result is here: AttributeBinder.h, AttributeBinder.cpp, and you’ll need the StringMap.

 

Our Story To Date

In the first edition, I introduced a tiny attribute system based around a Binder per class, and a Descriptor per attribute within each bound class. By reader suggestion, I extended it to allow for inheritance, although I didn’t wind up supporting multiple inheritance; you can read some of my research as to why multiple inheritance is not great for performant code here.

An update, changed the typeName in the Descriptor to be a const char* instead of a std::string. This was born from the realization that the binders are all statically initialized any way, so why duplicate the string data? Per reader comments, it turned out to be possible to eliminate more string copies.

Then, I introduced a type mechanism to replace the use of C++ RTTI.

Next I introduced Bentley & Sedgewick’s ternary search tree to make type look up fast, and to allow typeId to be cached as an int. This code introduced some thread-unsafety, which could be mitigated by putting a critical section at the spot indicated in the StringMap code, or in the type registration code. My logic for not doing so at the moment is that the classes are all bound at start, before main begins, and as the user won’t have been able to launch any threads yet, there’s no possibility of a stomp. If you were to add types later in a threaded runtime, a thread safety method would be needed. An alternative to the proposed critical section would be to implement a lock-free CAS style mechanism to the search tree; generally these are difficult to write, but given some similarities in the tree to a list, it might not be difficult to implement.

 

Meanwhile, back at the ranch

I had a look at reasonable applications for SPAS, and came up with:

  • Script variable binding to C++ data
  • Disk serialization
  • Network transmission
  • Editor binding

To support all of that, I thought it would be good to show binary reflection as part of the tutorial. Suddenly, the Smallest Possible Attribute System became not the smallest possible.

 

Feature Creep

I had certain design features I wanted to hit, like

  • having the reflected data generally immune to changes to the reflected classes. I also wanted to keep the reflected data as small as possible. That meant I needed a type and variable table in the reflected data.
  • reflecting contained objects as well as pointed-to objects. This introduced the need for class factories.
  • no overhead or wrappers on bound variables, reinterpret_cast is the means to get at a variable
  • does not impose vtables
  • minimal programmer interface
  • no templated attribute types - I want to keep the code as light as possible without needing to specialize the code per type
  • support pod (plain old data)
  • does not use built in C++ RTTI mechanisms - disassembly of typeinfo operations revealed a shocking world of string compares
  • text serialization would be an exercise to the reader. I personally don’t need it, although the tutorial code includes a trivial version using ostream and stringstream.

Current implementation limitations, that exist for no particularly good reason except that I don’t currently need them, or that they would obscure the tutorial nature of the code:

  • No platform swizzles, so data can’t be serialized on one machine and deserialized on a machine with different endianness.
  • No cross-compatibility between 32 and 64 bit systems.
  • No automatic padding to ensure aligned reads and writes.
  • No support for arrays of data
  • No support for multiple inherited objects
  • No support for STL containers

I refer you to the implementation and surrounding documentation to discover the serialization algorithm.

 

How about an example?

A minimal set of macros is used to reflect an object. Here’s an example of three classes, demonstrating containment, pointers, and inheritance.

class Bar
{
public:
    BIND_START;
    BIND(Bar, yay, float);
    BIND_END;

    Bar() : yay(33)
    {
    }

    float yay;
};

BIND_ATTRIBUTES(Bar);

class Foo
{
public:
    struct FailType { };

    BIND_START;
    BIND(Foo, myInt,    int);
    BIND(Foo, myFloat,    float);
    BIND(Foo, stuff,    char*);
    BIND(Foo, stuff2,   std::string);
    BIND(Foo, yow,        bool);
    BIND(Foo, fail,     FailType);
    BIND(Foo, bar,        Bar);
    BIND_END;

    Foo() : myInt(1), myFloat(2), yow(false)
    {
        strcpy(stuff, “Some text”);
        stuff2 = “More text”;
    }

    int myInt;
    float myFloat;
    char stuff[32];
    bool yow;
    Bar bar;
    std::string stuff2;
    FailType fail;
};

BIND_ATTRIBUTES(Foo);

class Baz : public Foo
{
public:

    BIND_START;
    BIND_BASE(Foo);
    BIND(Baz, bool1, bool);
    BIND(Baz, int2, int);
    BIND(Baz, float3, float);
    BIND(Baz, nullBarTest, Bar*);
    BIND(Baz, barTest, Bar*);
    BIND_END;

    Baz() : bool1(true), int2(2), float3(3), nullBarTest(0)
    {
        barTest = new Bar();
        barTest->yay = 12.0f;
    }

    bool bool1;
    int int2;
    float float3;
    Bar* nullBarTest;
    Bar* barTest;
};

BIND_ATTRIBUTES(Baz);

These macros introduce a static member of type AttributeBinder to every class. AttributeBinder provides a few interesting methods - WriteBinary, ReadBinary, and a StringMap of attribute names to AttributeDescriptors. AttributeDescriptors know how to find the member in the class, how big a member is, and a little descriptive information such as type. In general, one would look up a variable on the AttributeBinder, and then turn the descriptor into a pointer.

    Baz myFoo;
    AttributeBinder::AttributeDescriptor ad;
    Foo::binding.attribs.find(“myInt”, &ad);
    int* intPtr = (int*) ad.DataAddr(&myFoo);

 

An Exercise for the Reader

I’ve left error handling as a dreaded exercise to the reader, especially since everyone’s error handling mechanisms are so different, depending on the application. One could specify an error handling policy or trait on the AttributeBinder, but whatever.

The code is heavily documented (it’s not self-documenting, I documented it!), and available here:

AttributeBinder.h, AttributeBinder.cpp

Enjoy! With a bit of thought some of the limitations, cruftiness, and bloat could be eliminated to make this code actually the Smallest Possible Attribute System.

 


Research

General searching revealed a number of homebrew type systems, and a variety of sophisticated implementations such as that in boost python. To show why SPAS has a purpose in the face of so many alternatives, I turn to Game Programming Gems volume 2 where examples and tutorial explanations of all the common mechanisms can be found. I recommend this volume for those of you wanting to do your own research into this topic.

Scott Wakelin, Dynamic Type Information, Game Gems 2, pp. 38-45

Similarities:

  • Type info is a class statically embedded in each reflectable class.
  • Has a pointer to parent class to encode a class hierarchy.
  • No support for multiple inheritance, although it should be possible in Scott’s system due to reliance on C++ RTTI.
  • Type equivalence can be determined for the class, or downcasts.

Differences:

  • Scott’s system uses C++ RTTI in order to check type equivalence of upcasts.
  • Scott’s system uses dynamic_cast, which requires the existence of a vtable.
  • Scott’s system uses the << operator which I avoid because I am not at all pleased with the disassembly I see, nor am I pleased with the appearance of << execution during profiling
  • Scott’s system doesn’t handle pointer types

Charles Capelli, A Property Class for Generic C++ Member Access, Game Gems 2, pp. 46-50

Charles’ system is similar in intent to C# properties. He introduces a property class to wrap each variable, and a property set to iterate over them. SPAS also has a property set which can be iterated. Charles’ system is a traditional templated specialization system.

Kasse Staff Jensen, A Generic Tweaker, Game Gems 2, pp. 118-126

Kasse’s system shares with SPAS the design goal of not imposing overhead on reflected variables, and the goal of having an absolutely minimal API burden. Kasse’s system uses a templated class type to identify things, and imposes a vtable to get C++ RTTI.

Similarly to SPAS, Kasse’s system uses a map to go from the name of a variable to the variable, and it uses reinterpret_cast to get at the real variable.

07.06.08

StringMap

Posted in Code at 8:37 pm by admin

The smallest possible attribute system needed a fastest possible lookup. Functionally, a std::map<std::string, int> or std::map<const char*, int, Pred> might have done the job. The performance wasn’t there however. I did a bit of research and discovered ternary search trees by Bentley and Sedgewick; wrapped them up with a template to turn what was effectively a set into a map, timed it, and provide the result here. Note that real production code would test for error conditions, and deal with them in a fashion appropriate to your application.

This code is intended to get you thinking about what kind of container might be appropriate to the job you have at hand, and also to encourage you to not limit yourself to what the library vendor provided you or what dogma might dictate.

Timing

For a timing test, I read in a dictionary, added it to a std::map and a StringMap in alphabetical order, shuffled the dictionary and added all the words out of order, then shuffled the dictionary again, and searched for all the words in the map and the StringMap. In general, the results below can be multiplied by two to know how fast StringMap runs in debug, and multiplied by six to ten to know how fast std::map runs in debug. Timings are reported in CPU cycles, and measured under MSVC 2008 with the out of the box STL, optimized build, multi-threaded release libraries, RTTI and exceptions turned on.

Update. I have averaged many more samples. I have retested with all combinations of RTTI and exceptions off and on. Contrary to conventional wisdom, I was not able to measure a statistically significant difference between the performance of all four variations.

Test std::map StringMap
insert, in order 540M 55M
insert, shuffled 540M 85M
find all 260M 85M

Observations

A further benefit of StringMap is that very few heap allocations are involved. The insert time of std::map could be improved with a custom predicator, and a custom allocator, however the find time can’t be improved greatly. One of the big differences between a map of strings and a StringMap is that during find, the std::map needs to do a strcmp for every node of the tree it visits, whereas the StringMap advances a character at a time through the search string as it advances through the nodes. This alone makes a huge difference.

The last merit I want to point out is the sheer tiny elegance of Bentley and Sedgewick’s code - insert is a mind boggling 50 lines, and find is a mere 22 lines long. (Ignore my clumsy InOrder function, it’s not the elegant part!)

If you want to use this code as a set, remove the Data from the Tnode, and all references to it - this will bring the code back in line with the original.

For more information and formal performance analysis on ternary search trees, I refer you to Bentley & Sedgewick’s informative article at Dr. Dobb’s, and their scholarly research.

Exception Safety

The code as presented here is exception safe. I have replaced the malloc/free in the original posting with new[]/delete[], and added a throw of bad::alloc in the case of filling up all the pools. It is permissible for assignment of data to throw an exception, it is up to the user of the StringMap to catch and respond accordingly - if data cannot be assigned, it is not up to StringMap to do something about it.

 

#ifndef STRINGMAP_H
#define STRINGMAP_H

#include <stdlib.h> // for malloc/free
#include <vector>    // for the InOrder function.

/*
This code is described in “Ternary Search Trees” by Jon
Bentley and Robert Sedgewick in the April, 1998, Dr. Dobb’s Journal.

http://www.ddj.com/windows/184410528

Other algorithms and analysis are available here:

http://www.cs.princeton.edu/~rs/strings/

Ternary search trees offer substantial advantages over both
binary search trees and digital search tries. We feel that they are superior
to hashing in many applications for the following reasons:

Efficient and easy to implement
No extra overhead for insertion or successful searches
Usually substantially faster than hashing for unsuccessful searches
Gracefully grow and shrink; hash tables need to be rebuilt after large size changes
Support advanced searches, such as partial-match and near-neighbor search.
Traversal to report items in sorted order. 

————————————————-

I’ve added the StringMap template wrapper to encapsulate the glboal variables
in Bentley & Sedgewick’s original code. Otherwise, this implementation is
substantially identical.

StringMap needs a simple and efficient iterator in order to get rid
of the InOrder method

I haven’t bothered with adding an erase function, or STL compatibility,
because my goal is an as-light-as-possible symbol table look up. The extra
complexity doesn’t help that goal.

*/

#ifdef HAVE_EXCEPTIONS
#  define STRINGMAP_ERROR throw std::bad_alloc();
#else
#  include <assert.h>
#  define STRINGMAP_ERROR assert(false);
#endif

template <class Data>
class StringMap
{
public:
    static const int buffPoolGrowSize = 1000;
    static const int maxPools = 1000;

    struct Tnode {
        char    splitchar;
        Data    data;
        Tnode*    lokid;
        Tnode*    eqkid;    // contains middle kid if splitchar != 0, else char* pointer to full string (node is a leaf)
        Tnode*    hikid;
    };

    StringMap() : buf(0), bufn(0), freen(0), size(0), root(0)
    {
        memset(freearr, 0, sizeof(freearr));
    }

    ~StringMap()
    {
        for (int i = 0; i < freen; i++)
            delete[](freearr[i]);
    }

    // A major cost of insert functions is calling malloc to create each node.
    // This function uses the ancient C technique of allocating a buffer of nodes and
    // dealing them out as needed. Profiling shows that this eliminates most of the
    // time spent in storage allocation. 

    void insert(char const* s, const Data& data)
    {
        char const* instr = s;
        Tnode* pp;
        Tnode** p;
        p = &root;
        while (pp = *p)
        {
            // optimization: save a difference in a comparison
            int d;
            if ((d = *s - pp->splitchar) == 0)
            {
                if (*s++ == 0)
                    return;
                p = &(pp->eqkid);
            }
            else if (d < 0)
                p = &(pp->lokid);
            else
                p = &(pp->hikid);
        }
        for (;;)
        {
            // *p = (Tptr) malloc(sizeof(Tnode));    reference code; this is what the pools replace
            if (bufn– == 0)
            {
                if (freen == maxPools - 1)
                {
                    STRINGMAP_ERROR
                }
                buf = new Tnode[buffPoolGrowSize];
                freearr[freen++] = buf;
                bufn = buffPoolGrowSize - 1;
            }
            *p = buf++;
            pp = *p;
            pp->splitchar = *s;
            pp->lokid = pp->eqkid = pp->hikid = 0;
            if (*s++ == 0)
            {
                // store a pointer to every string in the tree; this data will be used by later search
                // exploit the fact that a node with a null splitchar cannot have an eqkid
                pp->eqkid = (Tnode*) instr;
                pp->data = data;
                ++size;    // count it
                return;
            }
            p = &(pp->eqkid);
        }
    }

    // If the search character is less, go to lokid;
    // if it is greater, go to hikid;
    // if it is equal, go to the next character and eqkid. 

    bool find(char const* s, Data* data) const
    {
        Tnode* p;
        p = root;
        while (p)
        {
            if (*s < p->splitchar)
                p = p->lokid;
            else if (*s == p->splitchar)
            {
                if (*s++ == 0)
                {
                    *data = p->data;
                    return true;
                }

                p = p->eqkid;
            }
            else
                p = p->hikid;
        }
        return false;
    }

    void inOrderR(std::vector<std::pair<char const*, Data*> >& nodes, Tnode* p) const
    {
        if (!p)
            return;

        inOrderR(nodes, p->lokid);

        if (p->splitchar)
            inOrderR(nodes, p->eqkid);
        else
            nodes.push_back(std::pair<char const*, Data*>((char const*)p->eqkid, &p->data));

        inOrderR(nodes, p->hikid);
    }

    void inOrder(std::vector<std::pair<char const*, Data*> >& nodes) const
    {
        inOrderR(nodes, root);
    }

    // print all members of the tree in order

    void traverse(Tnode* p)
    {
        if (!p)
            return;
        traverse(p->lokid);
        if (p->splitchar)
            traverse(p->eqkid);
        else
            printf(“%s\n”, (char *) p->eqkid);
        traverse(p->hikid);
    }

    Tnode* buf;                    // next available Tnode
    int bufn;                    // number of Tnodes available in current pool
    int freen;                    // number of pools to be freed
    Tnode* freearr[maxPools];    // all the allocated pools
    int size;                    // number of nodes

    Tnode* root;                // root of the tree
};

#endif

Why roll your own RTTI?

Posted in Code at 3:46 pm by admin

I’ve received a few inquiries as to why C++ RTTI is not desirable in high performance applications. Better than anecdotal information can be found in an extremely interesting piece for an ISO technical committee by Lois Goldthwaite - a Technical Report on C++ Performance. Roughly half of this report is devoted to a proposal for hardware addressing methods for C++, but the other half is immediately practical. The answer to “why roll your own” is buried near the end of this article.

To quote the document, its purpose is

  • to give the reader a model of time and space overheads implied by use of
    various C++ language and library features,
  • to debunk widespread myths about performance problems,
  • to present techniques for use of C++ in applications where performance
    matters, and
  • to present techniques for implementing C++ Standard language and library
    facilities to yield efficient code.

Knowledge of all of the above should be an arrow in every C++ programmer’s quiver!

The first rich vein in this document begins in Section 5.3, Classes and Inheritance. The overhead of various call methodologies is analyzed. My take on the severity of the differences between the approaches is greater than that of the report’s author, but the raw data is very helpful to understanding the issues. Unfortunately the identity of Mystery Compilers one through five is masked.

Section 5.3.5 on multiple inheritance method invocation shows the overhead is much larger than I would have expected. I am amused by the author’s note that 25% overhead is minor, but I take the point that one can code so as not to take that branch so often.

Section 5.3.7 is the first shocker. The typeid operator is very slow, but comments earlier in the report that type_info is recorded as strings, and my own profiling and disassembly showing that dynamic_cast and typeid is dominated by strcmp calls jive with these results. I suppose I shouldn’t have been shocked, but I was still on the fence as to whether I was somehow misreading what I was seeing.

Next, in Section 5.3.8, we see that dynamic_cast performance is perfectly acceptable only in the case of up casting. Down casting is dreadful, and cross casting (casting between branches in a multiple inheritance hierarchy) is an outrage. The report’s author suggests that compiler optimizer writers could perhaps pull up their socks and maybe do a bit better.

Section 5.4.1 reveals exception handling to suffer from the same drawbacks as dynamic casting. The primary reason is to deal with exceptions thrown in constructors and destructors; complex type information and the current state of construction must be maintained so that if an exception is thrown, all relevant destructors can be invoked.

Section 5.4.2.2 discusses exception specifications. The basic conclusion is that exception specifications are very heavy because redundant work is done for every exception thrown because the exception must be rethrown after type checking. Only whole program analysis could turn this into a compile-time check, and I can conclude that it is therefore impractical for large projects. This section also shows that an empty throw specification should greatly speed execution by telling the compiler that no type information needs to be baked into the execution context. I haven’t checked in VS2008, but up to VS2005, the empty throw specification is ignored. gcc does not warn if an empty throw specification exists, but I don’t know if it implements the relevant optimizations.

Section 5.6 is a large section on practical remedies a programmer can undertake to make code performant. There is a lot of common sense in this section, and some less worthy homilies. I’ll leave it to you to make your own judgements on the advice. All of the advice here is certainly not wrong, and is worth considering.

Section 5.6.7 is worth special note. I found three useful points here.

  1. If your code is built with exception handling, std::string will be slower than you would expect for the reasons covered in 5.4.1.
  2. The implementation of list::size() is often order n, resulting in quadratic complexity in a loop, suggesting the use of list.empty() instead of list.size() == 0.
  3. Iostreams are hog-tied by synchronization with C streams. This can be disabled by calling std::ios_base::sync_with_stdio(false); and std::cin.tie(0).

Section 6 reveals that the inefficiency of IOStreams is due to std::locale. A great many pages are devoted to how locales might be implemented such that they don’t inherently suck. We can read this two ways, one, as a well reasoned beg to std library implementors to do something better, and two, as a note to ourselves to only use unadorned IOStreams for ASCII I/O. Otherwise, we can certainly use IOStreams, but we must take care to use them well. On Windows, we must set the mode to binary by specifying the ios::binary facet: ifstream stream(filename, ios::binary); Cross platform, we should bypass the locale layer entirely as this is where the piles and piles of inefficiency comes from (setting stream flags, checking options, locale, character set, and so on), and go straight for the data reads and writes using the rdbuf - ifs.rdbuf()->sgetn()/ifs.rdbuf()->sputn(). I refer you to this gamedev thread for more information. Another IOStreams inefficiency under MSVC is that IOStreams goes to FILE* operations instead of Windows native functionality. Alas! Nonetheless, bypassing the locale and character set gets us close to where we need to be.

I wrote a small test program to verify the advice of going to rdbuf. The fopen version of the program ran at roughly the same speed as the ifstream version, although for whatever reason the variability was much higher (generally running at almost the same speed, occasionally a bit faster, and occasionally at half the speed - I can only assume that the half speed sample was a hiccough due to my system doing something else during the test run). I did a similar sample for writing and found the performance to be comparable through rdbuf. I’m including the reading sample program here to save you hunting for cryptic syntax and keywords (the writing code is trivially easy). It might just be me, but compared to fopen/fread this is alphabet soup! In order to get with the program though, I’m willing to give it a shot. I had to define _SCL_SECURE_NO_WARNINGS and _CRT_SECURE_NO_WARNINGS in the preprocessor definitions to suppress the compiler telling me that it’s not safe for me to read data because maybe data isn’t large enough to hold size bytes. Sure, thanks, I’m fine juggling this particular chainsaw.

#ifdef _MSC_VER
    #define IOS_BINARY std::ios::binary
#else
    #define IOS_BINARY 0
#endif
std::ifstream i;
i.open(“d:\\foo.bin”, IOS_BINARY);
std::filebuf *pbuf = f.rdbuf();
size_t size = pbuf->pubseekoff(0, std::ios::end, std::ios::in);
pbuf->pubseekpos(0, std::ios::in);
void* data = malloc(size);
pbuf->sgetn((char*) data, size);
i.close();

Section 7 is interesting. I do contest the idea in 7.2.2.1 that typeid is a useful alternative to dynamic_cast for determining type compatibility. My experiments with roll your own RTTI so far show the == operator to be excruciatingly slow. This could be an MSVC thing, but I kind of doubt it. strcmp in a profile (which is what you get from this operator) is never a sign of your program behaving in a nice way, and I’m sure it’s going to show up in gcc based compiles as well. Of course, this is what led me to create the lightweight type system in this series of articles in the first place.

Section 7.2.2.3 on exceptions is worth reading and underlining, the logic here is better than the dogma you’ll generally find in discussions on this topic.

Skipping a lot of pages brings us to Appendix D, where the timing code is available.

Appendix E has a large and worthwhile bibliography.

06.27.08

In the garden II

Posted in Art, Science at 1:29 am by admin

Sequel to In the Garden. This time it’s all about some of the indoor plants, all these are carnivorous and trap prey either by pitfall, sticky tentacles, or gummy, gooey, leaves.

 

The giant Nepenthes I got last summer is even more giant now. The plant has doubled in size to over 130cm (4.5 ft). Here is the first new trap. It turns out that this plant is likely Nepenthes Miranda, a hybrid of obscure origin, but recently popular as someone has figured out how to efficiently propagate them. You can see part of my Natural History museum/Wunderkammer in the background - my wife’s sculpture of a sea dragon, a sustainably collected butterfly, an AT-AT, and a first generation plaster cast of Archaeopterix Lithographica, pulled by a paleontologist friend from the original in Berlin.

 

 

Part of the jungle.

 

 

These are miniature Drosera, the spindly ones in the back are Scorpiades.

 

 

The Pinguicula farm. If you have a look at the earlier post, that large “octopus plant” has multiplied into ten plants filling the huge pot in the foreground. The plants in the background just got separated out from the pot in the white saucer on the right. How did they all cram in there? I am amazed at how shallow the roots are; these guys just sit on top of the soil more or less. In the terrarium you can see a ping pot my wife made for me with a nice ping sculpture on the side.

The bright blue ball is a polished fused ball of glass fibers.

« Previous entries

Bad Behavior has blocked 480 access attempts in the last 7 days.