Archive for March, 2008

See OpenPanel Run. Run OpenPanel, Run.

Friday, March 21st, 2008

Here’s a preview of the upcoming beta release of our life’s work. Now that we’re past the “oh no, we’re all going to die and we’ll never finish it” point, things are starting to look good. Or perhaps the other way around.


We’ll pick up the last bugs we found on Monday and then seed a release candidate to our alpha team. If no show stoppers emerge, that will be followed shortly by a full public release of the OpenPanel beta.

Sunday Night Dancing Bear Blogging

Sunday, March 16th, 2008

Using Protocols in C++

Sunday, March 16th, 2008

A major source of anger and frustration in C++-style OO is multiple inheritance, and most people recognize it as a path best avoided. The Objective-C idea of a protocol is a real life-saver on many occasions where you would otherwise have to deal with multiple inheritance. A protocol provides a second way of classifying objects, one that completely sidesteps the hierarchical class model and instead classifies objects by the functions they have in common.
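C++ has no protocols, but a common approximation is a pure abstract class that carries no data and no implementation, only the functions an adopting class promises to provide. The sketch below uses hypothetical names (Describable, Invoice, Sensor, describe_all). Note that it still relies on inheritance, just interface-only inheritance, so it avoids the state-related trouble of multiple inheritance rather than the class hierarchy itself.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// A "protocol" approximated as a pure abstract class: no data members and no
// implementation, only the functions an adopting class promises to provide.
class Describable {
public:
    virtual ~Describable() = default;
    virtual std::string describe() const = 0;
};

// Two otherwise unrelated classes adopt the protocol.
class Invoice : public Describable {
public:
    explicit Invoice(int number) : number_(number) {}
    std::string describe() const override {
        return "invoice #" + std::to_string(number_);
    }

private:
    int number_;
};

class Sensor : public Describable {
public:
    explicit Sensor(std::string name) : name_(std::move(name)) {}
    std::string describe() const override { return "sensor " + name_; }

private:
    std::string name_;
};

// Code written against the protocol accepts any adopting object.
std::string describe_all(const std::vector<std::unique_ptr<Describable>> &items) {
    std::string out;
    for (const auto &item : items) {
        if (!out.empty()) out += ", ";
        out += item->describe();
    }
    return out;
}
```

Because Describable holds no state, a class can adopt it next to a regular base class without duplicated members or diamond ambiguities.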

One area where protocols are quite convenient is iteration. There is usually a wide range of classes that could, in theory, enumerate a list of contained items. Although C++ lacks protocols proper, template iterators are a fine way to access any class that implements a de facto iterator protocol. The only thing missing is an explicit declaration of the implemented protocol.
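A minimal sketch of the template approach in standard C++17, not Grace code: count_items works with any class that provides begin() and end(), and a detection trait plus a static_assert placed after a class definition act as the explicit protocol declaration that plain templates lack. The names is_iterable, count_items and Trio are illustrative assumptions.

```cpp
#include <cstddef>
#include <type_traits>
#include <utility>

// Detection trait for a de facto "iterable" protocol: T qualifies if it has
// begin() and end() members (C++17 std::void_t detection idiom).
template <typename T, typename = void>
struct is_iterable : std::false_type {};

template <typename T>
struct is_iterable<T, std::void_t<decltype(std::declval<T &>().begin()),
                                  decltype(std::declval<T &>().end())>>
    : std::true_type {};

// Accepts any class that follows the protocol, related by inheritance or not.
template <typename Collection>
std::size_t count_items(Collection &c) {
    static_assert(is_iterable<Collection>::value,
                  "count_items() needs begin() and end()");
    std::size_t n = 0;
    for (auto it = c.begin(); it != c.end(); ++it) ++n;
    return n;
}

// A hand-written class that adopts the protocol without any base class.
class Trio {
public:
    const int *begin() const { return values_; }
    const int *end() const { return values_ + 3; }

private:
    int values_[3] = {1, 2, 3};
};

// The explicit "declaration": compilation fails if Trio drifts off-protocol.
static_assert(is_iterable<Trio>::value, "Trio must provide begin() and end()");
```

The same count_items() accepts std::vector, std::list or Trio unchanged; a type without the two members is rejected at compile time with the static_assert message instead of a long template error.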

The Grace iterator<collectionclass,nodeclass> template is a minimized version of the visitor<> template. It only requires your class to define one function, visitchild, which returns a pointer to a child node. Here is what it would look like for a purely synthetic class:


class syntheticlist
{
public:
   syntheticlist (void) {}
   ~syntheticlist (void) {}
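   // Returns a pointer to the node at position atpos, or NULL when
   // there is no such node; this is all the iterator protocol asks for.
   string *visitchild (int atpos)
   {
       if ((atpos<0)||(atpos>1)) return NULL;
       if (atpos==0) str = "Hello";
       else str = "world";
       return &str;
   }
protected:
   string str; // holds whichever synthetic node was visited last
};

int myApp::main (void)
{
   syntheticlist L;
   foreach (n,L) fout.writeln (n);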
   return 0;
}

The foreach macro goes through the following steps:

  1. It creates an iterator<syntheticlist, string *> pointing to L
  2. It sets up a for loop that starts at visitchild(0) and stops when it returns NULL
  3. It creates a temporary reference variable n linked to the current node.
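The three steps can be approximated in standard C++ without Grace. This is a sketch under assumptions: visit_iterator and FOREACH_VISIT are hypothetical names, std::string stands in for Grace's string, the node type is deduced with decltype, and the loop variable is bound through a C++17 if-with-initializer. Grace's real foreach is very likely implemented differently.

```cpp
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for iterator<collectionclass, nodeclass>: all it needs
// from the collection is visitchild(int), which returns a pointer to the node
// at that position, or nullptr when there are no more nodes.
template <typename Collection, typename Node>
class visit_iterator {
public:
    // Step 1: point the iterator at the collection and fetch node 0.
    explicit visit_iterator(Collection &c)
        : coll_(c), pos_(0), cur_(c.visitchild(0)) {}
    bool valid() const { return cur_ != nullptr; }
    Node &get() const { return *cur_; }
    void next() { cur_ = coll_.visitchild(++pos_); }

private:
    Collection &coll_;
    int pos_;
    Node *cur_;
};

// Step 2: loop until visitchild() returns nullptr.
// Step 3: bind `var` as a reference to the current node (C++17 if-init).
#define FOREACH_VISIT(var, coll)                                              \
    for (visit_iterator<std::remove_reference_t<decltype(coll)>,              \
                        std::remove_pointer_t<decltype((coll).visitchild(0))>> \
             var##_it(coll);                                                  \
         var##_it.valid(); var##_it.next())                                   \
        if (auto &var = var##_it.get(); true)

// The synthetic list from the post, using std::string and nullptr.
class syntheticlist {
public:
    std::string *visitchild(int atpos) {
        if (atpos < 0 || atpos > 1) return nullptr;
        str_ = (atpos == 0) ? "Hello" : "world";
        return &str_;
    }

private:
    std::string str_;
};

// Collects every node the loop visits, in order.
std::vector<std::string> collect(syntheticlist &L) {
    std::vector<std::string> out;
    FOREACH_VISIT(n, L) out.push_back(n);
    return out;
}
```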

The Grace iterator protocol looks a bit ascetic, with only visitchild and none of the traditional first() and next() pattern. Part of this is because the visitor protocol (which iterator is a part of) is about more than just iteration. If your goal, however, is to make your class iterable with foreach, you can safely treat an argument value of 0 as ‘first’ and any value bigger than 0 as ‘next’ if that makes more sense in your context. Most collection classes used within Grace are built on growable arrays, which means they don’t have to keep any state information in order to be iterable.
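For a structure that cannot jump straight to position N, such as a singly linked list, the ‘first’/‘next’ reading of the argument might look like the following sketch. The linkedwords class is hypothetical; unlike an array-backed collection it has to keep a cursor between calls, so only one iteration can be in progress at a time.

```cpp
#include <string>

// Hypothetical singly linked list of words. It cannot jump to position N
// cheaply, so visitchild() reads 0 as "first" and anything larger as "next",
// keeping a cursor between calls.
class linkedwords {
public:
    linkedwords() = default;
    linkedwords(const linkedwords &) = delete;
    linkedwords &operator=(const linkedwords &) = delete;
    ~linkedwords() {
        while (head_) {
            node *following = head_->next;
            delete head_;
            head_ = following;
        }
    }

    void prepend(const std::string &word) { head_ = new node{word, head_}; }

    std::string *visitchild(int atpos) {
        if (atpos < 0) return nullptr;
        if (atpos == 0) cursor_ = head_;            // "first"
        else if (cursor_) cursor_ = cursor_->next;  // "next"
        return cursor_ ? &cursor_->value : nullptr;
    }

private:
    struct node {
        std::string value;
        node *next;
    };
    node *head_ = nullptr;
    node *cursor_ = nullptr;
};
```

An array-backed class would simply return the element at atpos, which is why it needs no cursor at all.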

Garbage Collection is Overrated

Saturday, March 15th, 2008

Next to string and array handling, memory management has always been one of those reasons users of other languages pointed at C++ programmers and laughed. The problem stems from dealing with temporary objects and keeping track of their lifetime. Although remembering to delete objects after you’re done with them sounds easy on paper, in the real world it’s tedious and you’re always just one forgetful early return statement away from a memory leak.

A lot of modern languages offer a friendlier form of memory management for this reason. They work out which objects are still referenced and regularly clean up the ones that are not. This garbage collection process has a number of drawbacks that are usually accepted as a fair price for the programming problems it solves. And it does solve them, but not all the problems it solves need to be problems, and once you take those non-problems off the list, the smell of over-engineering becomes apparent.

What I see as the most interesting memory management problem that garbage collection solves can be illustrated by the following snippet:

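// Minimal stand-in so the snippet compiles; Storpel is a made-up class.
class Storpel
{
public:
   Storpel (void) : model (0) {}
   void setModel (int modelNumber) { model = modelNumber; }
   void grind (int amount) { (void) amount; /* grinding goes here */ }
protected:
   int model;
};

Storpel *createStorpel (int modelNumber)
{
   Storpel *res = new Storpel;
   res->setModel (modelNumber);
   return res; // ownership silently passes to the caller
}

int main (void)
{
   Storpel *myStorpel = createStorpel (42);
   myStorpel->grind (11);
   delete myStorpel; // the caller must remember this on every path
   return 0;
}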

The problem with this pattern is obvious: by calling createStorpel(), the receiving function has taken responsibility for managing the Storpel object. This can get hairy very quickly.
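For comparison, standard C++ (std::unique_ptr since C++11, std::make_unique since C++14) lets the same factory state the ownership transfer in its return type. This is a swapped-in technique, not what Grace does, and the Storpel class here is a minimal hypothetical stand-in whose grind() just returns a number.

```cpp
#include <memory>

// Minimal hypothetical stand-in for the post's Storpel class.
class Storpel {
public:
    void setModel(int modelNumber) { model_ = modelNumber; }
    int model() const { return model_; }
    int grind(int amount) const { return model_ * amount; }  // placeholder work

private:
    int model_ = 0;
};

// Ownership is spelled out in the return type: the caller receives a
// std::unique_ptr, and the Storpel is deleted when that pointer goes out of
// scope, on every path out of the function.
std::unique_ptr<Storpel> createStorpel(int modelNumber) {
    auto res = std::make_unique<Storpel>();
    res->setModel(modelNumber);
    return res;
}

int useStorpel() {
    std::unique_ptr<Storpel> myStorpel = createStorpel(42);
    if (myStorpel->model() != 42) return -1;  // early return: still no leak
    return myStorpel->grind(11);
}
```

Early returns and exceptions both leave nothing to delete by hand.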

In Grace, I prevented this anti-pattern from emerging by sticking to a number of principles:

  • Use of pointers and new for objects is minimized (pointers are awkward when you want to use overloaded operators anyway, so there’s external motivation).
  • Data container classes are designed to be mere vessels for a raw data block that can be taken away from one object and ‘given’ to another without copying.
  • When confronted with a pointer during assignment and initialization, data container classes will ‘take over’ the other object’s data and immediately delete it.
  • Functions that want to return objects actually return pointers to temporary objects. The macro construct returnclass (type) name retain creates such a temporary object plus a temporary reference to make it easier to access the object.
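The second and third principles can be sketched in plain C++. The vessel class below is hypothetical, not a Grace class: constructing or assigning it from a pointer moves the other object's data block over without copying a byte, then deletes the other object. That only works if the pointer came from new, an assumption the sketch does not check.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical data container: a mere vessel for a raw, heap-allocated block.
class vessel {
public:
    vessel() = default;
    explicit vessel(const char *text)
        : data_(new char[std::strlen(text) + 1]), size_(std::strlen(text)) {
        std::memcpy(data_, text, size_ + 1);
    }

    // Initialization from a pointer: take over the other object's block.
    // Assumes `other` is non-null and was allocated with new.
    vessel(vessel *other) { adopt(other); }

    // Assignment from a pointer: drop our own block, then take over.
    vessel &operator=(vessel *other) {
        if (other == this) return *this;
        delete[] data_;
        adopt(other);
        return *this;
    }

    // Ordinary copies are disabled in this sketch to keep ownership obvious.
    vessel(const vessel &) = delete;
    vessel &operator=(const vessel &) = delete;
    ~vessel() { delete[] data_; }

    const char *data() const { return data_; }
    std::size_t size() const { return size_; }

private:
    void adopt(vessel *other) {
        data_ = other->data_;    // only the pointer moves, no bytes are copied
        size_ = other->size_;
        other->data_ = nullptr;  // the other object gives up its block...
        other->size_ = 0;
        delete other;            // ...and is deleted right away
    }

    char *data_ = nullptr;
    std::size_t size_ = 0;
};
```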

Words are words, code is code. This is what it effectively looks like:

string *getGreeting (void)
{
   returnclass (string) res retain;
   res = "Hello, world.";
   return &res;
}

int main (void)
{
   string s = getGreeting();
   fout.writeln (s);
   return 0;
}

Behind the scenes, the returnclass macro creates a temporary string object using a custom allocator. This string object creates a persistent refblock structure to hold the data. Then, inside main(), we assign it to s:

[Figure: stage 1, assignment]

The receiving object takes a reference to the underlying refblock:
[Figure: stage 2]

And then the original object is immediately destroyed:
[Figure: stage 3]

When main() returns, the string object ‘s’ goes out of scope and is cleaned up automatically, taking the refblock with it. No need for an explicit delete.
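A rough standalone sketch of those stages with a hand-rolled reference count, assuming a refblock that holds the data plus a counter. The rstring class and this getGreeting() are hypothetical stand-ins, not Grace's string, and plain new replaces Grace's custom allocator.

```cpp
#include <string>
#include <utility>

// Hypothetical shared data block with a reference count.
struct refblock {
    std::string data;
    int refcount;
};

class rstring {
public:
    explicit rstring(std::string text) : rb_(new refblock{std::move(text), 1}) {}

    // Initialization from a temporary pointer: take a reference to its
    // refblock (count goes up), then destroy the temporary (count goes down).
    explicit rstring(rstring *tmp) : rb_(tmp->rb_) {
        ++rb_->refcount;
        delete tmp;
    }

    rstring(const rstring &other) : rb_(other.rb_) { ++rb_->refcount; }
    rstring &operator=(const rstring &other) {
        ++other.rb_->refcount;  // increment first so self-assignment is safe
        release();
        rb_ = other.rb_;
        return *this;
    }
    ~rstring() { release(); }

    const std::string &str() const { return rb_->data; }
    int refs() const { return rb_->refcount; }

private:
    void release() {
        if (--rb_->refcount == 0) delete rb_;  // last owner frees the block
    }
    refblock *rb_;
};

// Stand-in for getGreeting(): returns a pointer to a heap temporary.
rstring *getGreeting() { return new rstring("Hello, world."); }
```

Constructing s from the pointer briefly raises the count to 2 (the second stage); deleting the temporary brings it back to 1 (the third), so s is the sole owner and frees the block when it goes out of scope.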

As for the other problems that garbage collection solves, most of them have to do with interactions between collections of more complex objects. If you find yourself needing to track such complex relationships, that is usually a sign your design could be simplified. If that is not possible, then keeping track of memory references takes no more diligence than the structure already demands.

Strings in Grace

Friday, March 14th, 2008

The lack of a ‘native’ string type is one of the major gripes people have had with C and C++. I basically grew up on C and I’ve walked that line. At the same time, when I looked at the string abstractions implemented in class libraries and other languages, I realized that some things that were easy to do when you treated strings as arrays of characters, the way C does, were harder to accomplish when you treated them as a type. To wit:

const char *hwld = "Hello, world.";

/* Insanely quick substring: a pointer into the same buffer, no bytes copied */
const char *wld = hwld + 7;

The C approach, with a mutable character array as the working area, makes it relatively easy to hack and split strings into smaller parts and use those parts as first-class strings that can be passed as arguments to other functions. All without ever copying a byte to a new object.
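Standard C++17 offers the same zero-copy trick in a safer form through std::string_view, which stores a pointer plus a length. This is not part of Grace; the helper names tail_from and head_until are illustrative.

```cpp
#include <cstddef>
#include <string_view>

// Returns the tail of `text` starting at `offset` without copying. Unlike the
// C pointer trick, the view carries its own length, so it does not depend on
// a NUL terminator and can also stop short of the end.
std::string_view tail_from(std::string_view text, std::size_t offset) {
    return text.substr(offset);
}

// Returns the part of `text` before the first `sep` (or all of it if there
// is none), again as a view into the same buffer.
std::string_view head_until(std::string_view text, char sep) {
    return text.substr(0, text.find(sep));
}
```

As with the C version, the views are only valid while the original buffer lives.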

The Grace string class has grown a lot of nice features that make it easier to forget the feeling of loss that accompanies the sudden inability to cut up strings by spraypainting them with NUL bytes. With methods like string::left, string::mid, string::cutat and the strutil::split family of functions, a lot of splicing joy can be had for all.
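A split in the same zero-copy spirit can return views into the original buffer instead of new strings. split_views is a hypothetical helper, not strutil::split, and keeping empty fields is a design choice of this sketch rather than known Grace behavior.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Splits `text` at every `sep` and returns views into the original buffer.
// Nothing is copied, so the buffer must outlive the views. Empty fields
// (for example between two adjacent separators) are kept.
std::vector<std::string_view> split_views(std::string_view text, char sep) {
    std::vector<std::string_view> parts;
    std::size_t start = 0;
    for (;;) {
        std::size_t pos = text.find(sep, start);
        if (pos == std::string_view::npos) {
            parts.push_back(text.substr(start));
            return parts;
        }
        parts.push_back(text.substr(start, pos - start));
        start = pos + 1;
    }
}
```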

The storage behind a string object uses a reference count and copy-on-write to deal with assignments and mutations. Assignments can be a real pain in the context of strings, which is why a number of languages recognize the concept of immutable strings. These languages make you go on string building expeditions to dynamically compose new strings, but they do this to allow for assignment without copying; if both the sending and the receiving object guarantee not to alter the data, it is safe to let them point to the same memory location. Copy-on-write satisfies this same ‘zero copy’ approach to assignment, but allows strings to remain mutable. The cost of copying is at best prevented, at worst delayed until the moment of mutation.
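A minimal single-threaded sketch of copy-on-write, using std::shared_ptr as the reference-counted back end. cowstring is a hypothetical name; Grace's string has its own storage scheme, and shared_ptr::use_count() is only a reliable sharing test when no other thread touches the same buffer.

```cpp
#include <cstddef>
#include <memory>
#include <string>

// Copies share one buffer through a shared_ptr; the first mutation of a
// shared buffer makes a private copy ("detaches") before writing.
class cowstring {
public:
    explicit cowstring(const std::string &s)
        : buf_(std::make_shared<std::string>(s)) {}

    // The compiler-generated copy constructor and assignment just copy the
    // shared_ptr, so assignment never duplicates characters.

    const std::string &str() const { return *buf_; }
    const char *raw() const { return buf_->data(); }

    void append(const std::string &more) {
        detach();
        *buf_ += more;
    }
    void set(std::size_t pos, char c) {
        detach();
        (*buf_)[pos] = c;
    }

private:
    // Copy only if someone else still points at the same buffer.
    void detach() {
        if (buf_.use_count() > 1) buf_ = std::make_shared<std::string>(*buf_);
    }
    std::shared_ptr<std::string> buf_;
};
```

An assignment costs one reference-count increment; the copy happens at the first write to a shared buffer, and never if nobody writes.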

With all the hacking, cutting and copying kung-fu under its belt, the Grace string made me stop missing the C array approach, except for one thing: each time you split up a string, god killed a kitten. Er, data got copied. This week the string class got a major overhaul: next to the pointer to the copy-on-write back-end memory that each individual string carries, it now keeps track of an offset. So now, when you do this:

string s1 = "Hello, world.";
string s2 = s1.cutat (' ');

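what actually happens is that the characters are not copied: both strings keep pointing at the same back-end memory, and each records an offset into it.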

That means even less copying. Some unexpected things now trigger a copy-on-write, though. The saddest one is conversion to a C-compatible string. Since C expects the string to end in a NUL character, a sub-string that stops short of the end of the shared memory has to mutate, triggering a copy-on-write.
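The offset idea and the C-string catch can be illustrated together. In this sketch, slice (a hypothetical name, not Grace's API) shares its parent's buffer and records an offset and a length; c_str() can reuse the shared buffer's terminator only when the slice runs to the end of that buffer, and otherwise makes a private NUL-terminated copy. Grace's actual string may make this decision differently.

```cpp
#include <algorithm>
#include <cstddef>
#include <memory>
#include <string>

// Hypothetical substring type: shares the parent's buffer and records only an
// offset and a length, so taking a substring copies no characters.
class slice {
public:
    explicit slice(const std::string &s)
        : buf_(std::make_shared<std::string>(s)), off_(0), len_(s.size()) {}

    // Same buffer, narrower window; offset and length are clamped to this slice.
    slice sub(std::size_t off, std::size_t len) const {
        slice r = *this;
        std::size_t start = std::min(off, len_);
        r.off_ = off_ + start;
        r.len_ = std::min(len, len_ - start);
        return r;
    }

    // C expects a terminating NUL. If the slice ends where the shared buffer
    // ends, that terminator is already there; otherwise the slice has to make
    // its own NUL-terminated copy (the copy-on-write case).
    const char *c_str() {
        if (off_ + len_ == buf_->size()) return buf_->c_str() + off_;
        buf_ = std::make_shared<std::string>(buf_->substr(off_, len_));
        off_ = 0;
        return buf_->c_str();
    }

    std::string str() const { return buf_->substr(off_, len_); }
    const void *storage() const { return buf_.get(); }

private:
    std::shared_ptr<const std::string> buf_;
    std::size_t off_;
    std::size_t len_;
};
```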

Programming and Exploration

Thursday, March 6th, 2008

Although it was never a conscious design goal, during its evolution a sizeable part of Grace turned out to actively make C++ look more like Python. I wasn’t even really aware of Python until about two years into the project, so this fact both surprised and inspired me. It surprised me, because I didn’t expect similar efforts towards code legibility to spring out of the corner of a system scripting language. That was obviously the wrong way to look at things, but after seeing Tcl, Perl and PHP at work my cynical heart had grown weary. It is inspiring to know that there are people out there with similar ideas and vision, and that their ideas are capable of growing a following. Ever since, I’ve been using Python as a reflective surface when evaluating potential new constructs in Grace.

I’ve never been envious of Python. I’m rational enough to just forget about it all and join the dark side if that’s ever going to happen. I was, however, particularly fond of the Python command line interpreter. Of course this idea has been around since the invention of, well, the shell, but what it accomplishes is that it lets you quickly try out the effects of different commands and constructs without opening a text editor. This is a place where compiled languages tend to suck.

This week, Peter lamented the absence of this feature as well. He mumbled something about ‘just feed it through gcc’ and probably expected that to be the end of it. So last night I built a Grace shell. It uses some awfully tacky hacks to do its business, but I didn’t even have enough time to accumulate my distaste for that once I started noticing how completely addictive working with it is. It takes away a lot of anxiety that would normally have you running for the documentation by letting you just explore what happens when you call an API function.

Here’s the shell in action, working out the value class:

>>> value p
>>> p = $("name","John Doe")->$("mail","j@doe.org")
>>> p["url"] = "http://doe.org/~j/"
>>> print (p.toxml())
<?xml version="1.0" encoding="UTF-8"?>
<dict id="person">
    <string id="name">John Doe</string>
    <string id="mail">j@doe.org</string>
    <string id="url">http://doe.org/~j/</string>
</dict>

>>> foreach (f,p){
  >   print ("%10s : %s" %format (f.id(),f))
  > }
name       : John Doe
mail       : j@doe.org
url        : http://doe.org/~j/

The shell has already exposed a number of bugs that would otherwise have taken much longer to get caught. People with access to our Mercurial repo can find gracesh in /hg/.