Posts Tagged ‘grace’

On Snobism

Thursday, July 10th, 2008

Recently I decided to bring some life to the Grace homepage. I always expected it to spawn some controversy, so I’ve not been surprised to see a vocal group of people dismissing its ideas out of hand. One of the most colorful reviews is one that could only make me chuckle: “it’s like Java and PHP gang-raped a Makefile”. I’m not likely to make too much out of reactions like that; these are questions of taste where you just can’t please everyone (and shouldn’t try).

Another reason why I don’t worry about art critics in this context is that, even if Grace were a library with only a single user, it would still help me get my stuff done in a way that I enjoy. It has already proven its value and given me an excellent return on investment using it in a lot of roles. It scratched my itch and that is fine.

I do find some of the negativism on sites like reddit interesting as a phenomenon. I’ve thought of this as a factor of the functional programming cartel that seems to be hanging out at such places. And I realized that this way of looking at it is just as dismissive and childish, so it made me wonder how it can be that there seems to be this great wall between that crowd and the typical ISP/Unix nerd demographic I normally interact with.

When looking at software, I reckon there are two approaches. Some people, when they think about code, see a world of math. For me, what I mostly see is flying bits, an interconnected lego-world of action-reaction patterns that makes intuitive sense. I think both approaches are valid, but obviously they lead to a completely different view on software development.

Naturally I may be suffering from confirmation bias on this subject, but I get the impression that those of us that are more into this direct approach to programming are the ones producing most of the real, living software out there. I can definitely see areas where a more distanced and abstract approach like functional programming can make a difference, but a lot of software development is really about moving bits from A to B, more about rolling up your sleeves and building it than mulling about algorithms and monads; lego, not math.

Using Protocols in C++

Sunday, March 16th, 2008

A major source of anger and frustration in C++ style OO is multiple inheritance, and most people recognize it as a path best avoided. The Objective-C idea of a protocol is a real life-saver on the many occasions where you would otherwise need to deal with multiple inheritance. The concept of a protocol is to have a second way of classifying objects, one that completely sidesteps the hierarchical class model and instead just classifies objects by their common functions.

One area where the concept of protocols is quite convenient is iteration. There is usually a wide range of possible classes that could, theoretically, enumerate a list of contained items. Although C++ lacks protocols proper, template iterators are a fine way to access any class that implements a de facto iterator protocol. The only thing missing is an explicit declaration of the implemented protocol.

The Grace iterator<collectionclass,nodeclass> template is a minimized version of the visitor<> template. It requires one function (visitchild) to be defined inside your class that returns a pointer to a child node. Here is what it would look like for a purely synthetic class:


class syntheticlist
{
public:
   syntheticlist (void) {}
   ~syntheticlist (void) {}
   string *visitchild (int atpos)
   {
       if ((atpos<0)||(atpos>1)) return NULL;
       if (atpos==0) str = "Hello";
       else str = "world";
       return &str;
   }
protected:
   string str;
};

int myApp::main (void)
{
   syntheticlist L;
   foreach (n,L) fout.writeln (n);
   return 0;
}

The foreach macro goes through the following steps:

  1. It creates an iterator<syntheticlist, string *> pointing to L
  2. It sets up a for loop that starts at visitchild(0) and stops when it returns NULL
  3. It creates a temporary reference variable n linked to the current node.
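The steps above can be approximated in plain C++ without the macro. This is only a sketch under assumptions: `visit_each` and `collect` are hypothetical helper names, std::string stands in for the Grace string class, and the real foreach goes through an iterator<> object rather than a callback.

```cpp
#include <string>
#include <vector>

// Stand-in for a class that implements the de facto protocol:
// visitchild(pos) returns a pointer to the child at pos, or nullptr
// once pos runs past the end.
class syntheticlist
{
public:
    std::string *visitchild (int atpos)
    {
        if ((atpos < 0) || (atpos > 1)) return nullptr;
        str = (atpos == 0) ? "Hello" : "world";
        return &str;
    }
protected:
    std::string str;
};

// Hypothetical helper (not the Grace foreach macro): walks any class
// that offers visitchild() and hands each node to fn, stopping at the
// first nullptr.
template <typename Collection, typename Fn>
void visit_each (Collection &c, Fn fn)
{
    for (int pos = 0; ; ++pos)
    {
        auto *node = c.visitchild (pos);
        if (node == nullptr) break;
        fn (*node);
    }
}

// Gathers the visited nodes into a vector, so the walk is easy to inspect.
std::vector<std::string> collect (syntheticlist &l)
{
    std::vector<std::string> out;
    visit_each (l, [&out] (const std::string &n) { out.push_back (n); });
    return out;
}
```

Note that visit_each never asks what class it is walking; anything with a matching visitchild() will do, which is the whole point of classifying by function rather than by ancestry.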

The Grace iterator protocol looks a bit ascetic with only visitchild and the lack of the traditional first() and next() pattern. Part of this is due to the visitor protocol (which iterator is a part of) being about more than just iteration. If your goal, however, is to make your class iterable with foreach, you can safely assume an argument value 0 to mean ‘first’ and any value bigger than 0 to mean ‘next’ if that makes more sense in your context. Most collection classes used within Grace use growable arrays, which means they don’t have to keep any state information in order to be iterable.

Garbage Collection is Overrated

Saturday, March 15th, 2008

Next to string and array handling, memory management has always been one of those reasons users of other languages pointed at C++ programmers and laughed. The problem stems from dealing with temporary objects and keeping track of their lifetime. Although remembering to delete objects after you’re done with them sounds easy on paper, in the real world it’s tedious and you’re always just one forgetful early return statement away from a memory leak.

A lot of modern languages offer a friendlier form of memory management for this reason. They keep track of all references to a particular object and regularly take care of objects that are no longer referenced. This garbage collection process has a number of drawbacks that are usually seen as justified, considering the programming problems it solves. And it does solve them, but not all problems it solves need to be problems and, once you take those non-problems off the list, the smell of over-engineering becomes apparent.

What I see as the most interesting memory management problem that garbage collection solves can be illustrated by the following snippet:

Storpel *createStorpel (int modelNumber)
{
   Storpel *res = new Storpel;
   res->setModel (modelNumber);
   return res;
}

int main (void)
{
   Storpel *myStorpel = createStorpel (42);
   myStorpel->grind (11);
   delete myStorpel;
   return 0;
}

The problem with this pattern is obvious: The receiving function, by calling createStorpel(), has taken responsibility for managing the Storpel object and has to remember to delete it on every path out. This can get hairy in almost no time.

In Grace, I prevented this anti-pattern from emerging by sticking to a number of principles:

  • Use of pointers and new for objects is minimized (pointers are awkward when you want to use overloaded operators anyway, so there’s an additional motivation).
  • Data container classes are designed to be mere vessels for a raw data block that can be taken away from one object and ‘given’ to another without copying.
  • When confronted with a pointer during assignment or initialization, data container classes will ‘take over’ the pointed-to object’s data and immediately delete that object.
  • Functions that want to return objects actually return pointers to temporary objects. The returnclass (type) name retain macro creates such a temporary object plus a temporary reference to make it easier to access the object.

Words are words, code is code. This is what it effectively looks like:

string *getGreeting (void)
{
   returnclass (string) res retain;
   res = "Hello, world.";
   return &res;
}

int main (void)
{
   string s = getGreeting();
   fout.writeln (s);
   return 0;
}

Behind the curtains, the returnclass macro creates a temporary string object using a custom allocator. This string object creates a persistent refblock structure to hold the data. Then, inside main(), we assign it to s:

[Figure 1: assignment]

The receiving object takes a reference to the underlying refblock:

[Figure: strcpy_stage2.png]

And then the original object is immediately destroyed:

[Figure: strcpy_stage3.png]

When main() returns, the ‘s’ string object goes out of scope and is automatically cleaned up and, with it, the refblock. No need for an explicit delete.
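For readers without Grace at hand, the hand-over can be imitated with standard C++. The following is a rough sketch under assumptions: the class name `text` is hypothetical, a std::shared_ptr plays the role of the refblock, and plain new replaces the custom allocator behind returnclass.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical container, loosely modelled on the description above.
// It is not the Grace string class.
class text
{
public:
    text () : block (std::make_shared<std::string> ()) {}

    // Initialising from a pointer takes a reference to the other
    // object's block and deletes that object right away, so the caller
    // never has to. Only pass pointers that came from new.
    text (text *tmp)
    {
        assert (tmp != nullptr);
        block = tmp->block;
        delete tmp;
    }

    text &operator= (const char *s)
    {
        block = std::make_shared<std::string> (s);
        return *this;
    }

    const std::string &str () const { return *block; }

private:
    std::shared_ptr<std::string> block;  // stands in for the refblock
};

// Returns a heap-allocated temporary. Ownership passes to whichever
// text object is initialised from the returned pointer.
text *getGreeting ()
{
    text *res = new text;
    *res = "Hello, world.";
    return res;
}
```

With this in place, `text s = getGreeting();` leaves s holding the data, and the temporary is gone before the next statement runs. The obvious weakness of the sketch is that the pointer constructor will happily delete anything it is handed, which is presumably why Grace hides the temporary behind a macro.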

As for the other problems that garbage collection solves, most of them have to do with interactions between collections of more complex objects. Mostly, if you find yourself in need of tracking such complex relationships, you may need to look at a way of simplifying your design. If that is not possible, then keeping track of memory references takes no more diligence than what the structure already demands.

Strings in Grace

Friday, March 14th, 2008

The lack of a ‘native’ string type is one of the major gripes people have had with C and C++. I’ve basically grown up on C and I’ve walked that line. At the same time, when I looked at the string abstractions as they were implemented in class libraries and other languages, I realized that some things that were easy to do if you treated strings as arrays of characters in the way C did, were harder to accomplish if you treated them as types. To wit:

const char *hwld = "Hello, world.";

/* Insanely quick copy of a substring */
const char *wld = hwld + 7;

The C approach — having this mutable array as a working area — makes it relatively easy to hack and split strings into smaller parts and work with them as first class strings that can be used as arguments for other functions. All without ever copying a byte to a new object.

The Grace string class has grown a lot of nice features that make it easier to forget the feeling of loss that accompanies the sudden inability to cut up strings by spraypainting them with NUL bytes. With methods like string::left, string::mid, string::cutat and the strutil::split family of functions, a lot of splicing joy can be had for all.

The storage behind a string object uses a reference count and copy-on-write to deal with assignments and mutations. Assignments can be a real pain in the context of strings, which is why a number of languages recognize the concept of immutable strings. These languages make you go on string building expeditions to dynamically compose new strings, but they do this to allow for assignment without copying; if both the sending and the receiving object guarantee not to alter the data, it is safe to let them point to the same memory location. Copy-on-write satisfies this same ‘zero copy’ approach to assignment, but allows strings to remain mutable. The cost of copying is at best prevented, at worst delayed until the moment of mutation.
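To make the mechanism concrete, here is a minimal copy-on-write sketch in standard C++. It assumes nothing about the actual Grace internals: `cowstring` is a hypothetical name, std::shared_ptr supplies the reference count, and the code is not thread-safe.

```cpp
#include <memory>
#include <string>

// Hypothetical copy-on-write string; a sketch of the general technique,
// not the Grace implementation.
class cowstring
{
public:
    cowstring (const char *s = "") : data (std::make_shared<std::string> (s)) {}

    // The compiler-generated copy constructor and assignment only copy
    // the shared_ptr, so assignment never copies character data.

    // Reading never copies either.
    const std::string &str () const { return *data; }

    // True while both objects still point at the same storage.
    bool shares (const cowstring &other) const { return data == other.data; }

    // Mutation: detach first if anyone else holds a reference.
    void append (const char *s)
    {
        detach ();
        data->append (s);
    }

private:
    void detach ()
    {
        if (data.use_count () > 1)
            data = std::make_shared<std::string> (*data);
    }

    std::shared_ptr<std::string> data;
};
```

After `cowstring b = a;` both objects share one buffer; the copy only happens inside the first `b.append (...)`, and never happens at all if neither side mutates.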

With all the hacking, cutting and copying kung-fu under its belt, the Grace string class made me stop missing the C array approach, except for one thing: Each time you split up a string, data got copied. This week, the string class got a major overhaul: Next to the pointer to the copy-on-write back-end memory that each individual string carries, it now keeps track of an offset. So now, when you do this:

string s1 = "Hello, world.";
string s2 = s1.cutat (' ');

what actually happens is this:

That means even less copying. Some unexpected things now trigger a copy-on-write, though. The saddest one is conversion to a C-compatible string. Since C expects a terminating NUL character, a sub-string will have to mutate, triggering a copy-on-write.
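A sketch of the offset idea in standard C++ may help here. Everything in it is an assumption made for illustration: `slicestring` is a hypothetical class, its cutat is taken to return the part before the separator and leave the rest behind, and a missing separator simply yields an empty result.

```cpp
#include <cstddef>
#include <memory>
#include <string>

// Hypothetical substring-sharing string: a shared buffer plus an offset
// and a length. Not the Grace string class.
class slicestring
{
public:
    slicestring (const char *s = "")
        : buf (std::make_shared<std::string> (s)), off (0), len (buf->size ()) {}

    // Returns the part before the first sep and keeps the part after it.
    // Both objects keep pointing at the same buffer; no characters move.
    slicestring cutat (char sep)
    {
        slicestring left (*this);
        std::size_t pos = buf->find (sep, off);
        if (pos == std::string::npos || pos >= off + len)
        {
            left.len = 0;   // choice made for this sketch: nothing is cut
            return left;
        }
        left.len = pos - off;
        len -= left.len + 1;
        off = pos + 1;
        return left;
    }

    bool shares (const slicestring &o) const { return buf == o.buf; }

    std::string str () const { return buf->substr (off, len); }

    // A C string needs a NUL right after the slice. Unless the slice
    // already ends where the buffer ends, the slice has to be copied
    // into a private buffer first.
    const char *cstr ()
    {
        if (off + len != buf->size ())
        {
            buf = std::make_shared<std::string> (*buf, off, len);
            off = 0;
        }
        return buf->c_str () + off;
    }

private:
    std::shared_ptr<std::string> buf;
    std::size_t off;
    std::size_t len;
};
```

In this sketch the tail of a split can still be handed to C for free, because the buffer's own terminator sits right behind it. Only slices that stop short of the end pay for the conversion.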

Programming and Exploration

Thursday, March 6th, 2008

Although it was never a conscious design goal, during its evolution a sizable part of Grace proved to actively make C++ look more like Python. I wasn’t even really aware of Python until about two years into the project, so this fact both surprised and inspired me. It surprised me, because I didn’t expect similar efforts towards code legibility to spring out of the corner of a system scripting language (obviously a wrong way to look at things, but after seeing Tcl, Perl and PHP at work my cynical heart had grown weary). It is inspiring to know that there are people out there with similar ideas and vision and that their ideas are capable of growing a following. Ever since, I’ve been using Python as a reflective surface when evaluating potential new constructs in Grace.

I’ve never been envious of Python. I’m rational enough to just forget about it all and join the dark side if that’s ever going to happen. I was, however, particularly fond of the Python command line interpreter. Of course this idea has been around since the invention of, well, the shell, but one thing it accomplishes is that it allows you to quickly try out the effects of different commands and constructs without opening a text editor. This is a place where compiled languages tend to suck.

This week, Peter lamented the absence of this feature as well. He mumbled something about ‘just feed it through gcc’ and probably expected that to be the end of it. So last night I built a Grace shell. It uses some awfully tacky hacks to do its business, but I didn’t even have enough time to accumulate my distaste for that once I started noticing how completely addictive working with it is. It takes away a lot of anxiety that would normally have you running for the documentation by letting you just explore what happens when you call an API function.

Here’s the shell in action, working out the value class:

>>> value p
>>> p = $("name","John Doe")->$("mail","j@doe.org")
>>> p["url"] = "http://doe.org/~j/"
>>> print (p.toxml())
<?xml version="1.0" encoding="UTF-8"?>
<dict id="person">
    <string id="name">John Doe</string>
    <string id="mail">j@doe.org</string>
    <string id="url">http://doe.org/~j/</string>
</dict>
 
>>> foreach (f,p){
  >   print ("%10s : %s" %format (f.id(),f))
  > }
name       : John Doe
mail       : j@doe.org
url        : http://doe.org/~j/

The shell already exposed a number of bugs that would’ve taken much longer to get themselves caught otherwise. People with access to our Mercurial repo can find gracesh in /hg/.

About that compact web server

Sunday, February 17th, 2008

It’s funny to notice how, subconsciously, the way things work in Grace turns out to look eerily similar to how they work in Python. I was reminded of that fact again when I read this article, illustrating a 15 line web server implementation in Python. This is how it would look in Grace:


#include <grace/daemon.h>
#include <grace/httpd.h>


class myApp : public daemon
{
public:
   myApp (void) : daemon ("myapp") {}
   ~myApp (void) {}
   int main (void);
};


class helloPage : public serverpage
{
public:
   helloPage (httpd &p) : serverpage (p, "/foo") {}
   ~helloPage (void) {}
   
   int execute (value &env, value &argv,
                string &out, value &outhdr)
   {
       if (! argv.exists ("target"))
           argv["target"] = "world";

       out = "Hello, %[target]s\n" %format (argv);
       return 200;
   }
};


$application (myApp);
$version (1.0);


int myApp::main (void)
{
   httpd srv (8180);
   value event;
   new helloPage (srv);
   daemonize ();
   srv.start ();
   while (event.type() != "shutdown")
       event = waitevent ();
   srv.shutdown ();
   return 0;
}

Admittedly this is a bit more than 15 lines (even if we don’t count lines with curly brackets). We do get a little bit more oomph for those lines, though:

  • The application checks a pid-file in /var/run or ${HOME}/var/run.
  • The application is forked to the background
  • The application responds sanely to SIGTERM

Having an easily extendible web server built into your framework turns out to be pretty convenient: It’s a very quick way to deploy a remotely accessible service.

Inline Tree Declarations in Grace

Tuesday, February 12th, 2008

The Grace value class is a pretty useful container for tree-data. It walks and quacks like an associative array, which, combined with the foreach macro, makes it easy to inspect and manipulate tree structures using compact code like this:


void printListOfPersons (const value &persons)
{
 foreach (p, persons)
 {
   fout.writeln ("%s <%s>" %format (p["name"], p["email"]));
 }
}

Getting data into such a structure proved to be a bit of a challenge, though. You ended up with huge blocks of code with lots of repetition:

persons["john"]["name"] = "John Smith";
persons["john"]["email"] = "jsmith@example.net";
persons["steve"]["name"] = "Steve McSmith";
persons["steve"]["email"] = "steve@example.net";

The repetitive nature wasn’t only straining on the hands; the build-up here also involved looking up the node for ‘john’ and ‘steve’ twice. Using the negative array index to refer to the last node was a nice timesaver in this respect:

persons["john"]["name"] = "John Smith";
persons[-1]["email"] = "jsmith@example.net";

The problem with this approach, apart from it still being pretty repetitive, is that it starts you thinking that the idiom persons[-1] means “the node I created in the line before”, but there are lots of situations where you cannot really know this. Take this example:

persons = listPersons();
persons["john"]["favoriteColor"] = "blue";
persons[-1]["favoriteSwallowType"] = "African";

All it takes for this code to break is the inclusion of the key ‘john’ in the result set of listPersons() at a position that is not the last one in the array. This may sound like a mistake that is easy to avoid, but in an application that sees a bit of growth, the assumptions you make in this respect may stop being valid in the future, and you’ll have forgotten all about them.
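The failure mode is easy to reproduce outside Grace. Below is a small sketch using a hypothetical ordered dictionary (the name `ordered` and its methods `at` and `keyat` are made up for illustration) in which, as assumed above, a key lookup appends only when the key is missing and -1 addresses the last entry.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical ordered dictionary; a stand-in for the behaviour
// described above, not the Grace value class.
class ordered
{
public:
    // Finds the entry for key, appending a new one at the end only if
    // the key is not there yet.
    std::string &at (const std::string &key)
    {
        for (auto &e : entries)
            if (e.first == key) return e.second;
        entries.push_back (std::make_pair (key, std::string ()));
        return entries.back ().second;
    }

    // Negative positions count from the end: -1 is the last entry.
    const std::string &keyat (int pos) const
    {
        if (pos < 0) pos += static_cast<int> (entries.size ());
        assert (pos >= 0 && pos < static_cast<int> (entries.size ()));
        return entries[pos].first;
    }

private:
    std::vector<std::pair<std::string, std::string>> entries;
};
```

On an empty dictionary, touching "john" makes it the last entry, so keyat(-1) is "john". If "john" and then "steve" were already present, touching "john" again appends nothing and keyat(-1) quietly stays "steve".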

The solution to this problem came to me as an inspiration from jQuery’s clever discovery of this one symbol in the C-like namespace that managed to remain completely overlooked by most library builders: The dollar sign. On the global level, two variations of the function $(…) are defined that create a retainable pointer to a value object:

  • $("value") creates an array and adds a node with "value" as its value data.
  • $("key","value") creates a dictionary and adds a node with the key "key" and "value" as its value data.

These functions create a pointer to a value object, so the library also defines value::$(…). Now you can chain nodes together, like this:

value v = $("john",
             $("name", "John Doe") ->
             $("email", "jdoe@example.net") ->
             $("favoriteColors",
                 $("Titanium White") ->
                 $("Cadmium Yellow")
              ) ->
             $("favoriteNumber", 42)
          ) ->
         $("steve",
             $("name", "Steve Jibs") ->
             $("email", "sjibs@example.net") ->
             $("favoriteColors", $("Black")) ->
             $("favoriteNumber", 1)
          );

The result is pretty compact on a scale from 1 to JSON. There are some extra functions that will come in handy:

  • $attr(key,value) Sets an attribute.
  • $type(type) Sets the value’s type().
  • $val(otherval) Sets the value’s data only to that of otherval.
  • $merge(otherval) Merges the child nodes and attributes of another value.
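The chaining trick itself does not depend on the dollar sign. Here is a minimal sketch of the same pattern in standard C++, under stated assumptions: `node`, `make` and `add` are hypothetical names, only string leaves and keyed subtrees are covered, and a std::unique_ptr plays the part of the receiving value object.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical tree node, only loosely modelled on the value class.
struct node
{
    std::string key;             // name under which the parent holds us
    std::string data;            // leaf data, if any
    std::vector<node> children;  // child nodes, in insertion order

    // Adds a keyed leaf and returns this, so calls chain with ->.
    node *add (const std::string &k, const std::string &v)
    {
        node child;
        child.key = k;
        child.data = v;
        children.push_back (std::move (child));
        return this;
    }

    // Adds a keyed subtree built by a nested chain. The subtree's
    // contents are moved in and the temporary is deleted on the spot.
    node *add (const std::string &k, node *sub)
    {
        assert (sub != nullptr);
        std::unique_ptr<node> owned (sub);
        owned->key = k;
        children.push_back (std::move (*owned));
        return this;
    }

    const node *find (const std::string &k) const
    {
        for (const node &c : children)
            if (c.key == k) return &c;
        return nullptr;
    }
};

// Free functions that start a chain by creating a new container node.
node *make (const std::string &k, const std::string &v) { return (new node)->add (k, v); }
node *make (const std::string &k, node *sub) { return (new node)->add (k, sub); }
```

A chain such as `make ("john", make ("name", "John Doe")->add ("email", "jdoe@example.net"))` yields a raw pointer that something has to adopt; wrapping the outermost call in a std::unique_ptr<node> does that in this sketch.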

After these functions were added to Grace, I started using them religiously. Inline declarations are a powerful way to keep your code readable. Thanks, jQuery.

The Seeming Paradox of the Popularity of XML

Sunday, February 10th, 2008

The growth of XML as a data exchange standard is often seen as a positive development in terms of the chaos it replaced. The relationship the world has with XML, however, is not built on pure love. It’s interesting to know a bit about the problems people have with the standard, especially if we can grasp why, if it has all these problems, XML is still proving itself to be so pervasive.

1. It’s hard to parse

XML data structures can destroy your soul by just looking at them. Now, if this were only a matter of our meatware being wholly inadequate, while programs at least had an easy time parsing these beasts, that would have been just fine. LISPy S-expressions can be parsed by an amoeba. The spec for JSON fits in your browser without scrolling, for crying out loud. But what do you know, XML syntax can be quite quirky, what with namespaces, escaping and the hundreds of ways to get things wrong.

2. Seriously, it’s insanely hard to parse

Any sufficiently large data structure will look intimidating. People do insane things with XML. Mind-numbingly insane things. It’s like the ‘porn rule’: Think of the stupidest way you can imagine to encode a data structure into XML, used to encode the stupidest data structure you dare to dream of, multiply that by two and then realize that someone is doing that right now.

So why isn’t everybody using your favorite serialization standard?

Since we’ve already established how XML is hard to parse, how come people aren’t flocking to alternatives? Why won’t XML just die? Why, why? I think the answer is attributes. Your favorite serialization standard does not have them:

  • JSON - nope
  • S-expressions - fuggeddit
  • Serialized PHP - you’re joking, right?
  • NeXTSTEP plists - nuh-uh
  • YAML - no matches for ‘attribute’, plus its complicated rules end up making the standard even more insane than XML.

So why do these standards not support attributes, you may ask? Surely if we take one of these and add attributes, we would end up with a superior standard? This is where reality will bite. I’ve been toying with these ideas and my preliminary conclusion is that adding attributes to an encoding standard will make it, you guessed it: harder to parse!

Who needs attributes anyway?

In the spirit of KISS and in defense of one of the standards that is currently easier to parse, people have been into various forms of denial about attributes. The inescapable fact is that the concept of attributes is tremendously useful. It allows for distinguishing two kinds of properties that can be attached to a single node in a data tree: Data attributes that say something about the node and child nodes that the node owns. Expressing this distinction inside a ‘tree only’ object model can only be accomplished through cumbersome solutions:

  • Using a magic prefix, like object['__myattribute'] versus object['somechild']. Oh, and you just ruined your namespace.
  • Making a second step in the hierarchy, so you do object['attr']['myattribute'] and object['children']['somechild']. You’ll clearly enjoy those heroic ventures into /object/children/child1/children/child2/… and you will run into a need for that.
  • Some hybrid form, like using object['__attr__']['myattribute']. Not too shabby, but you still lost the ability for a node to have direct data alongside attributes, so you will have to access that through object['__data__'] or some such. You also polluted your namespace again, just not as badly as before.

Since there are data models that only XML can express while XML can easily express the models of other encodings, the standard is not likely to be going anywhere any time soon.

XML and OpenPanel

With that in mind, we still picked JSON for OpenPanel’s RPC because it was assumed to be faster to parse and easier to map to javascript options on the GUI side. We dealt with a PHP middle layer during earlier builds that we fed serialized PHP arrays, so we already had to stick to the lowest common denominator with regards to data modeling the RPC structures. With the PHP layer gone and the CLI not giving a hoot about what format is spoken, JSON sounded like the best idea. Adding an XML mapping to this will not be a problem.

On the side of modules, we’ve always been heavy on using XML structures to define the make-up and layout of the object tree inside the opencore database. Lately, this has been bugging me for a bit. The module.xml format has mostly organically grown around the changes inside opencore’s object database and module interaction API as they came about. It’s not the most friendly of XML files for developers to edit, though, and module programming is something we want to encourage as much as possible. So I came up with a text format that allows for the mixture of data and attributes that is typically needed, but is easy to parse for humans and machines alike.

Here’s the syntax:

# simple tree
Person john
   string name
   dict address
       string email
       string msn
       string homepage

In Pythonesque fashion, we parse parent/child structuring through the indent level. Let’s add some data:

# tree with values
Person john
   string name          : John T. Ripper
   dict address
       string email     : johnt@ripper.co.uk
       string msn       : johnt152@hotmail.com
       string homepage  : http://www.ripper.co.uk/~johnt

And, finally, let’s add those dreaded attributes:

# tree with values and attributes
Person john             < recordupdated true
                        < recordsaved false
   string name          : John T. Ripper
   dict address
       string email     : johnt@ripper.co.uk
                        < hide true
                        < sendmarketing false
       string msn       : johnt152@hotmail.com   < hide true
       string homepage  : http://www.ripper.co.uk/~johnt

That’s all. It’s too limited to ever replace XML, but it matches the subject layout of our module.xml perfectly, so it can be used to generate the XML files for people who like it, but sticking to directly editing module.xml is just fine. It’s more an experiment in purpose-driven textualization of a data model than it is trying to be what m4 became for sendmail.cf.
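To back up the ‘easy to parse’ claim, here is a rough sketch of a reader for the three examples above in standard C++. The grammar is my reading of those examples and therefore an assumption: each node line is `type id`, optionally followed by `: value` and `< name value`; a line starting with `<` adds an attribute to the node before it; `#` starts a comment. The names `defnode` and `parsedef` are hypothetical, and the indent is counted in raw characters, so mixing tabs and spaces will confuse it.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <sstream>
#include <string>
#include <vector>

struct defnode
{
    std::string type, id, value;
    std::map<std::string, std::string> attr;
    std::vector<std::unique_ptr<defnode>> children;
    std::size_t indent = 0;
};

// Strips leading and trailing spaces and tabs.
static std::string trim (const std::string &s)
{
    std::size_t b = s.find_first_not_of (" \t");
    if (b == std::string::npos) return "";
    return s.substr (b, s.find_last_not_of (" \t") - b + 1);
}

// Stores "name value" (the text following a '<') as an attribute of n.
static void addattr (defnode &n, const std::string &text)
{
    std::istringstream in (text);
    std::string name, val;
    in >> name >> val;
    if (! name.empty ()) n.attr[name] = val;
}

// Parses the text into a tree hanging off an anonymous root node.
std::unique_ptr<defnode> parsedef (const std::string &text)
{
    std::unique_ptr<defnode> root (new defnode);
    std::vector<defnode *> open;   // chain of ancestors, outermost first
    defnode *last = nullptr;       // most recently created node
    std::istringstream in (text);
    std::string line;

    while (std::getline (in, line))
    {
        std::size_t indent = line.find_first_not_of (" \t");
        if (indent == std::string::npos || line[indent] == '#') continue;
        if (line[indent] == '<')   // attribute continuation line
        {
            if (last) addattr (*last, line.substr (indent + 1));
            continue;
        }

        std::unique_ptr<defnode> n (new defnode);
        n->indent = indent;
        std::string body = line.substr (indent);

        std::size_t lt = body.find ('<');      // trailing attribute
        if (lt != std::string::npos)
        {
            addattr (*n, body.substr (lt + 1));
            body.erase (lt);
        }
        std::size_t colon = body.find (':');   // first colon starts the value
        if (colon != std::string::npos)
        {
            n->value = trim (body.substr (colon + 1));
            body.erase (colon);
        }
        std::istringstream head (body);
        head >> n->type >> n->id;

        // The parent is the nearest open node with a smaller indent.
        while (! open.empty () && open.back ()->indent >= indent) open.pop_back ();
        defnode *parent = open.empty () ? root.get () : open.back ();
        last = n.get ();
        open.push_back (last);
        parent->children.push_back (std::move (n));
    }
    return root;
}
```

Splitting on the first colon is what lets a value like http://www.ripper.co.uk/~johnt survive intact, at the price of forbidding colons in type and id names. Values containing a ‘<’ are not handled at all.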

For a comparison view: module.xml versus module.def.

String Formatting in Grace

Thursday, September 13th, 2007

There are many places where programs written using Grace seem to be syntactically closer to languages like Python than to C. String formatting used not to be one of these places. Historically, formatting text with dynamic elements within Grace was a matter of using either string::printf() or file::printf(). These are C-style printf formatters, which means they accept a variable argument list consisting of a ‘format string’ and one or more integer or pointer arguments. When dealing with value and string objects, this means liberally using explicit casts like cval() and ival() to convert those to printf-compatible types.

The library version inside the repository now supports a friendlier way of going about things, one that does away with all the explicit casting in the form of the %format pseudo-keyword. In its most simple incarnation, it works mostly the same as a normal printf formatting operation, except that it can deal with any objects that can be cast to value:

  string test = "%i bottles of %s" %format (99, "beer");

One level up on the funky scale is referring to positioned arguments:

  fout.writeln ("<%s>%s</%{0}s>" %format ("str","foo"));

Finally, you can access children of the first argument like this:

  value v;
  v["name"] = "John Smith";
  v["email"] = "j.smith@example.net";
 
  fout.writeln ("%[name]s <%[email]s>" %format (v));

This way of formatting, apart from being more flexible and less sensitive to formatting-related security problems (since it doesn’t need to follow arbitrary pointers), adds a lot of clarity to your source code, which is a major plus for keeping up maintainable code.
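The security remark is easier to appreciate with a toy version of the named form. The sketch below is not how %format is implemented; `formatnamed` is a hypothetical helper that handles only %[key]s, takes its arguments from a std::map and, as a choice made here, expands unknown keys to nothing.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical helper, not Grace's %format: expands %[key]s from a map
// and copies every other character through unchanged.
std::string formatnamed (const std::string &fmt,
                         const std::map<std::string, std::string> &args)
{
    std::string out;
    std::size_t i = 0;
    while (i < fmt.size ())
    {
        if (fmt.compare (i, 2, "%[") == 0)
        {
            std::size_t close = fmt.find ("]s", i + 2);
            if (close != std::string::npos)
            {
                // The format string can only name a key; it never gets
                // to pick a type or dereference a caller's pointer.
                std::string key = fmt.substr (i + 2, close - (i + 2));
                auto it = args.find (key);
                if (it != args.end ()) out += it->second;
                i = close + 2;
                continue;
            }
        }
        out += fmt[i++];
    }
    return out;
}
```

Whatever ends up in the format string, the worst it can do in this sketch is name a key that is not there. A C-style printf handed the same hostile string would start reading arguments that were never passed.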