Archive for February, 2008

Saturday Night Dancing Bear Blogging

Sunday, February 17th, 2008

About that compact web server

Sunday, February 17th, 2008

It’s funny to notice how, subconsciously, the way things work in Grace turns out to look eerily similar to how they work in Python. I was reminded of that fact again when I read this article, illustrating a 15-line web server implementation in Python. This is how it would look in Grace:


#include <grace/daemon.h>
#include <grace/httpd.h>


class myApp : public daemon
{
public:
   myApp (void) : daemon ("myapp") {}
   ~myApp (void) {}
   int main (void);
};


class helloPage : public serverpage
{
public:
   helloPage (httpd &p) : serverpage (p, "/foo") {}
   ~helloPage (void) {}
   
   int execute (value &env, value &argv,
                string &out, value &outhdr)
   {
       if (! argv.exists ("target"))
           argv["target"] = "world";

       out = "Hello, %[target]s\n" %format (argv);
       return 200;
   }
};


$application (myApp);
$version (1.0);


int myApp::main (void)
{
   httpd srv (8180);
   value event;
   new helloPage (srv);
   daemonize ();
   srv.start ();
   while (event.type() != "shutdown")
       event = waitevent ();
   srv.shutdown ();
   return 0;
}

Admittedly this is a bit more than 15 lines (even if we don’t count lines with curly brackets). We do get a little bit more oomph for those lines, though:

  • The application checks a pid-file in /var/run or ${HOME}/var/run.
  • The application is forked to the background.
  • The application responds sanely to SIGTERM
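The last of those points can be sketched in plain C++ without any framework. This is not how Grace does it internally; it is a minimal illustration of the usual pattern, where the signal handler only sets a flag and the main loop checks it. The names shutdownRequested, onTerminate, installTerminateHandler and shouldShutDown are made up for this example.

```cpp
#include <csignal>

// Set from the signal handler, read from the main loop.
static volatile std::sig_atomic_t shutdownRequested = 0;

// The handler does as little as possible: it only records the request.
extern "C" void onTerminate (int)
{
    shutdownRequested = 1;
}

// Installs the SIGTERM handler; returns false if that failed.
bool installTerminateHandler (void)
{
    return std::signal (SIGTERM, onTerminate) != SIG_ERR;
}

// What a main loop would poll between events.
bool shouldShutDown (void)
{
    return shutdownRequested != 0;
}
```

A daemon's main loop would keep running until shouldShutDown() returns true and then clean up in its own time, which is roughly what the shutdown event in the listing above stands for.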

Having an easily extendible web server built into your framework turns out to be pretty convenient: It’s a very quick way to deploy a remotely accessible service.

Inline Tree Declarations in Grace

Tuesday, February 12th, 2008

The Grace value class is a pretty useful container for tree data. It walks and quacks like an associative array which, combined with the foreach macro, makes it easy to inspect and manipulate tree structures using compact code like this:


void printListOfPersons (const value &persons)
{
 foreach (p, persons)
 {
   fout.writeln ("%s <%s>" %format (p["name"], p["email"]));
 }
}

Getting data into such a structure proved to be a bit of a challenge, though. You ended up with huge blocks of code with lots of repetition:

persons["john"]["name"] = "John Smith";
persons["john"]["email"] = "jsmith@example.net";
persons["steve"]["name"] = "Steve McSmith";
persons["steve"]["email"] = "steve@example.net";

The repetition wasn’t only a strain on the hands; the build-up here also involved looking up the nodes for ‘john’ and ‘steve’ twice. Using the negative array index to refer to the last node was a nice timesaver in this respect:

persons["john"]["name"] = "John Smith";
persons[-1]["email"] = "jsmith@example.net";

The problem with this approach, apart from it still being pretty repetitive, is that it starts you thinking that the idiom persons[-1] means “the node I created in the line before”, but there are lots of situations where you cannot really know this. Take this example:

persons = listPersons();
persons["john"]["favoriteColor"] = "blue";
persons[-1]["favoriteSwallowType"] = "African";

All it takes for this code to break is the inclusion of the key ‘john’ in the result set of listPersons() at a position that is not the last one in the array. This may sound like a mistake that is easy to avoid, but in an application that sees a bit of growth, the assumptions you make in this respect may stop being valid in the future, and you’ll have forgotten all about them.
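To make the pitfall concrete, here is a minimal sketch in plain C++. OrderedDict is a hypothetical, flattened stand-in for an insertion-ordered dictionary, not Grace's value class, and lastKey() plays the role of the [-1] index.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical insertion-ordered dictionary; not Grace's value class.
class OrderedDict
{
public:
    // Returns the entry for key, appending a new one only if the key is absent.
    std::string &operator[] (const std::string &key)
    {
        for (std::size_t i = 0; i < items.size (); ++i)
            if (items[i].first == key) return items[i].second;
        items.push_back (std::make_pair (key, std::string ()));
        return items.back ().second;
    }

    // Key of the last entry, i.e. what a [-1] index would refer to.
    const std::string &lastKey (void) const
    {
        assert (! items.empty ());
        return items.back ().first;
    }

private:
    std::vector<std::pair<std::string, std::string> > items;
};

// Touches "john", then reports which key the "last node" idiom would hit.
std::string lastKeyAfterTouchingJohn (OrderedDict d)
{
    d["john"] = "blue";
    return d.lastKey ();
}
```

On an empty dictionary the function reports "john", as hoped. If the dictionary already holds "john" followed by "steve", it reports "steve", and the second assignment from the example above would land on the wrong person.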

The solution to this problem came to me as an inspiration from jQuery’s clever discovery of this one symbol in the C-like namespace that managed to remain completely overlooked by most library builders: The dollar sign. On the global level, two variations of the function $(…) are defined that create a retainable pointer to a value object:

  • $("value") creates an array and adds a node with "value" as its value data.
  • $("key","value") creates a dictionary and adds a node with the key "key" and "value" as its value data.

These functions create a pointer to a value object, so the library also defines value::$(…). Now you can chain nodes together, like this:

value v = $("john",
             $("name", "John Doe") ->
             $("email", "jdoe@example.net") ->
             $("favoriteColors",
                 $("Titanium White") ->
                 $("Cadmium Yellow")
              ) ->
             $("favoriteNumber", 42)
          ) ->
         $("steve",
             $("name", "Steve Jibs") ->
             $("email", "sjibs@example.net") ->
             $("favoriteColors", $("Black")) ->
             $("favoriteNumber", 1)
          );

The result is pretty compact on a scale from 1 to JSON. There are some extra functions that will come in handy:

  • $attr(key,value) Sets an attribute.
  • $type(type) Sets the value’s type().
  • $val(otherval) Sets the value’s data only to that of otherval.
  • $merge(otherval) Merges the child nodes and attributes of another value.
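For the curious, chaining of this kind can be approximated in standard C++. The sketch below is an assumption about one possible design, not how Grace implements it: Tree, TreePtr, node() and find() are made-up names (node stands in for $, which not every compiler accepts in identifiers), a shared_ptr plays the role of the retainable pointer, and only string leaves are supported.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct Tree;
typedef std::shared_ptr<Tree> TreePtr;

// Hypothetical tree node: leaf data plus an ordered list of keyed children.
struct Tree : std::enable_shared_from_this<Tree>
{
    std::string data;
    std::vector<std::pair<std::string, TreePtr> > children;

    // Appends a leaf child and returns this node, so calls chain with ->.
    TreePtr node (const std::string &key, const std::string &val)
    {
        TreePtr leaf = std::make_shared<Tree> ();
        leaf->data = val;
        children.push_back (std::make_pair (key, leaf));
        return shared_from_this ();
    }

    // Appends an existing subtree as a child and returns this node.
    TreePtr node (const std::string &key, const TreePtr &sub)
    {
        assert (sub != nullptr);
        children.push_back (std::make_pair (key, sub));
        return shared_from_this ();
    }

    // Finds a direct child by key, or returns an empty pointer.
    TreePtr find (const std::string &key) const
    {
        for (std::size_t i = 0; i < children.size (); ++i)
            if (children[i].first == key) return children[i].second;
        return TreePtr ();
    }
};

// Free functions that start a chain by creating a fresh parent node.
TreePtr node (const std::string &key, const std::string &val)
{
    return std::make_shared<Tree> ()->node (key, val);
}

TreePtr node (const std::string &key, const TreePtr &sub)
{
    return std::make_shared<Tree> ()->node (key, sub);
}
```

The trick is that the free function creates the parent and the member function of the same name returns that parent again, so every further -> node (...) in the chain adds a sibling rather than nesting one level deeper.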

After these functions were added to Grace, I started using them religiously. Inline declarations are a powerful way to keep your code readable. Thanks, jQuery.

The Seeming Paradox of the Popularity of XML

Sunday, February 10th, 2008

The growth of XML as a data exchange standard is often seen as a positive development in terms of the chaos it replaced. The relationship the world has with XML, however, is not built on pure love. It’s interesting to know a bit about the problems people have with the standard, especially if we can grasp why, if it has all these problems, XML is still proving itself to be so pervasive.

1. It’s hard to parse

XML data structures can destroy your soul if you merely look at them. Now, if this were just a matter of our meatware being wholly inadequate, while programs at least had an easy time parsing these beasts, that would have been just fine. LISPy S-expressions can be parsed by an amoeba. The spec for JSON fits in your browser without scrolling, for crying out loud. But what do you know, XML syntax can be quite quirky, what with namespaces, escaping and the hundreds of ways to get things wrong.

2. Seriously, it’s insanely hard to parse

Any sufficiently large data structure will look intimidating. People do insane things with XML. Mind-numbingly insane things. It’s like the ‘porn rule’: Think of the stupidest way you can imagine to encode a data structure into XML, used to encode the stupidest data structure you dare to dream of, multiply that by two and then realize that someone is doing that right now.

So why isn’t everybody using your favorite serialization standard?

Since we’ve already established how XML is hard to parse, how come people aren’t flocking to alternatives? Why won’t XML just die? Why, why? I think the answer is attributes. Your favorite serialization standard does not have them:

  • JSON - nope
  • S-expressions - fuggeddit
  • Serialized PHP - you’re joking, right?
  • NextSTEP plists - nuh-uh
  • YAML - no matches for ‘attribute’, plus its complicated rules end up making the standard even more insane than XML.

So why do these standards not support attributes, you may ask? Surely if we take one of these and add attributes, we would end up with a superior standard? This is where reality will bite. I’ve been toying with these ideas and my preliminary conclusion is that adding attributes to an encoding standard will make it, you guessed it, harder to parse!

Who needs attributes anyway?

In the spirit of KISS and in defense of one of the standards that is currently easier to parse, people have been into various forms of denial about attributes. The inescapable fact is that the concept of attributes is tremendously useful. It allows for distinguishing two kinds of properties that can be attached to a single node in a data tree: data attributes that say something about the node, and child nodes that the node owns. Inside a ‘tree only’ object model, this distinction can only be expressed through cumbersome solutions:

  • Using a magic prefix, like object['__myattribute'] versus object['somechild']. Oh, and you just ruined your namespace.
  • Making a second step in the hierarchy, so you do object['attr']['myattribute'] and object['children']['somechild']. You’ll clearly enjoy those heroic ventures into /object/children/child1/children/child2/… and you will run into a need for that.
  • Some hybrid form, like using object['__attr__']['myattribute']. Not too shabby, but you still lost the ability for a node to have direct data alongside attributes, so you will have to access that through object['__data__'] or some such. You also polluted your namespace again, just not as badly as before.
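The hybrid form is easy to demonstrate. In the minimal sketch below, Node is a hypothetical model that keeps data, attributes and children apart, and toTreeOnly() re-encodes it for a tree-only model using the reserved keys from the last bullet. None of these names come from an existing library, and the flat path-to-string map is just a compact way to show the resulting keys.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical node that keeps its own data, its attributes and its
// child nodes in three separate places.
struct Node
{
    std::string name;
    std::string data;
    std::map<std::string, std::string> attributes;
    std::vector<Node> children;
};

typedef std::map<std::string, std::string> FlatTree;

// Re-encodes a node for a tree-only model: attributes move under a
// reserved "__attr__" key and the node's own data under "__data__".
void toTreeOnly (const Node &n, const std::string &parentPath, FlatTree &out)
{
    // The namespace pollution: these two names are no longer available.
    assert (n.name != "__attr__" && n.name != "__data__");

    const std::string path = parentPath + "/" + n.name;
    if (! n.data.empty ()) out[path + "/__data__"] = n.data;

    std::map<std::string, std::string>::const_iterator a;
    for (a = n.attributes.begin (); a != n.attributes.end (); ++a)
        out[path + "/__attr__/" + a->first] = a->second;

    for (std::size_t i = 0; i < n.children.size (); ++i)
        toTreeOnly (n.children[i], path, out);
}
```

A node called email with the data johnt@ripper.co.uk and an attribute hide set to true, hanging under john, ends up as /john/email/__data__ and /john/email/__attr__/hide. The information survives, but every consumer now has to know about the two magic keys.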

Since there are data models that only XML can express while XML can easily express the models of other encodings, the standard is not likely to be going anywhere any time soon.

XML and OpenPanel

With that in mind, we still picked JSON for OpenPanel’s RPC because it was assumed to be faster to parse and easier to map to javascript options on the GUI side. We dealt with a PHP middle layer during earlier builds that we fed serialized PHP arrays, so we already had to stick to the lowest common denominator with regards to data modeling the RPC structures. With the PHP layer gone and the CLI not giving a hoot about what format is spoken, JSON sounded like the best idea. Adding an XML mapping to this will not be a problem.

On the side of modules, we’ve always been heavy on using XML structures to define the make-up and layout of the object tree inside the opencore database. Lately, this has been bugging me a bit. The module.xml format has mostly grown organically around the changes inside opencore’s object database and module interaction API as they came about. It’s not the most friendly of XML files for developers to edit, though, and module programming is something we want to encourage as much as possible. So I came up with a text format that allows for the mixture of data and attributes that is typically needed, but is easy to parse for humans and machines alike.

Here’s the syntax:

# simple tree
Person john
   string name
   dict address
       string email
       string msn
       string homepage

In Pythonesque fashion, we parse parent/child structuring through the indent level. Let’s add some data:

# tree with values
Person john
   string name          : John T. Ripper
   dict address
       string email     : johnt@ripper.co.uk
       string msn       : johnt152@hotmail.com
       string homepage  : http://www.ripper.co.uk/~johnt

And, finally, let’s add those dreaded attributes:

# tree with values and attributes
Person john             < recordupdated true
                        < recordsaved false
   string name          : John T. Ripper
   dict address
       string email     : johnt@ripper.co.uk
                        < hide true
                        < sendmarketing false
       string msn       : johnt152@hotmail.com   < hide true
       string homepage  : http://www.ripper.co.uk/~johnt

That’s all. It’s too limited to ever replace XML, but it matches the subject layout of our module.xml perfectly, so it can be used to generate the XML files for people who like it, but sticking to directly editing module.xml is just fine. It’s more an experiment in purpose-driven textualization of a data model than it is trying to be what m4 became for sendmail.cf.
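To back up the claim that this is easy to parse for machines, here is a minimal sketch of a parser in plain C++. It is not the actual OpenPanel code: Entry, parseDef() and parentOf() are hypothetical names, indentation is assumed to consist of spaces, and values are assumed not to contain a '<' character.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// One "type id : value" declaration plus the attributes attached to it.
struct Entry
{
    int indent;   // number of leading whitespace characters
    std::string type, id, value;
    std::map<std::string, std::string> attributes;
};

static std::string trim (const std::string &s)
{
    const char *ws = " \t\r";
    std::size_t b = s.find_first_not_of (ws);
    if (b == std::string::npos) return "";
    return s.substr (b, s.find_last_not_of (ws) - b + 1);
}

// Splits "first rest of text" at the first space.
static void splitWord (const std::string &s, std::string &first, std::string &rest)
{
    std::size_t sp = s.find (' ');
    first = s.substr (0, sp);
    rest = (sp == std::string::npos) ? "" : trim (s.substr (sp));
}

// Parses the text into a flat list of entries, in document order.
std::vector<Entry> parseDef (const std::string &text)
{
    std::vector<Entry> out;
    std::istringstream in (text);
    std::string line;
    while (std::getline (in, line))
    {
        std::size_t indent = line.find_first_not_of (" \t");
        if (indent == std::string::npos || line[indent] == '#') continue;

        // Split off an optional "< name value" attribute.
        std::string head = line, attr;
        std::size_t lt = line.find ('<');
        if (lt != std::string::npos)
        {
            head = line.substr (0, lt);
            attr = trim (line.substr (lt + 1));
        }
        head = trim (head);
        if (! head.empty ())   // a declaration; otherwise a continuation line
        {
            Entry e;
            e.indent = static_cast<int> (indent);
            std::string decl = head;
            std::size_t colon = head.find (':');
            if (colon != std::string::npos)
            {
                decl = trim (head.substr (0, colon));
                e.value = trim (head.substr (colon + 1));
            }
            splitWord (decl, e.type, e.id);
            out.push_back (e);
        }
        if (! attr.empty () && ! out.empty ())
        {
            std::string name, val;
            splitWord (attr, name, val);
            out.back ().attributes[name] = val;   // belongs to the latest node
        }
    }
    return out;
}

// Index of the nearest earlier entry with a smaller indent, or -1 for a root.
int parentOf (const std::vector<Entry> &entries, std::size_t i)
{
    assert (i < entries.size ());
    for (std::size_t j = i; j-- > 0; )
        if (entries[j].indent < entries[i].indent) return static_cast<int> (j);
    return -1;
}
```

Keeping the result as a flat list with indent levels, and resolving parents afterwards, avoids building a tree inside the line loop. Generating XML from it would be a matter of walking the list and opening or closing elements as the indent changes.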

For a comparison view: module.xml versus module.def.