Visualizing Parallel Programs

Take a moment and visualize your favorite parallel program. . . . . . .

That’s enough. What does it look like? What shape and color is it? What does it sound like? How does it feel – is it smooth or rough? How does it taste – is it sweet or sour? What does it smell like?

Isn’t it funny that when we try to visualize a parallel program (or any kind of program), we get sort of a big blank, or maybe a flow-chart-ish kind of image? The five senses that we normally use to interact with the world around us are completely unavailable to us when we try to think about (parallel) programs. Programs are implemented in logic without any direct connections to our everyday world (unless you consider textual representations of auditory sensations a direct connection).

As developers, we have worked ourselves into a corner where the only tool we have available to us is “logic”, meaning that we have abandoned using all of our five natural human senses. Effectively, we have decided to sculpt the statue of David using only two fingers. Yes, logic may be the capstone and crowning achievement of human intellect, but that doesn’t mean that pure logic is necessarily the best tool to use for every task. Try using your cutting wit or incisive logic on a chunk of marble.

When I try to visualize a program (parallel or otherwise), I either end up with something that looks like a block diagram or something that looks like a spiderweb with silken strands stretching between vague amorphous blobs of functionality. And the only smell I get is that rotten smell of bad code. If I try to zoom in on my mental image, I generally get an image of text, of the code that actually implements the functionality of the program. And once I start to visualize code, I lose the rest of the program image. Logic is single-threaded and can only focus on one thing at a time. If you’re not convinced, check the rabbit/duck or candlestick/faces images in this blog.

Now visualize the Mona Lisa painting. If you mentally zoom in on the winding river or the tree-lined horizon, it’s not hard at all to back up and visualize the whole painting, to see the enigmatic half-smile of the lady. If you focus on her eyebrows, it’s not hard to zoom out and see the complete image and her quiet smile.

Now let’s imagine we’re playing Pictionary and you get the card to draw UDP – how do you draw Service Ports or Packet Structure or Checksum Calculation? And then your opponent gets the card to draw a kitten playing with a ball of yarn. Who has the easier task?

One of the goals of the Avian Computing Project is to make it easier to visualize what is happening inside parallel programs by modeling them as things that we can easily visualize. Things that we can visualize or in other ways imagine are easier to think about and talk about.

In Avian Computing, we imagine flocks of birds flying around together but think about each bird independently and asynchronously performing separate actions. Instead of thinking about mutexes and lines of code and locks and blocks, Avian Computing allows us to think and visualize the work we want each individual thread to perform. Since all of the birds automatically behave the same way and follow the same life cycle, developers can focus on the unique actions that each bird needs to perform.

For example, in an ETL (Extract – Transform – Load) program, some birds would ONLY know how to extract data from the DB, others would ONLY know how to transform the data, and others would ONLY know how to load the transformed data into the new DB. Allocating the ETL logic among three types of birds makes it easier to visualize, easier to code, and easier to debug.
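Here is a rough sketch of how those three kinds of birds might look, just to make the idea concrete. It is not the Avian Computing API: the `Bird` class, the `digest` method, the food names, and the dictionary of queues standing in for the tree are all names I made up for illustration, and the "database" reads and writes are faked. The point is only that every bird shares one life cycle and each subclass supplies nothing but its own action.

```python
import queue
import threading

STOP = object()  # sentinel: no more food of this type is coming


class Bird(threading.Thread):
    """Hypothetical base bird: every bird follows the same eat/digest/store cycle."""

    def __init__(self, tree, eats, stores):
        super().__init__()
        self.tree, self.eats, self.stores = tree, eats, stores

    def run(self):
        # Shared life cycle: eat one piece of food, digest it, store the result.
        while (food := self.tree[self.eats].get()) is not STOP:
            self.tree[self.stores].put(self.digest(food))
        self.tree[self.stores].put(STOP)  # tell the next bird the food ran out

    def digest(self, food):
        raise NotImplementedError


class ExtractBird(Bird):
    def digest(self, row_id):
        return {"id": row_id, "name": f"customer {row_id}"}  # pretend DB read


class TransformBird(Bird):
    def digest(self, row):
        return {**row, "name": row["name"].upper()}


class LoadBird(Bird):
    def digest(self, row):
        return f"loaded {row['id']}: {row['name']}"  # pretend DB write


def run_etl(row_ids):
    # Stand-in for the tree: one thread-safe queue per food type.
    tree = {food: queue.Queue() for food in ("ids", "raw", "clean", "log")}
    birds = [
        ExtractBird(tree, "ids", "raw"),
        TransformBird(tree, "raw", "clean"),
        LoadBird(tree, "clean", "log"),
    ]
    for bird in birds:
        bird.start()
    for row_id in row_ids:
        tree["ids"].put(row_id)
    tree["ids"].put(STOP)
    for bird in birds:
        bird.join()
    log = []
    while (entry := tree["log"].get()) is not STOP:
        log.append(entry)
    return log
```

Calling `run_etl([1, 2])` runs the three birds concurrently. Notice that none of the three subclasses mentions a thread, a lock, or a queue.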

Remember those old hierarchical databases based on 80-column records? Remember the convoluted logic we’d need to use to get the header record and then combine that info with each of the detail records so we could create complete records that could be written into one or more relational tables? That is hard enough in single-threaded logic, but trying to code it into a standard parallel program can be a daunting task.

But in bird logic, one type of bird knows how to get one header and all of its detail records. When it has that, it puts that info into the tree as Food A. Another type of bird only knows how to eat Food A, combine the header info with the detail info, and write each completed record into the tree as Food B. Another type of bird only knows how to eat Food B, convert the info into Foods C & D, and put them back into the tree. Another bird eats only Food C and puts it into the Cust_Name table. Another bird eats only Food D and puts it into the Cust_Credit table.
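To show how little each bird needs to know, here is a small sketch of that flock. Everything in it is assumed for the sake of the example: the record layout, the sample data, the function names, and the dictionary of queues standing in for the tree are mine, not the project's. Food C carries the name info and Food D the credit info, one piece per completed record.

```python
import queue
import threading

# Hypothetical stand-in for the tree: one thread-safe queue per food type.
tree = {name: queue.Queue() for name in "ABCD"}
DONE = object()  # sentinel meaning "no more food of this type"

# Made-up legacy records: ("H", cust_id, name) headers followed by ("D", credit) details.
LEGACY = [
    ("H", 1, "Ada"), ("D", 500), ("D", 750),
    ("H", 2, "Grace"), ("D", 300),
]


def reader_bird():
    """Groups each header with its detail records and stores the group as Food A."""
    group = None
    for rec in LEGACY:
        if rec[0] == "H":
            if group:
                tree["A"].put(group)
            group = {"header": rec, "details": []}
        else:
            group["details"].append(rec)
    if group:
        tree["A"].put(group)
    tree["A"].put(DONE)


def combiner_bird():
    """Eats Food A, merges the header into each detail, stores each record as Food B."""
    while (food := tree["A"].get()) is not DONE:
        _, cust_id, name = food["header"]
        for _, credit in food["details"]:
            tree["B"].put((cust_id, name, credit))
    tree["B"].put(DONE)


def splitter_bird():
    """Eats Food B and converts it into Food C (name) and Food D (credit)."""
    while (food := tree["B"].get()) is not DONE:
        cust_id, name, credit = food
        tree["C"].put((cust_id, name))
        tree["D"].put((cust_id, credit))
    tree["C"].put(DONE)
    tree["D"].put(DONE)


def loader_bird(food_type, table):
    """Eats one food type and appends each piece to its table."""
    while (food := tree[food_type].get()) is not DONE:
        table.append(food)


def run_flock():
    cust_name, cust_credit = [], []  # stand-ins for the Cust_Name and Cust_Credit tables
    birds = [
        threading.Thread(target=reader_bird),
        threading.Thread(target=combiner_bird),
        threading.Thread(target=splitter_bird),
        threading.Thread(target=loader_bird, args=("C", cust_name)),
        threading.Thread(target=loader_bird, args=("D", cust_credit)),
    ]
    for bird in birds:
        bird.start()
    for bird in birds:
        bird.join()
    return cust_name, cust_credit
```

Each function reads like the sentence that describes it, which is really the whole argument: you can describe the bird, and the description is most of the code.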

Using only this common vocabulary of activities and events (the Avian life cycle), it becomes fairly simple to visualize and describe the major functions of a parallel program. We can mostly think about what we want the birds to eat and what we want them to do while they are digesting (processing) their food; we no longer have to manage structural requirements such as locking, synchronizing, or variable scope.

Instead of thinking about code and how we’re going to prevent data corruption and deadlocks, we can visualize what we want individual birds to actually do on one piece of food, one individual object that each individual bird has absolute control over. And when they’re done with their piece of food, they put it back into the TupleTree for a different bird to eat, digest, and process.

Sure, there’s lots of other stuff going on, but the only locking in the Avian environment happens when food is eaten (removed from the TupleTree) or when food is stored (inserted into the tree). And all of that locking is invisible to the developer, so the only thing they need to worry about is what should be done to each chunk of food.
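If you want to picture where that locking lives, here is a toy version. This is not the real TupleTree, just a minimal sketch under my own assumptions: the method names `eat` and `store`, the timeout behavior, and the "raw"/"cooked" food types are all invented for the example. The only lock in the whole program sits inside those two methods, and the bird code never sees it.

```python
import threading
from collections import defaultdict, deque


class TupleTree:
    """Hypothetical, simplified TupleTree: all locking lives in store() and eat()."""

    def __init__(self):
        self._food = defaultdict(deque)
        self._cond = threading.Condition()

    def store(self, food_type, item):
        """Insert a piece of food; the lock is held only for the insertion."""
        with self._cond:
            self._food[food_type].append(item)
            self._cond.notify_all()

    def eat(self, food_type, timeout=1.0):
        """Remove and return one piece of food, or None if none arrives in time."""
        with self._cond:
            found = self._cond.wait_for(lambda: self._food[food_type], timeout)
            return self._food[food_type].popleft() if found else None


def doubling_bird(tree):
    """Bird logic with no locks in sight: eat 'raw', digest, store 'cooked'."""
    while (n := tree.eat("raw", timeout=0.2)) is not None:
        tree.store("cooked", n * 2)  # the bird owns n outright while digesting it


def demo():
    tree = TupleTree()
    for n in range(10):
        tree.store("raw", n)
    birds = [threading.Thread(target=doubling_bird, args=(tree,)) for _ in range(3)]
    for bird in birds:
        bird.start()
    for bird in birds:
        bird.join()
    results = []
    while (x := tree.eat("cooked", timeout=0)) is not None:
        results.append(x)
    return sorted(results)
```

Three birds compete for the same ten pieces of food, yet each piece is eaten exactly once, because a piece that has been removed from the tree belongs to one bird only until it is stored again.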

You could call it the 5-year-old test. If you can explain what your program does to a 5-year-old and have them understand it, then you’ll be able to visualize it with sufficient clarity that the actual implementation of the program in code should be relatively straightforward. It will certainly be quicker to get started coding, quicker to get functional, and quicker to get into testing and production.
