Category Archives: New Development Model

Avian Computing and JavaSpaces

A few years ago when this whole Avian Computing thing got started, I considered basing this project on Sun’s Jini/JavaSpaces technology. Why should I reinvent the proverbial wheel when Sun has already invested a significantly greater number of programming hours than I’ll ever hope to invest by myself, using much better programmers than I’ll ever hope to be?

However, a cursory review of JavaSpaces at that time yielded the gut-level feeling that JavaSpaces was a much bigger solution than the problem that I was trying to solve. Sure, both JavaSpaces and the (soon to be) Avian Computing use Linda constructs but that was about it. So I took the path less traveled and started working on Avian Computing.

Recently I started to question the wisdom of that decision and consequently started to read about JavaSpaces. Turns out my gut-level feeling was right. JavaSpaces is all about distributed computing, which coincidentally happens to be asynchronous and parallel, while Avian Computing is focused on thinking about parallel applications and how to mentally visualize the objects and threads of a parallel program. The fact that both projects use Linda constructs just proves how useful and universal Linda constructs are.

Differences

JavaSpaces provides a technology that allows an application to interact with Entries in (Java)Space and to access services available on other computers or to acquire the codebase (when necessary) to perform the required functions locally. JavaSpaces makes the location where the actual computations are performed invisible and irrelevant. JavaSpaces begins with the client-server architecture and morphs it into a homogeneous universal solution. Parallelism in the system is implicit and not explicitly encoded into the individual clients.

Avian Computing begins with the assumption that multicore CPUs are the new normal and that the biggest obstacle to improved computing is our inability to use the power of these multicore CPUs effectively. The biggest obstacle to using that power, in turn, is our inability to conceptualize parallel applications.

Avian Computing, as implemented in ConcX, encourages us to think about the actions of individual birds in the flock and how as a flock they will accomplish their goal. This simplifying metaphor encourages us to explore the inherent parallel possibilities of the application. The perspective provided by Avian Computing and ConcX reveals opportunities for utilizing the full power of multicore CPUs that are frequently non-obvious to developers comfortable with single-threaded programming.

JavaSpaces is an additional library of code, increasing the complexity of Java applications and reducing the number of programmers who can develop or maintain the code. Avian Computing is a simplifying technology that tries to minimize the amount of new code that must be written or maintained. And the code that is written is typically more standard Java that can be developed and maintained by more programmers.

JavaSpaces attaches extra meaning to the keyword public, so it carries a different meaning in JavaSpaces than in regular Java. Additionally, Entries in JavaSpaces require that all key fields be public, surrendering private fields, accessor methods, and object encapsulation. Avian Computing doesn’t require learning new meanings for standard Java keywords or new rules for objects.

JavaSpaces provides a complicated implementation of the Linda constructs, providing a multitude of ways in which a tuple (Entry in JavaSpace) can be NOT found. For example, the key fields may not match exactly, the transaction may not match, or the desired tuple may not be available at the right time. Avian Computing and ConcX use a much simpler method to find matching tuples: each bird looks for only one or two types of food, and if it doesn’t find appropriate food, it just waits a little while and automatically tries again. No extra programming required. No additional concepts to learn. No complicated reasons for NOT finding the tuples that actually exist.
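The look, wait, and look again behavior can be sketched in a few lines of plain Java. This is only an illustration and not ConcX's actual code: the FoodShelf and HungryBird classes, the store and tryEat methods, and the 10 ms nap are all assumptions made for the example.

```java
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical stand-in for a tuple space: food is filed under a type name.
class FoodShelf {
    private final Map<String, Queue<String>> shelves = new ConcurrentHashMap<>();

    void store(String type, String food) {
        shelves.computeIfAbsent(type, t -> new ConcurrentLinkedQueue<>()).add(food);
    }

    // Returns one item of the given type, or null if none is available right now.
    String tryEat(String type) {
        Queue<String> shelf = shelves.get(type);
        return shelf == null ? null : shelf.poll();
    }
}

class HungryBird {
    // Look for one type of food; if none is found, nap briefly and look again.
    static String eatWhenAvailable(FoodShelf shelf, String type, int maxTries) {
        for (int i = 0; i < maxTries; i++) {
            String food = shelf.tryEat(type);
            if (food != null) {
                return food;
            }
            try {
                Thread.sleep(10); // the "wait a little while" step (assumed length)
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return null;
            }
        }
        return null; // gave up after maxTries looks
    }
}
```

The only way for a lookup to come back empty in this sketch is that no food of that type is on the shelf yet, which is the simplicity the paragraph above is describing.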

Conclusions

Even though I am glad that I followed my gut and didn’t try to leverage JavaSpaces, I expect that the Avian Computing project will incorporate many of the features and strengths of JavaSpaces. But only if they can be added without overly complicating using ConcX and Avian Computing.

Modeling Operating Systems in Avian Computing

One of the initial design goals of every operating system (OS) is that it be lightweight and have minimal performance impacts on the running applications. Unfortunately, as the OS matures, it begins to take on baggage and assume a heavier footprint.

The goal of lightness probably has an unintended consequence: it makes it harder for the developers to understand what their code actually does. Lightness generally means terseness, meaning no excess code, not even any diagnostic code.

An interesting way to overcome this apparent limitation would be to use Avian Computing and ConcX to model the OS being designed. Each of the processes to include in the OS would initially be a ConcX entity that performs the task(s) of the final process.

There would be several advantages of this method. First, and perhaps most importantly, using the built-in logging features in ConcX entities, it would be simpler to identify the conditions that lead to a failure. This would be increasingly true as the amount of parallelism built into the OS grows. The more sophisticated and parallel an OS, the greater the need for help locating the cause of any failure.

For example, assume that the operating system will use Semaphore X to control some resource and that semaphore became unavailable to the various processes. In ConcX, it would be relatively simple to find which of the threads had obtained Semaphore X, when exactly it happened and what it was waiting for that was preventing it from releasing Semaphore X. Assuming the developer had instrumented his bird properly, it would have recorded when it ate Semaphore X and any problems or issues that it encountered that prevented it from releasing Semaphore X. The developer might even have made it easier to diagnose by writing an error food object out to the TupleTree, such as when some value is expected to be Zero or One and instead it is a negative value or greater than One.
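Here is a minimal sketch of what such an instrumented bird might look like. It is not taken from ConcX: the InstrumentedBird class, its useSemaphore method, the history format, and the use of a plain queue of strings in place of the TupleTree are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical instrumented bird: every step is appended to its own history.
class InstrumentedBird {
    final String name;
    final List<String> history = new ArrayList<>();

    InstrumentedBird(String name) {
        this.name = name;
    }

    private void record(String event) {
        history.add(System.nanoTime() + " " + name + " " + event);
    }

    // Eats the semaphore, checks a value that "must" be 0 or 1, and releases
    // the semaphore only if the value is sane. Returns true if it was released.
    boolean useSemaphore(String semaphore, int guardedValue, Queue<String> tree) {
        record("ate " + semaphore);
        if (guardedValue < 0 || guardedValue > 1) {
            // Leave a visible trace instead of failing silently.
            record("still holding " + semaphore + ", unexpected value " + guardedValue);
            tree.add("ERROR " + name + " holds " + semaphore + ", value=" + guardedValue);
            return false;
        }
        record("released " + semaphore);
        return true;
    }
}
```

With something like this in place, answering "who has Semaphore X and why" becomes a matter of reading the histories and looking for error food in the tree.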

Which leads to the second advantage of modeling the OS in ConcX: the ability to modify the system with minimal effort. When a potential fix is identified, it can be inserted into the appropriate bird(s) and the system restarted. No major recompiles or installing the executable in the test system(s). And much like with unit testing, a test bird can be configured to always produce the error condition and the system run to verify that it handles the error appropriately.

Beyond error correction, easy modification of the modules in an OS makes it easier for developers to experiment with how the functions in the OS are allocated. For example, is it better to have one code module with a huge IF statement that then calls sub-modules or is it better to have a bunch of separate special-purpose modules? Should Capability X be included in Function Y or should they be separate functions?

Additionally, it should be easier to identify which birds are the bottlenecks. If one or a couple of birds are performing some capabilities that always cause other birds to wait excessively, then those birds can be analyzed to see if they can be split into separate functions or simplified or streamlined, etc.

A third advantage is the ability to catch “black swan” events. Unexpected conditions are frequently difficult to identify because the developer “knows” that some value will always be Zero or One and so never considers the possibility that it might fall outside that range. The error stays hidden until someone asks what happens when it does.

If the developer codes his birds correctly, any unexpected values will be recorded in the bird’s history and/or written to the TupleTree as an error object. This assumption-trapping is easy to write in ConcX and has minimal impact on overall performance but pays huge dividends by catching unexpected conditions that can lead to unexpected behavior by (or crashing of) the OS. Identifying the failures of code or values that are “too simple to fail” and identifying all the conditions that must be correct for the module to succeed effectively produces a “criteria list” that the developer of the final OS must be able to meet.

Another advantage of modeling in ConcX is that low-level errors could intentionally be allowed to propagate thru the OS to study the effect on the system. Errors are not all created equal. Some errors could cause catastrophic results; other errors might have zero overall impact because they null out and are internally eliminated. Knowing which errors have the greatest potential for affecting the OS allows developers to focus their limited time and attention to where they will have the greatest impact.

Perhaps most importantly, modeling the OS in ConcX will allow the developers to think about and interact with their new OS at a higher level of abstraction. Conceptually, they can move functionality around and adjust behaviors with minimal costs in time and effort. ConcX provides a loosely-coupled environment where changes to one piece of code will only affect other pieces of code thru a well-defined interface (the TupleTree).

And then at the end of the modeling phase, developers have a working “flowchart” from which to code the actual OS. The bird code written in ConcX is thrown away, but every bit of the analysis and effort to understand the new OS is kept. The most critical portions of the OS can be coded quickly and with confidence because they are already well understood and well defined, thanks to the time spent modeling the OS in ConcX.

The Whorf Hypothesis

Lieutenant Worf in 2366

No, no, we’re not talking about the big tough guy from Star Trek. Instead, we’re taking a little detour from what we usually talk about to consider the applicability of linguistic relativity, also referred to as the (Sapir) Whorf Hypothesis. (I put Sapir’s name in parentheses because he was an influential teacher and mentor of Whorf but they never collaborated on the hypothesis).

Benjamin Lee Whorf’s Hypothesis “is the idea that differences in the way languages encode cultural and cognitive categories affect the way people think, so that speakers of different languages will tend to think and behave differently depending on the language they use.”

Benjamin Whorf

A minor example of this might be that the English word “parent” is “padre” in Spanish, which is the same word that is used for “father.” How could a teacher send home a non-gender-specific letter to the parent(s) of one of her students if the Spanish word for parent is father? Another example would be trying to describe or explain snow in the native language of someone from New Guinea or Vanuatu or other equatorial countries where the temperature never drops below 75 degrees.

Benjamin Whorf graduated from MIT in 1918 with a degree in chemical engineering and had a long successful career in fire prevention. In addition to his work preventing fires, in 1925 he began studying the Nahuatl language (Aztec) and Mayan hieroglyphics. By 1930 he was considered a leading name in Middle American linguistics, prompting him to travel to Mexico and to publish more papers about Nahuatl and Mayan. After 1931, Whorf started taking classes from Sapir, who had a significant influence on Whorf’s thinking about language. In 1938, Whorf’s health began to decline because of cancer and he passed away in 1941.

The Wikipedia biography of Whorf includes this anecdote:

“Another famous anecdote from his job was used by Whorf to argue that language use affects habitual behavior. Whorf described a workplace in which full gasoline drums were stored in one room and empty ones in another; he said that because of flammable vapor the “empty” drums were more dangerous than those that were full, although workers handled them less carefully to the point that they smoked in the room with “empty” drums, but not in the room with full ones. Whorf explained that by habitually speaking of the vapor-filled drums as empty and by extension as inert, the workers were oblivious to the risk posed by smoking near the “empty drums”.”

So what does this have to do with parallel programming? The point here is that when we talk about parallel programs and what is going on under the hood, we have to use programming words (fork, join, lock, synchronize, etc.) to describe parallel activities, which constrains how we think about the actual parallel activities that we want to happen. Our thinking is limited to just the artificial constructs provided in the programming languages.

Much like Whorf’s Hypothesis, the Avian Hypothesis is that the programming language determines the actual program behavior, what can be done, and even how one can think about its actions. By changing our thinking about the parallel activities to more natural scenarios, such as flocks of birds, it becomes easier to think about what we want done without the constraints of what the programming language allows us to think about.

Avian Parallel Databases

The top-of-the-line databases all execute highly optimized code designed to run on the highest-performing multi-CPU, multi-core systems. The basic architecture of database software is similar to that of operating systems, with protected sections of code that handle multi-core functionality and a great deal of thought given to producing the desired results, data caching, data locality, preventing data staleness, and so on.

So how could any parallel database system compete with the perfection of the top-of-the-line databases? By thinking outside the box and embracing the coming kilo-core future.

In kilo-core systems, it makes sense that each of those kilo-cores would have some local RAM or extended cache. This core-specific memory would prevent huge system delays caused by all of those kilo-cores trying to access the same shared system RAM.

So imagine each core in a 1,000 core system loaded some portion of a database into its local RAM. If each core had 1 MB available local RAM, a 1GB database could all be loaded in high-speed memory at all times, providing a significant speed improvement compared to reading data off a hard disk.

Additional performance improvements would come if each core was responsible for indexing only the data held in its memory and not some shared index. The size of the indexes that each core would have to search would be significantly smaller, producing much faster searches.

Now consider that a 1GB database broken into 1,000 pieces yields data volumes on each core that are small enough that all of the data could be indexed. Imagine retrieving data from any field at indexed speeds. Suddenly all of those off-index queries, such as searching for a name in a comment field, would be completed just as quickly as indexed queries.

Now consider what happens when submitting a SQL statement in an Avian parallel database: the SQL-interpreter bird sings (broadcasts) the SQL request to all 1,000 cores, which causes all 1,000 cores to start searching their local RAM for the data that matches the SQL statement. Instead of multiple threads competing for limited system resources, all of the resources are searching simultaneously for the requested data. Instead of preprocessors optimizing requests and scheduling disk reads for optimum results, the data are just searched simultaneously and the matching values are returned.
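On today's hardware the broadcast-and-search idea can be approximated with a thread pool, one task per data slice. The sketch below is an assumption-laden illustration rather than a design for a real database: the ShardedStore class and its broadcast method are made-up names, rows are plain strings, a Java predicate stands in for the SQL statement, and there must be at least one shard.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Predicate;

// Hypothetical sketch: each "core" owns a private slice (shard) of the data.
class ShardedStore {
    private final List<List<String>> shards;

    ShardedStore(List<List<String>> shards) {
        this.shards = shards;
    }

    // "Sing" the same query to every shard and merge whatever each one finds.
    List<String> broadcast(Predicate<String> query) {
        ExecutorService pool = Executors.newFixedThreadPool(shards.size());
        try {
            List<Future<List<String>>> answers = new ArrayList<>();
            for (List<String> shard : shards) {
                Callable<List<String>> search = () -> {
                    List<String> hits = new ArrayList<>();
                    for (String row : shard) {
                        if (query.test(row)) {
                            hits.add(row);
                        }
                    }
                    return hits;
                };
                answers.add(pool.submit(search));
            }
            List<String> merged = new ArrayList<>();
            for (Future<List<String>> answer : answers) {
                merged.addAll(answer.get()); // waits for that shard to finish
            }
            return merged;
        } catch (Exception e) {
            throw new IllegalStateException("shard search failed", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Every shard scans only its own rows, so no shard ever waits on a shared index; the only coordination is the final merge.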

And perhaps most importantly, the data held in the cores wouldn’t have to be homogeneous. Instead of requiring all “Name” data to be in a Name table that has one exact structure that is defined in the database, the names could come from any name resource, including database tables, XML files, JSON lists, etc. Each core would only have to know how to search its own data. The problems with field sizes and data consistency would all go away.

Now imagine the capabilities of a Mega-core system running a parallel database. Even with modest local RAM, we’re talking a terabyte database that could return results on any field in relatively short times. The mind boggles at the potential.

The ideas for an Avian Parallel Database presented here illustrate how using massively parallel systems should produce significantly improved performance compared to traditional databases whenever more than 1 piece of data is required. Retrieving the name of the customer with ID = 12345 would not be faster, but retrieving the names of all customers who meet certain criteria should be significantly faster because thousands (or eventually millions) of data sources could be searched simultaneously.

Avian Parallel Operating Systems

One interesting potential of Avian Computing is its loosely coupled nature. On occasion, I’ve compared an Avian program to a flock of birds landing in a field and looking for food and when they’ve found all their food (the program ends), they all fly away. But do all of them have to fly away?

What if the really basic birds didn’t fly away? What if the birds that handled keystrokes and serial ports and network communications and all the really low-level stuff remained in place? What if all of the birds that handled all the basic system behaviors remained active in memory? Suddenly, the single-threaded and highly protected kernel would disappear and all of its capabilities/functionality could be handled in parallel.

If you imagine a system with 10,000 cores, how would you process everything thru a single-threaded kernel? Or thru a tightly controlled multi-threaded kernel? Certainly they couldn’t all stop and wait for the kernel processes to complete before they continue. If we are going to derive any benefits from moving to kilo-core systems, we will need different operating system architectures to get those benefits.

What if the birds (cores) that handled system events all worked thru a standardized process, like the TupleTree(s)? All of these activities could work simply and asynchronously, without a lot of planning or coordination. It would provide a structure that enables highly parallel processing of operating system activities.

And if the “operating system” birds were auto-adjusting, like they are in Avian Computing, whenever a particular OS service needed additional processing power, the bird(s) responsible for that OS service could clone itself as many times as required to meet the need or hatch the appropriate kinds of birds.

Instead of compiling a single kernel that should handle most situations, a flock of OS birds would “fly down and land” in the computing hardware and grow/shrink to match the usage encountered on that individual system. Adding new features and capabilities to a system would be a relatively simple process of releasing a new bird, with no need to install new drivers and reboot or to recompile the kernel and then reboot.

User-level software would have no need to know the details of how keystrokes or ethernet inputs were gathered; they would simply attempt to get the info they need from the TupleTree(s). The providing layers would look in the TupleTree(s) for the food that they need, and when they eat it, they’d store it back in the TupleTree(s) for the user-level software. The providing layers would eat food that was put in the TupleTree(s) by the operating system birds. It would be consistent from the lowest levels to the highest levels.

Security measures would need to be developed so individual systems aren’t compromised and infected with viruses and unwanted processes, but these could be developed in advance of releasing an Avian Operating System. And any updates to the kernel could be handled dynamically so there would never be a need to reboot.

One of the reasons why I am personally optimistic about Avian Computing is that its potential extends far beyond what can be realized in a single parallel program. Avian Computing extends parallel computing so it could include operating system functionality, so the same ideas and concepts would describe not just individual programs but all capabilities within a system. An Avian Computing Operating System would effectively be a “fractal system”, where the same behavior at the lowest level would be repeated at each higher level. It would be consistent from top to bottom, completely parallel and completely asynchronous.

It is chaos. It is unstructured and unpredictable. It behaves like real life. What’s not to love?

Visualizing Parallel Programs

Take a moment and visualize your favorite parallel program. . . . . . .

That’s enough. What does it look like? What shape and color is it? What does it sound like? How does it feel – is it smooth or rough? How does it taste – is it sweet or sour? What does it smell like?

Isn’t it funny that when we try to visualize a parallel program (or any kind of program), we get sort of a big blank, or maybe a flow-chart-ish kind of image? The five senses that we normally use to interact with the world around us are completely unavailable to us when we try to think about (parallel) programs. Programs are implemented in logic without any direct connections to our everyday world (unless you consider textual representations of auditory sensations a direct connection).

As developers, we have worked ourselves into a corner where the only tool we have available to us is “logic”, meaning that we have abandoned using all of our five natural human senses. Effectively, we have decided to sculpt the statue of David using only 2 fingers. Yes, logic may be the capstone and crowning achievement of human intellect, but that doesn’t mean that pure logic is necessarily the best tool to use for every task. Try using your cutting wit or incisive logic on a chunk of marble.

When I try to visualize a program (parallel or otherwise), I either end up with something that looks like a block diagram or something that looks like a spiderweb with silken strands stretching between vague amorphous blobs of functionality. And the only smell I get is that rotten smell of bad code. If I try to zoom in on my mental image, I generally get an image of text, of the code that actually implements the functionality of the program. And once I start to visualize code, I lose the rest of the program image. Logic is single-threaded and can only focus on one thing at a time. If you’re not convinced, check the rabbit/duck or candlestick/faces images in this blog.

Now visualize the Mona Lisa painting. If you mentally zoom in on the winding river or the tree-lined horizon, it’s not hard at all to back up and visualize the whole painting, to see the enigmatic half-smile of the lady. If you focus on her eyebrows, it’s not hard to zoom out and see the complete image and her quiet smile.

Now let’s imagine we’re playing Pictionary and you get the card to draw UDP – how do you draw Service Ports or Packet Structure or Checksum Calculation? And then your opponent gets the card to draw a kitten playing with a ball of yarn. Who has the easier task?

One of the goals of the Avian Computing Project is to make it easier to visualize what is happening inside parallel programs by modeling them as things that we can easily visualize. Things that we can visualize or in other ways imagine are easier to think about and talk about.

In Avian Computing, we imagine flocks of birds flying around together but think about each bird independently and asynchronously performing separate actions. Instead of thinking about mutexes and lines of code and locks and blocks, Avian Computing allows us to think and visualize the work we want each individual thread to perform. Since all of the birds automatically behave the same way and follow the same life cycle, developers can focus on the unique actions that each bird needs to perform.

For example, in an ETL (Extract – Transform – Load) program, some birds would ONLY know how to extract data from the DB, others would ONLY know how to transform the data, and others would ONLY know how to load the transformed data into the new DB. Allocating the ETL logic between three birds makes it easier to visualize, easier to code, and easier to debug.

Remember those old hierarchical databases based on 80-column records? Remember the convoluted logic we’d need to use to get the header record and then combine that info with each of the detail records so we could create complete records that could be written into one or more relational tables? Hard enough in single threaded logic, but trying to code that into a standard parallel program can be a daunting task.

But in bird logic, one type of bird knows how to get one header and all of its detail records. When it has that, it puts that info into the tree as Food A. Another type of bird only knows how to eat Food A, combine the header info with the detail info, and write each completed record into the tree as Food B. Another type of bird only knows how to eat Food B, convert the info into Foods C & D, and put them back into the tree. Another bird eats only Food C and puts it into the Cust_Name table. Another bird eats only Food D and puts it into the Cust_Credit table.
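The chain of birds above maps naturally onto threads connected by queues. The following is a rough sketch under several assumptions, not ConcX code: the EtlFlock class, the DONE marker, the text formats of Foods A through D, and the use of one blocking queue per food type (instead of a single TupleTree) are all invented for the example, and the input is assumed to have at least one well-formed "name/credit" detail.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

class EtlFlock {
    static final String DONE = "<done>"; // hypothetical end-of-food marker

    // A bird eats from one queue until it sees DONE, digesting each item.
    static Thread bird(BlockingQueue<String> in, Consumer<String> digest, Runnable finish) {
        Thread t = new Thread(() -> {
            try {
                for (String food = in.take(); !food.equals(DONE); food = in.take()) {
                    digest.accept(food);
                }
                finish.run();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        t.start();
        return t;
    }

    // Returns two lists: the rows for Cust_Name and the rows for Cust_Credit.
    static List<List<String>> run(String header, List<String> details) {
        BlockingQueue<String> a = new LinkedBlockingQueue<>();
        BlockingQueue<String> b = new LinkedBlockingQueue<>();
        BlockingQueue<String> c = new LinkedBlockingQueue<>();
        BlockingQueue<String> d = new LinkedBlockingQueue<>();
        List<String> custName = new ArrayList<>();
        List<String> custCredit = new ArrayList<>();
        List<Thread> flock = new ArrayList<>();

        // Eats Food A ("header;detail,detail") and stores one Food B per detail.
        flock.add(bird(a, food -> {
            String[] parts = food.split(";");
            for (String detail : parts[1].split(",")) {
                b.add(parts[0] + "/" + detail);
            }
        }, () -> b.add(DONE)));
        // Eats Food B ("id/name/credit") and stores Food C and Food D.
        flock.add(bird(b, food -> {
            String[] f = food.split("/");
            c.add(f[0] + "/" + f[1]);
            d.add(f[0] + "/" + f[2]);
        }, () -> { c.add(DONE); d.add(DONE); }));
        // Eats Food C and loads it into the stand-in for the Cust_Name table.
        flock.add(bird(c, custName::add, () -> { }));
        // Eats Food D and loads it into the stand-in for the Cust_Credit table.
        flock.add(bird(d, custCredit::add, () -> { }));

        // The reading bird: one header plus all of its details becomes Food A.
        a.add(header + ";" + String.join(",", details));
        a.add(DONE);

        for (Thread t : flock) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return Arrays.asList(custName, custCredit);
    }
}
```

Notice that none of the digest steps contains a lock or a synchronized block; each bird touches only the piece of food it has just eaten.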

Using only this common vocabulary of activities and events (Avian lifecycle), it becomes fairly simple to visualize and describe the major functions of a parallel program. We can mostly think about what we want the birds to eat and what we want them to do while they are digesting (processing) their food; we no longer have to manage structural requirements, such as locking or synchronizing or variable scope, etc.

Instead of thinking about code and how we’re going to prevent data corruption and deadlocks, we can visualize what we want individual birds to actually do on one piece of food, one individual object that each individual bird has absolute control over. And when they’re done with their piece of food, they put it back into the TupleTree for a different bird to eat, digest, and process.

Sure, there’s lots of other stuff going on, but the only locking in the Avian environment happens when food is eaten (removed from the TupleTree) or when food is stored (inserted into the tree). And all of that locking is invisible to the developer, so the only thing they need to worry about is what should be done to each chunk of food.

You could call it the 5-year-old test. If you can explain what your program does to a 5-year-old and have them understand it, then you’ll be able to visualize it with sufficient clarity that the actual implementation of the program in code should be relatively straightforward. And certainly quicker to get started coding, quicker to get functional, and quicker to get into testing and production.

MVC-like Programming Vocabulary Required

Programming languages may be good at implementing a computer program, but they are a lousy way to think about the intended program. Instead of thinking about what we want the program to do, we end up thinking about the features that are available in the selected programming language. And then we develop the program structure based on the features available in that language instead of what we want the program to do.

For example, if we want to build a program that will find the square root of the volume of Kim Kardashian’s butt (don’t ask why –  just go with it), we first have to decide if we’re building a command line program or a GUI program. And that basic decision shapes our thinking about how we will build the program and the language that we use to implement the program. Very little of the code from a command line version of the program could be used in a GUI version of the program.

And then when we talk about how we are calculating that value, it is almost impossible to explain what we are doing without describing the code we are using to implement the calculations. What we are doing becomes all mixed up with how we are doing it. For example, if our chosen language includes a “calcButtCheekVolume” function then our code will be relatively simple; without that function, we have to calculate the volume ourselves based on butt cheek curvature and cheek width, etc., making it much harder to code. All because of one single function. It would tempt one to use an inappropriate or obsolete language just to gain access to that one function.

The Mac programming world has jumped onto the Model-View-Controller (MVC) paradigm, where the Model and the View do not directly interact but instead process all updates thru the Controller. This separation allows the programs to cleanly separate display requirements from implementation requirements, allowing developers to modify the back end processes without affecting the front end (display).
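For readers who haven't worked with MVC, here is a bare-bones sketch of that separation in Java. It isn't taken from any particular framework; the VolumeModel, TextView, and VolumeController names are invented for the example, and the View just collects lines of text instead of drawing anything.

```java
import java.util.ArrayList;
import java.util.List;

// Model: holds the data and knows nothing about how it is displayed.
class VolumeModel {
    private double volume;

    double getVolume() {
        return volume;
    }

    void setVolume(double volume) {
        this.volume = volume;
    }
}

// View: renders whatever it is handed; here it only collects lines of text.
class TextView {
    final List<String> lines = new ArrayList<>();

    void show(String label, double value) {
        lines.add(label + " = " + value);
    }
}

// Controller: the only piece that talks to both the Model and the View.
class VolumeController {
    private final VolumeModel model;
    private final TextView view;

    VolumeController(VolumeModel model, TextView view) {
        this.model = model;
        this.view = view;
    }

    void changeVolume(double newVolume) {
        model.setVolume(newVolume);
        view.show("sqrt(volume)", Math.sqrt(model.getVolume()));
    }
}
```

Swapping TextView for a graphical view would not touch VolumeModel at all, which is the kind of separation the rest of this post wishes programming languages offered.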

So back to Kim Kardashian’s butt: using MVC, we would find a way to Model the volume of a butt cheek and then add a way to display (View) the volume (graph, chart, photo-morphing, etc.). The Controller code would then make adjustments to the Model, depending on whether she is standing or sitting, gaining weight or losing weight, etc.

Unfortunately, we programmers are still functioning at a primitive level, where the program structure is still based on the language that was selected to implement the program. We have no way to deeply describe the functionality of a program without also including implementation details such as the language, the library, and how the user will interact with the program.

Programming languages need the equivalent of MVC so it would be possible to cleanly separate what we want a program to do from how we want it to do it and implement it. For example, the Model would be the data that the program uses and the View would be the user interface of the generated program, and the Controller would respond to changes in the Model or manipulate the Model as required, updating the View as required.

Developers would then be able to calculate the square root of the volume of Kim’s butt without worrying about which language (and library) was being used to implement the solution or how that solution would be displayed to the user (CLI vs GUI). Which means Fortran could be used to calculate the volume and C could be used to calculate the changes in the shape if she is jumping and Java could be used to calculate the shape if she is sitting, etc. The compiler(s) and MVC-like Programmer’s Interface would manage the code and generate the desired output of the eventual program without requiring language and structural concessions on the part of developers.

Blindly adopting the MVC model is not the answer because programming languages solve many problems beyond the MVC model. The point here is that our thinking about developing programming solutions has remained mired in a 1950s mentality, where programs are developed in a single muddy interconnected language-feature-functionality-library lump, where changing any single component dramatically affects the implementation of the whole program.

We need to develop a programming vocabulary that can describe what a program will do that is independent of the programming language chosen if we are to ever hope to become more efficient and faster at developing computer programs. And if we can’t get better at developing relatively simple single-threaded solutions, we’ll have almost zero chance at getting better and faster at developing the comparatively more difficult parallel programs.

Avian Computing – Overview

The Avian Computing Project seeks to reduce the length of time it takes to develop parallel computer programs by improving how we think and talk about parallel programs. To accomplish this goal, the Avian Computing Project replaces the current mental programming model (based on “math equation-ish” lines of code) with a mental model based on natural group elements, such as birds or ants or fish. The advantages are discussed in blogs about using a nature-based model to encourage overview level thinking while reducing exposure to program code. These aspects were chosen based on the guidelines for the next generation development model, which were derived from the limitations of the human brain AND the shortcomings of developing computer programs using languages.

Flock of Birds

Using a flock of birds as the model for hundreds or thousands of processors/cores will make it easier to develop programs that can use all those cores

One basic assumption of the Avian Computing Project is that the number of processors or cores available in each computing device will continue to increase until hundreds or thousands of cores will be available in each system. While the technology to physically build such devices currently exists (for example, the hundreds of cores currently available in the Graphics Processing Units (GPUs) on our graphics cards), we do NOT have a way of quickly and efficiently building the software that can make use of all of those cores.

The Avian Computing Project focuses on reducing parallel program development times by making it easier to think and talk about parallel operations. It does this by emphasizing nature-based models instead of lines of code. The primary goal of Avian Computing is NOT to make apps that run faster in parallel but instead to make it faster to get parallel apps running. When the tasks in an app are correctly apportioned and distributed among the (potentially hundreds of) cores available, the speed issue will take care of itself.

All birds follow this basic life cycle


The Avian Computing environment also provides a standard framework (the Concurrency Explorer or ConcX) to configure and launch birds (threads). All birds have a runtime life cycle that resembles the natural behavior and life cycle of birds. Birds are hatched, look for food, eat it and digest it, store the resulting food and then take a nap before doing it all again. They also reproduce when they are well fed and die off when they cannot get enough to eat. The only coding that needs to be done by the developer is describing what is done during the digestion phase; the rest is accomplished by selecting and configuring the right birds.
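To make the life cycle a little more concrete, here is a minimal Java sketch of such a loop. It is not the actual ConcX code: `LifeCycleBird`, its constructor parameters, and the two queues standing in for the TupleTree are all invented for illustration, and reproduction is left out to keep it short. The point is only that `digest()` is the single method the developer fills in.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical sketch of a bird's life cycle; not the actual ConcX classes.
abstract class LifeCycleBird implements Runnable {
    private final Queue<String> foodSource;   // stand-in for food waiting in the TupleTree
    private final Queue<String> foodStore;    // stand-in for results stored back in the tree
    private final int maxNapMillis;           // upper bound on each nap
    private final int maxEmptySearches;       // the bird dies after this many failed searches in a row

    LifeCycleBird(Queue<String> foodSource, Queue<String> foodStore,
                  int maxNapMillis, int maxEmptySearches) {
        this.foodSource = foodSource;
        this.foodStore = foodStore;
        this.maxNapMillis = maxNapMillis;
        this.maxEmptySearches = maxEmptySearches;
    }

    // Digestion is the only phase the developer writes.
    protected abstract String digest(String food);

    @Override
    public void run() {                                   // hatch
        int emptySearches = 0;
        while (emptySearches < maxEmptySearches) {
            String food = foodSource.poll();              // look for food and eat it
            if (food == null) {
                emptySearches++;                          // going hungry
            } else {
                emptySearches = 0;
                foodStore.add(digest(food));              // digest, then store the result
            }
            try {                                         // nap before the next cycle
                Thread.sleep(ThreadLocalRandom.current().nextInt(maxNapMillis + 1));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;                                   // stopped from outside
            }
        }
    }                                                     // starved: the bird dies
}

class LifeCycleDemo {
    public static void main(String[] args) {
        Queue<String> source = new ConcurrentLinkedQueue<>();
        Queue<String> store = new ConcurrentLinkedQueue<>();
        source.add("seed");
        source.add("berry");
        LifeCycleBird shouter = new LifeCycleBird(source, store, 5, 3) {
            @Override
            protected String digest(String food) {
                return food.toUpperCase();
            }
        };
        shouter.run();   // run on the calling thread to keep the demo deterministic
        System.out.println(store);
    }
}
```

Running `LifeCycleDemo` prints `[SEED, BERRY]`. In a real flock each bird would be handed to its own `Thread` instead of being run on the calling thread.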

All of the fussy details about thread locking and deadlocks that are normally agonized over in parallel programs are handled by ConcX so the developer doesn’t have to. This isn’t a convenience or just some handy feature; instead it was considered an absolute necessity to hide all locking in the framework so developers can focus solely on making the program work correctly. Same for starting new threads, stopping unused threads, and thread sleep; it’s all handled in the framework so the developer doesn’t waste time thinking about them.

The Avian environment and ConcX borrow the concept of the Tuplespace from Linda, the parallel programming constructs developed by David Gelernter, Nicholas Carriero, and others back in the 1980s and 1990s. In the Avian environment, every bird gets its food from the TupleTree and stores its results back in the TupleTree. All locking is automatically managed by the TupleTree when food (objects) moves in or out of the tree. The TupleTree is the only place where locking is required and the TupleTree does it all invisibly for the birds. When a bird looks for food, it “requests” a specific food type. The TupleTree locks itself as required and returns a matching food object if it has one or returns null if it does not. A basic diagram of the Avian Computing Environment is shown below.

Basic Avian Diagram

All birds feed in the TupleTree, eating a specific food and storing a specific food. The birds feed asynchronously at the rate appropriate for the task that they are performing.

When a bird receives a food object from the TupleTree, it has the only copy of that object and is the only bird that can make changes to it while it possesses it. (Other instances of the requested food type may still be in the TupleTree, just like a real tree will normally have more than one fruit or seed).

As the bird is digesting the food object, it is performing some work or applying some transforms to the data contained in the food object. When its work is completed, the bird normally stores one or more food objects back in the TupleTree where other birds can eat the resulting food.
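The request-by-type behavior described above can be sketched in a few lines of Java. This is only an illustration under my own assumptions, not the real TupleTree: the `Food` and `SketchTupleTree` classes, the `store()` and `eat()` method names, and the choice of a map of queues keyed by food type are all hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical food object: a type name plus whatever data it carries.
class Food {
    final String type;
    final Object payload;

    Food(String type, Object payload) {
        this.type = type;
        this.payload = payload;
    }
}

// Hypothetical tree: all locking lives in these two synchronized methods.
class SketchTupleTree {
    private final Map<String, Deque<Food>> branches = new HashMap<>();

    // Stores a food object where birds asking for its type can find it.
    synchronized void store(Food food) {
        branches.computeIfAbsent(food.type, t -> new ArrayDeque<>()).addLast(food);
    }

    // Removes and returns one food object of the requested type,
    // or returns null if the tree has none of that type.
    synchronized Food eat(String type) {
        Deque<Food> branch = branches.get(type);
        return (branch == null) ? null : branch.pollFirst();
    }
}

class TupleTreeDemo {
    public static void main(String[] args) {
        SketchTupleTree tree = new SketchTupleTree();
        tree.store(new Food("Grape", 42));
        Food first = tree.eat("Grape");    // removed from the tree and handed to the caller
        Food second = tree.eat("Grape");   // nothing of that type is left
        System.out.println(first.payload + " " + (second == null));
    }
}
```

The demo prints `42 true`. The sketch happens to hand food out oldest-first, but that is an accident of using a queue and nothing a bird should rely on.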

Napping allows the processors to share resources – basically so all the threads play nice together. The maximum nap length is configurable by the developer in ConcX. The actual length of each nap is a randomly selected duration between 0 ms and the (maximum) nap length set by the developer.
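As a rough illustration of the nap calculation, here is a small Java helper. The `NapTimer` class and its method names are hypothetical, and I am assuming the nap length is chosen uniformly and that both 0 and the maximum are possible values; ConcX may do it differently.

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical helper; ConcX's own napping code may differ.
class NapTimer {
    // Picks a nap length from 0 to maxNapMillis (both inclusive), uniformly at random.
    static int pickNapMillis(int maxNapMillis) {
        if (maxNapMillis < 0 || maxNapMillis == Integer.MAX_VALUE) {
            throw new IllegalArgumentException("maxNapMillis out of range: " + maxNapMillis);
        }
        return ThreadLocalRandom.current().nextInt(maxNapMillis + 1);
    }

    // Sleeps for a randomly chosen nap; returns false if the thread was interrupted.
    static boolean nap(int maxNapMillis) {
        try {
            Thread.sleep(pickNapMillis(maxNapMillis));
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();   // preserve the interrupt for the caller
            return false;
        }
    }
}
```

A bird would call something like `NapTimer.nap(50)` at the end of each cycle, where 50 is whatever maximum the developer configured.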

Each bird and each food item maintains its own internal log. The internal bird logs are viewable in ConcX immediately after a bird is stopped. The internal food logs are listed on the TupleTree tab of ConcX; each food item listed records a timestamp for every action performed on it and which bird performed that action.

By carefully analyzing the logs, issues can be researched and problems identified. The logs are written in CSV format and can be saved to files and then opened with spreadsheet programs. It has been very useful to be able to find what every bird was doing at an exact moment in time. Again, because this is all handled automatically, the developer doesn’t have to worry about it.
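For a feel of what a per-food log in CSV form might look like, here is a small Java sketch. The `FoodLogSketch` class, the column names, and the quoting rule are my assumptions; the actual ConcX log layout is not described in this post.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a food item's log; the column names are assumptions.
class FoodLogSketch {
    private final List<String> rows = new ArrayList<>();

    // Records one action against this food item.
    void record(long timestampMillis, String birdName, String action) {
        rows.add(timestampMillis + "," + quote(birdName) + "," + quote(action));
    }

    // Renders the log as CSV text with a header row.
    String toCsv() {
        StringBuilder csv = new StringBuilder("timestamp_ms,bird,action\n");
        for (String row : rows) {
            csv.append(row).append('\n');
        }
        return csv.toString();
    }

    // Quotes a field and doubles embedded quotes so commas stay inside one column.
    private static String quote(String field) {
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }
}
```

Because every row starts with a timestamp, sorting the combined rows of several logs in a spreadsheet is one way to see what each bird was doing at a given moment.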

Altogether, the Avian Computing project tries to provide the concepts, tools and resources needed to quickly develop parallel programs. The Avian Computing project encourages us to think about parallelizing applications and to stop thinking about code.

Avian Computing Goals – Develop From An Overview Perspective

One of the main goals of the Avian Computing Project is to reduce the length of time it takes to develop parallel computer programs. In this post, we’ll look at how focusing developers on the overview perspective will help us achieve our goal.

Developers/humans work better at the overview level. For example, if I say, “let’s send a person to the moon,” we all have a pretty good idea of what’s going to happen. But if every person who works on the moonshot is required to know every single detail, that trip may never happen because everyone will be trying to do every job and interfering with each other. If we can take an overview perspective, however, we can break the moonshot into logical tasks, such as propulsion, life support, computational, etc., and again break each task into subtasks and so on until everyone has a reasonably-scoped amount of work that they can perform without interfering with others.

In the Avian Computing environment, the overview perspective is provided by thinking about controlling a flock of birds who will together accomplish one goal or task in parallel. Some types of birds in the flock will perform one part of the process while other types of birds will perform other parts of the process.

Avian Life Cycle


The overview perspective is provided again by the diagram of the standard life cycle of a bird (thread). The threads don’t really undergo this life cycle (threads can’t really eat); instead this conceptual overview provides us with enough knowledge about Avian Computing that developers can use the Concurrency Explorer framework (ConcX) to produce parallel applications in relatively short lengths of time.

Locking Details

The overview perspective is easier with the Avian framework because it handles all the locking and synchronizing automatically, inside the framework. This eliminates spending time thinking about mutexes and synchronizing and deadlocking and all of the related junk that developers normally have to contend with when developing parallel applications. It accomplishes this by borrowing some concepts from the Linda parallel programming construct and adapting them to the bird metaphor. Linda was pioneered by David Gelernter in the mid-’80s and developed with Carriero & others into the ’90s. Linda manages information in tuples, and threads select and work on tuples that meet their specified conditions.

Birds spend a lot of time in trees, so in the Avian environment, the birds find and eat all of their food from the TupleTree. When they are done digesting their food, they store their food (results) back into the tree. In Linda-speak, these are the in() and out() functions.

All of the locking and synchronizing is managed by the TupleTree. The TupleTree manages all requests by birds to get food and to store food as synchronized methods. If a bird gets some food to eat, that food is removed and gone from the tree, guaranteeing that multiple birds cannot access it. And once a bird has a chunk of food (object), it has complete control over it. No other bird is aware of that food object unless and until it is stored back in the tree.
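A quick way to convince yourself of the "removed and gone" guarantee is to let several threads fight over the same food and check that nothing is handed out twice. The Java sketch below does that with a deliberately tiny tree; `ExclusiveTree` and `ExclusiveEatingDemo` are hypothetical names, and the real TupleTree is more involved than a single synchronized queue.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.atomic.AtomicIntegerArray;

// Hypothetical one-food-type tree: both methods lock the tree itself.
class ExclusiveTree {
    private final Deque<Integer> food = new ArrayDeque<>();

    synchronized void store(int item) {
        food.addLast(item);
    }

    // Removes and returns one item, or null when the tree is bare.
    synchronized Integer eat() {
        return food.pollFirst();
    }
}

class ExclusiveEatingDemo {
    // Lets `birds` threads compete for `items` food objects and reports
    // whether every object was handed out exactly once.
    static boolean everyItemEatenOnce(int items, int birds) {
        ExclusiveTree tree = new ExclusiveTree();
        for (int i = 0; i < items; i++) {
            tree.store(i);
        }
        AtomicIntegerArray timesEaten = new AtomicIntegerArray(items);
        Thread[] flock = new Thread[birds];
        for (int b = 0; b < birds; b++) {
            flock[b] = new Thread(() -> {
                Integer item;
                while ((item = tree.eat()) != null) {
                    timesEaten.incrementAndGet(item);   // count who got what
                }
            });
            flock[b].start();
        }
        try {
            for (Thread bird : flock) {
                bird.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the flock", e);
        }
        for (int i = 0; i < items; i++) {
            if (timesEaten.get(i) != 1) {
                return false;   // an item was lost or served twice
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(everyItemEatenOnce(10_000, 8));
    }
}
```

The demo prints `true`: because `eat()` removes the item while holding the tree's lock, no two birds can ever receive the same object.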

Avian developers are encouraged to NOT think in too much detail about which individual bird (thread) actually performs any given task. Neither should the order of the food objects in the tree be considered or depended upon. Developers should focus on how to break any given task into subtasks and translate those subtasks into food types, the kind of logic that is best done at a higher level, at an overview level.

By thinking at the overview level in the Avian environment, we can think through the tasks and subtasks that we want our parallel program to perform and then rough out those tasks in ConcX. And then, best of all, we get to run the rough application and verify that it’s doing what we want. If not, it is simple enough to change the order that the tasks are performed in or to add or subtract, combine or split tasks.

This is one of the strengths of the Avian Computing environment; the birds in your flock are loosely coupled and work independently, giving you great flexibility in achieving your goals.

Avian Computing Goals – Loosely Coupled Code

We’re all taught that we should write loosely coupled code so that changes in one code module or function won’t affect another function or module. Like all things, it’s easier said than done.

One of the goals of Avian Computing is to make loosely coupled code easy to create. Each of the birds is a separate entity with its own set of variables. The only way it can get info to operate on is to eat from the TupleTree; the only way it can pass on info is to store food in the TupleTree.

While this might seem restrictive, it is conceptually simple and follows the ways of nature. Better still, this simple mechanism structures our thinking so we automatically produce loosely coupled code. The only way that one bird can affect another bird is by making a change to the food it stores. The only way that a bird can be affected by a different bird is through the food that it eats and how it responds to that food.

For example, if a function changed its return value from “Fred” to “10”, the code that receives the returned value may or may not know how to handle the new value. Usually we manage this changed return value by remembering all the places where we use that changed function and updating their code. And then we usually forget one place where we used it, causing the system to crash or otherwise go wonky, whereupon we do a code search of our project and find those other instances where we used it. And if we’re lucky, the changed function isn’t in a library that is used in other projects that would need to be changed.

In Avian Computing, because we share everything through the TupleTree, we force changes up to the surface or force them down inside the bird. If changing “Fred” to “10” only affects one function in that one bird, the change can never get outside to affect other birds. If the change from “Fred” to “10” should affect other birds, the change is forced to the surface, there in the shared food, where everyone can see it. No pathological invisible couplings lingering inside the code.

Of course, there are times when we’ll have multiple instances of the same bird but we want to have different responses. Perhaps we want it to return “Fred” in some circumstances and return “10” in other circumstances. In this case, the ability of the StdBird to save two different kinds of food is used. If it should return “Fred”, the bird will save a “FredFood” to the TupleTree. If it should return “10”, the bird will save a “10Food” to the TupleTree.

The developer will have to create a FredFood type and a 10Food type, as well as a FredEater bird and a 10Eater bird, but this sounds harder than it really is. Unless there’s something really unique in the new food types, typically they are just sub-classes of StdFood with their own constructor that identifies their food type. Same for new birds. New birds are usually just sub-classes of either StdBird or some other bird, with changes to how they digest their food.

Sub-classing food and birds this way keeps the code for the different behaviors separate and ensures that there are no invisible connections in the behaviors when processing “Fred” or “10”. This loose coupling of modules also allows developers to substitute different versions of FredEater or 10Eater without affecting the overall program.
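Here is roughly what that sub-classing could look like in Java. Treat it as a sketch under assumptions: the `StdFood` and `StdBird` classes below are simplified stand-ins I wrote for illustration, not the real ConcX base classes, and `AnswerBird`, `GreetingFood`, and `NumberFood` are invented names. Also, Java class names cannot begin with a digit, so the sketch spells 10Food and 10Eater as `TenFood` and `TenEater`.

```java
// Hypothetical stand-ins for StdFood and StdBird; the real ConcX classes will differ.
class StdFood {
    private final String foodType;
    private final String payload;

    StdFood(String foodType, String payload) {
        this.foodType = foodType;
        this.payload = payload;
    }

    String getFoodType() { return foodType; }
    String getPayload() { return payload; }
}

// A new food type is just a constructor that names its type.
class FredFood extends StdFood {
    FredFood(String payload) { super("FredFood", payload); }
}

class TenFood extends StdFood {
    TenFood(String payload) { super("TenFood", payload); }
}

abstract class StdBird {
    private final String eats;   // the food type this bird requests

    StdBird(String eats) { this.eats = eats; }

    String eats() { return eats; }

    // The one method a new bird overrides: turn eaten food into stored food.
    abstract StdFood digest(StdFood food);
}

// Producer that "returns" either answer by choosing which food type to store.
class AnswerBird extends StdBird {
    AnswerBird() { super("QuestionFood"); }

    @Override
    StdFood digest(StdFood food) {
        return food.getPayload().equals("name")
                ? new FredFood("Fred")
                : new TenFood("10");
    }
}

class FredEater extends StdBird {
    FredEater() { super("FredFood"); }

    @Override
    StdFood digest(StdFood food) {
        return new StdFood("GreetingFood", "Hello, " + food.getPayload());
    }
}

class TenEater extends StdBird {
    TenEater() { super("TenFood"); }

    @Override
    StdFood digest(StdFood food) {
        int doubled = 2 * Integer.parseInt(food.getPayload());
        return new StdFood("NumberFood", String.valueOf(doubled));
    }
}
```

Notice that `FredEater` never has to guard against receiving "10": it only ever asks for FredFood, so the string-versus-number decision is visible in the food types instead of being buried in a return value.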

The Concurrency Explorer (ConcX) makes bird substitution easy; when ConcX is running, you can configure up to 100 individual birds that will participate in your program. ConcX allows you to start the desired birds (or all birds) and then stop or start any of the birds without negatively affecting (crashing) the rest of the birds. So, for example, you might stop the FredEater bird and then start the FredAEater bird and then stop it and then start the FredBEater bird. Or run Fred, FredA, and FredB and let them compete and show you which one provides the best results.
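The start-and-stop idea can be sketched in Java as one thread per bird, each with its own stop flag. This is not how ConcX is necessarily built; `StoppableBird` and `FlockSketch` are hypothetical classes, and the 1 ms sleep merely stands in for a real eat-digest-store-nap cycle.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch; ConcX's real start/stop machinery is not shown in this post.
class StoppableBird implements Runnable {
    private volatile boolean running = true;

    @Override
    public void run() {
        while (running) {
            try {
                Thread.sleep(1);      // stands in for one eat-digest-store-nap cycle
            } catch (InterruptedException e) {
                return;
            }
        }
    }

    void requestStop() {
        running = false;
    }
}

class FlockSketch {
    private final Map<String, StoppableBird> birds = new ConcurrentHashMap<>();
    private final Map<String, Thread> threads = new ConcurrentHashMap<>();

    // Starts one named bird on its own thread.
    void start(String name) {
        if (threads.containsKey(name)) {
            throw new IllegalStateException(name + " is already running");
        }
        StoppableBird bird = new StoppableBird();
        Thread thread = new Thread(bird, name);
        thread.setDaemon(true);       // a forgotten bird will not keep the JVM alive
        birds.put(name, bird);
        threads.put(name, thread);
        thread.start();
    }

    // Stops one named bird and waits for it to finish; other birds are untouched.
    void stop(String name) {
        StoppableBird bird = birds.remove(name);
        Thread thread = threads.remove(name);
        if (bird == null || thread == null) {
            return;
        }
        bird.requestStop();
        try {
            thread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    boolean isRunning(String name) {
        Thread thread = threads.get(name);
        return thread != null && thread.isAlive();
    }
}
```

With this in place, swapping implementations is just `flock.stop("FredEater")` followed by `flock.start("FredAEater")`; the birds never hold references to each other, so stopping one cannot break another.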

This flexibility is only available in ConcX (and Avian Computing) because of its inherent loose coupling. All of the birds in the flock behave individually in just the way they were configured, without affecting the other birds, and together as a flock they operate in parallel to accomplish the goals of the program.