Category Archives: Thinking In Parallel

The Whorf Hypothesis

[Image: Lieutenant Worf in 2366]

No, no, we’re not talking about the big tough guy from Star Trek. Instead, we’re taking a little detour from what we usually talk about to consider the applicability of linguistic relativity, also referred to as the (Sapir) Whorf Hypothesis. (I put Sapir’s name in parentheses because he was an influential teacher and mentor of Whorf but they never collaborated on the hypothesis).

Benjamin Lee Whorf’s Hypothesis “is the idea that differences in the way languages encode cultural and cognitive categories affect the way people think, so that speakers of different languages will tend to think and behave differently depending on the language they use.”

[Image: Benjamin Whorf]

A minor example of this: the Spanish equivalent of the English word "parent" is "padre," which is the same word used for "father." How could a teacher send home a gender-neutral letter to the parent(s) of one of her students if the Spanish word for parent is also the word for father? Another example would be trying to describe or explain snow in the native language of someone from New Guinea or Vanuatu or another equatorial country where the temperature never drops below 75 degrees.

Benjamin Whorf graduated from MIT in 1918 with a degree in chemical engineering and had a long successful career in fire prevention. In addition to his work preventing fires, in 1925 he began studying the Nahuatl language (Aztec) and Mayan hieroglyphics. By 1930 he was considered a leading name in Middle American linguistics, prompting him to travel to Mexico and to publish more papers about Nahuatl and Mayan. After 1931, Whorf started taking classes from Sapir, who had a significant influence on Whorf’s thinking about language. In 1938, Whorf’s health began to decline because of cancer and he passed away in 1941.

The Wikipedia biography of Whorf includes this anecdote:

“Another famous anecdote from his job was used by Whorf to argue that language use affects habitual behavior. Whorf described a workplace in which full gasoline drums were stored in one room and empty ones in another; he said that because of flammable vapor the “empty” drums were more dangerous than those that were full, although workers handled them less carefully to the point that they smoked in the room with “empty” drums, but not in the room with full ones. Whorf explained that by habitually speaking of the vapor-filled drums as empty and by extension as inert, the workers were oblivious to the risk posed by smoking near the “empty drums”.”

So what does this have to do with parallel programming? The point is that when we talk about parallel programs and what is going on under the hood, we have to use programming words (fork, join, lock, synchronize, etc.) to describe the parallel activities, which constrains or limits how we think about the actual parallel activities that we want to happen. Our thinking is limited to just the artificial constructs provided by the programming languages.
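To make that concrete, here's a minimal Java sketch (my own illustration, nothing to do with ConcX) of that conventional vocabulary in action; notice that the problem has to be restated in terms of forks, joins, and locks before it can run in parallel:

```java
import java.util.concurrent.*;

// A minimal sketch of the conventional vocabulary: fork (submit), join (get),
// and lock (synchronized). The problem must be re-expressed in these terms
// before it can be run in parallel.
public class ConventionalWords {
    private static int total = 0;                       // shared state
    private static final Object lock = new Object();    // guards 'total'

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        Future<?>[] tasks = new Future<?>[4];
        for (int i = 0; i < tasks.length; i++) {
            final int n = i;
            tasks[i] = pool.submit(() -> {               // "fork"
                synchronized (lock) {                    // "lock"/"synchronize"
                    total += n;
                }
            });
        }
        for (Future<?> t : tasks) t.get();               // "join"
        pool.shutdown();
        System.out.println("total = " + total);
    }
}
```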

Much like Whorf’s Hypothesis, the Avian Hypothesis is that the programming language determines a program’s actual behavior, what can be done, and even how one can think about its actions. By changing our thinking about parallel activities to more natural scenarios, such as flocks of birds, it becomes easier to think about what we want done without the constraints of what the programming language allows us to think about.

Review of Multiprocessors and Parallel Processing (1974)

I recently came across a book that I had rescued a long time ago from the discard pile of the technical library at work. Titled Multiprocessors and Parallel Processing, it was published in 1974 by John Wiley & Sons, Inc. and edited by Philip H. Enslow. What a trip down memory lane that was! One of the subtle points of the book was how minicomputers (DEC, etc.) were encroaching on mainframe computers (Burroughs, IBM, etc.). The whole microcomputer revolution that started in the late 1970s and 1980s was completely invisible to all but the most radical thinkers and future forecasters.

From a historical perspective, it’s interesting to see what subjects were covered. It begins with an overview of computer systems, including a four-page section describing “The Basic Five-Unit Computer,” with four block diagrams that illustrate the various configurations of the “Input Unit”, the “Arithmetic Logic Unit”, the “Memory Unit”, the “Control Unit”, and the “Output Unit.” I guess it was pretty revolutionary stuff in ’74, but it seems simple compared to what they pack into a dual-core processor these days.

One interesting emphasis in the book was the potential for improved reliability that multiprocessor computers could offer. It’s hard to remember that computer failures were a major source of worry in the ’60s and ’70s. There were many discussions during the Apollo missions about whether the guidance computers should be redundant systems, but the added weight, complexity, and fragility of the computers of that time made redundant systems unattractive, especially since redundant computers wouldn’t add significantly to the overall reliability. An MIT document on the Apollo Guidance system put the estimated failure rate at 0.01% per thousand hours, so the mission time of 4000 hours produced a “low” probability of success of 0.966, which meant that repair components had to be included in every flight, along with the tools and procedures to repair the computers during the mission. Our current computers use components that have less than 1 failure per million hours, making them a thousand times more reliable (or more).
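As a back-of-the-envelope check (my own arithmetic using the standard constant-failure-rate model, not figures from the MIT document), a success probability of 0.966 over a 4000-hour mission corresponds to an aggregate failure rate across the whole computer of roughly 8.6 per million hours:

```latex
R(t) = e^{-\lambda_{\text{sys}} t}, \qquad
\lambda_{\text{sys}} = \frac{-\ln R}{t} = \frac{-\ln 0.966}{4000\ \text{hr}}
\approx 8.6 \times 10^{-6}\ \text{per hour}
```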

One thing that hasn’t changed is the motivation for multiprocessor and parallel processing systems: improving system performance. Improving performance and reliability are the subjects covered in Chapter 1, along with the basic components and structure of computer systems. Chapter 1 takes 25 pages to cover its material.

Chapter 2, Systems Hardware, spends 55 pages describing memory systems and how to share memory between multiple processors. Also covered were fault tolerance, I/O, interfacing, and designing for reliability and availability. Nearly half of the pages in the first four chapters are in Chapter 2.

Chapter 3 covers Operating Systems and Other System Software in 27 pages. It covers the organization of multiprocessor systems (Master-Slave, Symmetric or Anonymous processors), Resource allocation and management, and special problems for multiprocessor software, such as memory sharing and error recovery.

Chapter 4, Today and the Future, is the last chapter and it summarizes the future of multiprocessor systems in 10 pages or less. The final section in the chapter, The Future of Multiprocessor, includes the statement, “. . . the driving function for multiprocessors would be applications where. . . availability and reliability are principal requirements; however the limited market for special systems of that type would not be enough incentive to develope [sic] a standard product specifically for those applications.” Sounds like they didn’t expect multiprocessor and parallel processing to be a big market in the future.

 

Which brings us to a total of about 130 pages of a 330-page book. The remaining 200 pages are appendices that describe commercially available multiprocessor systems from Burroughs (D 825 and B 6700), Control Data Corporation (CDC 6500, CYBER-70), Digital Equipment Corporation (DEC 1055 and 1077), Goodyear Aerospace STARTRAN, Honeywell Information Systems (6180 MULTICS System and 6000 series), Hughes Aircraft H4400, IBM (System 360 and 370), RCA Model 215, Sanders Associates OMEN-60, Texas Instruments Advanced Scientific Computer System, Sperry Rand (UNIVAC 1108, 1110, and AN/UYK-7), and Xerox Data Systems SIGMA 9 computers.

For those of you who actually read through the list, you probably noticed that it included the MULTICS system that inspired Thompson and Ritchie to develop Unix. MULTICS may not have been a commercial success, but one of those systems continued to run until the year 2000, when the Canadian Department of National Defence in Halifax, Nova Scotia, finally shut it down. It must have been quite a system if it inspired a couple of people to write a completely different operating system.

What I think is most interesting is that the majority of the book is about the hardware; system software (and the joys of parallel programming) is discussed on only 27 of the 330 total pages, and most of those pages discuss how the software interacted with the hardware. At the time, everyone saw the major hurdle as being the hardware, something that integrated circuits and the massive shrinking of components have almost rendered irrelevant. The amazing efforts of Intel, AMD, TI, and other chip makers have made high-performance, high-availability, high-reliability hardware available at reasonable prices, components that hardware designers could only (just barely) dream of in the 1970s.

Now it’s time for software developers to start thinking outside the box and come up with new ways to design software that is faster than the tried-and-true-slow-and-expensive-and-behind-schedule methods currently in use.

Avian Parallel Databases

The top-of-the-line databases all execute highly optimized code designed to run on the highest-performing multi-CPU/multi-core systems. The basic architecture of database software is similar to that of operating systems, with protected sections of code that handle multi-core functionality and a great deal of thought given to how to produce the desired results: data caching, data locality, preventing data staleness, and so on.

So how could any parallel database system compete with the perfection of the top-of-the-line databases? By thinking outside the box and embracing the coming kilo-core future.

In kilo-core systems, it makes sense that each of those kilo-cores would have some local RAM or extended cache. This core-specific memory would prevent huge system delays caused by all of those kilo-cores trying to access the same shared system RAM.

So imagine each core in a 1,000-core system loaded some portion of a database into its local RAM. If each core had 1 MB of available local RAM, a 1 GB database could be held entirely in high-speed memory at all times, providing a significant speed improvement compared to reading data off a hard disk.
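As a rough illustration (hypothetical Java of my own, not ConcX), each core could own a fixed slice of the records, chosen by hashing the key, and keep that slice entirely in its local memory:

```java
import java.util.*;

// Hypothetical sketch: hash-partition records across N "cores" so that each
// core holds (and later searches) only its own slice in local memory.
public class PartitionSketch {
    static final int CORES = 1000;

    // One in-memory slice per core, each expected to stay under ~1 MB.
    static List<List<String>> slices = new ArrayList<>();

    public static void main(String[] args) {
        for (int i = 0; i < CORES; i++) slices.add(new ArrayList<>());

        // Distribute records by hashing their key; ~1 GB total spread
        // across 1,000 cores is roughly 1 MB per core.
        String[] records = { "12345|Alice|likes birds", "12346|Bob|comment" };
        for (String record : records) {
            String key = record.split("\\|")[0];
            int core = Math.floorMod(key.hashCode(), CORES);
            slices.get(core).add(record);
        }
        System.out.println("records on core 0: " + slices.get(0).size());
    }
}
```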

Additional performance improvements would come if each core were responsible for indexing only the data held in its own memory rather than relying on some shared index. The indexes that each core had to search would be significantly smaller, producing much faster searches.

Now consider that a 1 GB database broken into 1,000 pieces yields data volumes on each core that are small enough that all of the data could be indexed. Imagine retrieving data from any field at indexed speeds. Suddenly all of those off-index queries, such as searching for a name in a comment field, would be completed just as quickly as indexed queries.
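Here is a hypothetical sketch of what “index everything” could look like on one core’s slice (illustrative code only); because the slice is tiny, every word of every field can go into a single inverted index:

```java
import java.util.*;

// Hypothetical sketch: a per-core inverted index over *every* field of the
// core's local slice, so even free-text comments are searchable at indexed speed.
public class LocalIndexSketch {
    private final Map<String, List<Integer>> index = new HashMap<>();
    private final List<String> rows = new ArrayList<>();

    public void add(String row) {
        int id = rows.size();
        rows.add(row);
        for (String token : row.toLowerCase().split("\\W+")) {
            index.computeIfAbsent(token, k -> new ArrayList<>()).add(id);
        }
    }

    public List<String> lookup(String term) {
        List<String> hits = new ArrayList<>();
        for (int id : index.getOrDefault(term.toLowerCase(), List.of())) {
            hits.add(rows.get(id));
        }
        return hits;
    }

    public static void main(String[] args) {
        LocalIndexSketch core = new LocalIndexSketch();
        core.add("12345|Alice|met her at the birdwatching club");
        System.out.println(core.lookup("birdwatching"));  // indexed comment search
    }
}
```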

Now consider what happens when submitting a SQL statement in an Avian parallel database: the SQL-interpreter bird sings (broadcasts) the SQL request to all 1,000 cores, which all start searching their local RAM for data that matches the SQL statement. Instead of multiple threads competing for limited system resources, all of the resources are searching simultaneously for the requested data. Instead of preprocessors optimizing requests and scheduling disk reads for optimum results, the data are simply searched simultaneously and the matching values are returned.
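Here’s a hypothetical scatter-gather sketch of that broadcast (my own toy Java, not ConcX): each simulated “core” scans only its own slice, and the partial results are merged as they arrive.

```java
import java.util.*;
import java.util.concurrent.*;

// Hypothetical sketch: broadcast one query to every core's local slice and
// merge whatever each core finds (scatter-gather), with no shared index.
public class BroadcastQuerySketch {
    public static void main(String[] args) throws Exception {
        int cores = 8;                                   // stand-in for 1,000
        List<List<String>> slices = new ArrayList<>();
        for (int i = 0; i < cores; i++) slices.add(new ArrayList<>());
        slices.get(3).add("12345|Alice|mentioned parrots in a comment");

        String query = "parrots";                        // stand-in for the SQL predicate
        ExecutorService pool = Executors.newFixedThreadPool(cores);
        List<Future<List<String>>> partials = new ArrayList<>();
        for (List<String> slice : slices) {
            partials.add(pool.submit(() -> {             // "sing" the request to one core
                List<String> hits = new ArrayList<>();
                for (String row : slice) {
                    if (row.contains(query)) hits.add(row);
                }
                return hits;
            }));
        }

        List<String> results = new ArrayList<>();
        for (Future<List<String>> partial : partials) results.addAll(partial.get());
        pool.shutdown();
        System.out.println(results);
    }
}
```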

And perhaps most importantly, the data held in the cores wouldn’t have to be homogeneous. Instead of requiring all “Name” data to be in a Name table with one exact structure defined in the database, the names could come from any name resource, including database tables, XML files, JSON lists, etc. Each core would only have to know how to search its own data. All of the problems with field sizes and data consistency would go away.
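One hypothetical way to picture “each core only knows how to search its own data” is a tiny common interface that a table slice, an XML file, or a JSON list could each implement (again, illustrative names only, not ConcX):

```java
import java.util.*;

// Hypothetical sketch: each core wraps whatever it holds (table rows, JSON,
// XML, ...) behind one tiny interface; the rest of the system never needs
// to know the shape of the underlying data.
interface LocalSource {
    List<String> findNames(String pattern);
}

class TableSliceSource implements LocalSource {
    private final List<String> rows = List.of("Alice", "Alan", "Beth");
    public List<String> findNames(String pattern) {
        List<String> hits = new ArrayList<>();
        for (String name : rows) if (name.startsWith(pattern)) hits.add(name);
        return hits;
    }
}

class JsonListSource implements LocalSource {
    private final String json = "[\"Albert\",\"Carol\"]";   // pretend this was parsed
    public List<String> findNames(String pattern) {
        List<String> hits = new ArrayList<>();
        for (String name : json.replaceAll("[\\[\\]\"]", "").split(",")) {
            if (name.startsWith(pattern)) hits.add(name);
        }
        return hits;
    }
}

public class HeterogeneousSketch {
    public static void main(String[] args) {
        List<LocalSource> cores = List.of(new TableSliceSource(), new JsonListSource());
        for (LocalSource core : cores) System.out.println(core.findNames("Al"));
    }
}
```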

Now imagine the capabilities of a Mega-core system running a parallel database. Even with modest local RAM, we’re talking a terabyte database that could return results on any field in relatively short times. The mind boggles at the potential.

The ideas for an Avian Parallel Database presented here illustrate how using massively parallel systems should produce significantly improved performance compared to traditional databases whenever more than one piece of data is required. Retrieving the name of the customer with ID = 12345 would not be faster, but retrieving the names of all customers who meet certain criteria should be significantly faster because thousands (or eventually millions) of data sources could be searched simultaneously.

Avian Parallel Operating Systems

One interesting potential of Avian Computing comes from its loosely coupled nature. On occasion, I’ve compared an Avian program to a flock of birds that lands in a field and looks for food; when they’ve found all their food (the program ends), they all fly away. But do all of them have to fly away?

What if the really basic birds didn’t fly away? What if the birds that handled keystrokes and serial ports and network communications and all the really low-level stuff remained in place? What if all of the birds that handled all the basic system behaviors remained active in memory? Suddenly, the single-threaded and highly protected kernel would disappear and all of its capabilities/functionality could be handled in parallel.

If you imagine a system with 10,000 cores, how would you process everything through a single-threaded kernel? Or through a tightly controlled multi-threaded kernel? Certainly the cores couldn’t all stop and wait for the kernel processes to complete before continuing. If we are going to derive any benefits from moving to kilo-core systems, we will need different operating system architectures to get those benefits.

What if the birds (cores) that handled system events all worked thru a standardized process, like the TupleTree(s)? All of these activities could work simply and asynchronously, without a lot of planning or coordination. It would provide a structure that enables highly parallel processing of operating system activities.
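I don’t want to pin down the actual ConcX TupleTree API here, so treat the following as a purely hypothetical Java sketch of the idea: a shared blocking store that decouples the birds producing OS events from the birds consuming them, with no direct coordination between the two.

```java
import java.util.concurrent.*;

// Purely hypothetical sketch (not the ConcX API): a shared blocking store that
// decouples the birds that produce OS events from the birds that consume them.
public class TupleTreeSketch {
    private final BlockingQueue<String> food = new LinkedBlockingQueue<>();

    public void put(String item) { food.add(item); }          // a bird stores food
    public String take() throws InterruptedException {        // a bird eats food
        return food.take();                                    // blocks until food appears
    }

    public static void main(String[] args) throws Exception {
        TupleTreeSketch tree = new TupleTreeSketch();

        // A "keyboard bird": drops keystroke events into the tree as they happen.
        Thread keyboardBird = new Thread(() -> tree.put("keystroke:A"));

        // A "consumer bird": eats whatever events appear, whenever they appear.
        Thread consumerBird = new Thread(() -> {
            try { System.out.println("ate " + tree.take()); }
            catch (InterruptedException ignored) { }
        });

        consumerBird.start();
        keyboardBird.start();
        consumerBird.join();
        keyboardBird.join();
    }
}
```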

And if the “operating system” birds were auto-adjusting, like they are in Avian Computing, then whenever a particular OS service needed additional processing power, the birds responsible for that OS service could clone themselves as many times as required to meet the need, or hatch the appropriate kinds of birds.
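Here’s a hypothetical sketch of that “clone when busy” idea (illustrative code, not how ConcX actually hatches birds): if a service bird’s backlog grows past a threshold, it starts another copy of itself to help drain the same queue.

```java
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: a service bird clones itself whenever its backlog of
// pending work grows beyond a threshold, and the clones drain the same queue.
public class CloningBirdSketch {
    static final BlockingQueue<String> backlog = new LinkedBlockingQueue<>();
    static final AtomicInteger flockSize = new AtomicInteger(0);

    static void hatchBird() {
        flockSize.incrementAndGet();
        new Thread(() -> {
            try {
                while (true) {
                    String work = backlog.take();        // eat one piece of food
                    Thread.sleep(10);                    // pretend to digest it
                    if (backlog.size() > 100 && flockSize.get() < 8) {
                        hatchBird();                     // too much food: clone
                    }
                }
            } catch (InterruptedException ignored) { }
        }).start();
    }

    public static void main(String[] args) throws InterruptedException {
        hatchBird();                                     // the initial bird
        for (int i = 0; i < 500; i++) backlog.add("event-" + i);
        Thread.sleep(1000);
        System.out.println("flock grew to " + flockSize.get() + " birds");
        System.exit(0);                                  // sketch only: stop the flock
    }
}
```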

Instead of compiling a single kernel that should handle most situations, a flock of OS birds would “fly down and land” on the computing hardware and grow/shrink to match the usage encountered on that individual system. Adding new features and capabilities to a system would be a relatively simple process of releasing a new bird, with no need to install new drivers and reboot, or to recompile the kernel and then reboot.

User-level software would have no need to know the details of how keystrokes or Ethernet inputs were gathered; it would simply attempt to get the info it needs from the TupleTree(s). The providing layers would look in the TupleTree(s) for the food that they need, and when they had eaten (processed) it, they’d store the results back in the TupleTree(s) for the user-level software. The providing layers would eat food that was put in the TupleTree(s) by the operating system birds. It would be consistent from the lowest levels to the highest levels.

Security measures would need to be developed so individual systems aren’t compromised and infected with viruses and unwanted processes, but these could be developed in advance of releasing an Avian Operating System. And any updates to the kernel could be handled dynamically so there would never be a need to reboot.

One of the reasons I am personally optimistic about Avian Computing is that its potential extends far beyond what can be realized in a single parallel program. Avian Computing extends parallel computing so it could include operating system functionality, so the same ideas and concepts would describe not just individual programs but all capabilities within a system. An Avian Computing Operating System would effectively be a “fractal system,” where the same behavior at the lowest level would be repeated at each higher level. It would be consistent from top to bottom, completely parallel and completely asynchronous.

It is chaos. It is unstructured and unpredictable. It behaves like real life. What’s not to love?

How Many Cores will be Available in the Future?

One of the baseline assumptions of the Avian Computing Project is that the number of cores and processors available for a programming task will increase rapidly. Here’s an article published by ZDNet’s Nick Heath that supports that assumption.

Cracking the 1,000-Core Processor Power Challenge
ZDNet (05/21/13) Nick Heath

University researchers in the United Kingdom are working on solutions to the growing problem of power consumption as mainstream processors are expected to contain hundreds of cores in the near future. (emphasis added) Power consumption outpaces performance gains when additional cores are added to processors so that, for example, a 16-core processor in an average smartphone would cut the maximum battery life to three hours. In addition to mobile devices, data centers crammed with server clusters face mounting energy demands due to the rising number of cloud services.

Left unchecked, within three processor generations the power consumption issue will require central processing unit (CPU) designs that use as little as 50 percent of their circuitry at any one time, in order to restrict energy use and the waste heat that would otherwise ruin the chip.

The University of Southampton is part of a group of universities and companies joining in the Power-efficient, Reliable, Many-core Embedded systems (PRiME) project to explore ways that processors, operating systems, and applications could be redesigned to enable CPUs to intelligently pair power consumption with specific applications. PRiME is studying a dynamic power management model in which processors work with the operating system kernel to shut down parts of cores or modify the CPU’s clock speed and voltage based on exact application needs.
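The summary doesn’t spell out PRiME’s actual mechanisms, so here is just a generic sketch (hypothetical numbers and interfaces, nothing from PRiME itself) of the kind of policy being described: scale the clock to track measured load, and park cores that sit idle.

```java
// Generic sketch of a dynamic power-management policy of the kind the PRiME
// summary describes (hypothetical numbers and interfaces, not PRiME's design):
// scale the clock to track utilization, and park cores that sit idle.
public class DvfsPolicySketch {
    static final int[] FREQ_STEPS_MHZ = { 400, 800, 1200, 1600, 2000 };

    // Pick the lowest frequency step that still covers current utilization.
    static int chooseFrequency(double utilization) {
        int step = (int) Math.ceil(utilization * FREQ_STEPS_MHZ.length) - 1;
        step = Math.max(0, Math.min(step, FREQ_STEPS_MHZ.length - 1));
        return FREQ_STEPS_MHZ[step];
    }

    // Park (power-gate) a core when its utilization stays very low.
    static boolean shouldParkCore(double utilization) {
        return utilization < 0.05;
    }

    public static void main(String[] args) {
        double[] samples = { 0.02, 0.30, 0.75, 0.98 };   // pretend per-core loads
        for (double u : samples) {
            System.out.printf("load %.2f -> %d MHz, parked=%b%n",
                    u, chooseFrequency(u), shouldParkCore(u));
        }
    }
}
```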

View the original article

Hello Parallel World

Thank you for visiting the Avian Computing blog, a blog dedicated to improving how we think about parallel programming. The current ways of thinking about parallel programming are ineffective and inefficient because they fail to capitalize on the strengths of human thinking and fail to leverage the strengths of computers. These deficiencies result in parallel programs that are slow to develop, are difficult to debug, show unpredictable performance, and contain potential run-time failures that may occur only intermittently.

This blog will look at many of the issues associated with parallel programming and will try to provide new perspectives on solving these issues, specifically keeping in mind the strengths (and weaknesses) of the human mind.

Currently, we attempt to solve our parallel programming problems using the tools and techniques developed to create single-threaded programs, and then attempt to brute-force them into parallel programs with the application of “pure logic” and our massive intellect. This approach is no more effective today than it was 50 years ago. This blog will search for more effective solutions for the rapid development of parallel programs, primarily by using the Concurrency Explorer (ConcX), available soon on this web site. This open-source software should not be confused with Microsoft’s ConcurrencyExplorer (no space between words) that is used with their CHESS system.

An underlying assumption of this blog is that 1,000-core and 10,000-core systems are in our near future. And to be able to use these centi-core and kilo-core systems, we need a better way of generating parallel programs. Currently, we develop parallel programs the old-fashioned way, with “blood, sweat, toil, and tears.” The goal of this blog is to investigate how to more efficiently develop software that will run on the kilo-core systems that will soon be available.