Problems in Parallelism: March 2011

Monday, March 28, 2011

Backtracking: Difficulties in Concurrency

Sorry for the delay, folks. I had intended last week to be supremely productive, and while I made certain advances that I hope to discuss soon, none of them were directly in line with my initial intentions. That said, it's probably time to get back to explaining in more detail what concurrency and parallelism are and why this is an issue worth reading about. In my previous Backtracking post, I explained briefly what is meant by concurrency in the field of computation and why it's becoming a more prominent issue in recent years. Now it's time to cover why the problem is difficult.

In slightly simplified terms, concurrency comes chiefly in two flavors at its lowest level. There is distributed memory concurrency and there is shared memory concurrency. These are centrally understood by two similar but distinct concepts: processes and threads.

Both threads and processes represent executing portions of a computer program. In simple terms, though, a process is effectively "bigger". It gets its own unique chunk of memory on the system and it gets its own slot in the operating system's scheduling system for deciding when which programs run. A thread lives within a process, meaning all of the threads that a process creates share that process' share of the memory and its CPU time.

As you may suspect from the terms "shared memory" and "distributed memory" that bit about threads living inside a process and having access to the same bit of memory is important. It has an essential impact on the way information is communicated between the different running parts of a program.

You see, a process isn't even aware that any memory exists outside of the bit it has access to. If it was, then it would be able to write information there and there would be a very real chance that your computer would crash because your browser unwittingly overwrote a piece of data that your operating system was currently using. So, when two processes need to communicate information, the system needs to actively set up channels of communication through which one process can see some tiny reserved area of the memory reserved for the second process. The first process then places whatever data it is trying to send in that space from which the second process can retrieve it.

Setting up these communication channels is costly, and there are other difficulties that arise when distinct processes need to be synchronized (i.e. if there are points where one process may need to wait for another one before moving on). When threads need to "communicate", however, they don't have to go through any of that nonsense. Since all threads share the same memory space, each one can, given the particular location, see anything that its siblings can without having to request permission or have the desire to share the data explicitly communicated. Instead, one thread writes information and another thread reads it if necessary without any other threads needing to be aware.

This saves all sorts of time and system resources, but it also creates problems. Remember how I mentioned that one process being aware of the entire memory space of the computer could potentially make your web browser kill your operating system? Well, now we have a situation where one thread can kill an entire process (or program) by reading or writing data at the wrong time. This can occur in a number of ways, but we'll only worry about the two most common.

First is something called race conditions. This occurs when two threads are trying to modify the same memory location for different purposes. Say thread A needs to write to memory location X and then retrieve data from X a little bit later. In the mean time, though, thread B writes to X. Instead of getting the value it expected, A gets the value that B wrote, possibly corrupting the result it was trying to compute.

This sort of problem can be solved by what are called locks. The basic idea is that thread A places a restriction on a chunk of memory (or a segment of code), and then no other thread can touch it until A says that it's done with whatever sensitive computation it was doing. This, however, creates more problems. Suppose we have two sensitive memory areas, X and Y, and two threads, A and B. Now, suppose B has a lock on X, but needs access to Y to finish its computation while A has a lock on Y but needs access to X. Neither can finish their computation, so neither can let go of their current lock, and the program will halt. This is called deadlock.

All of this is tricky business, and it only becomes trickier as you seek to optimize performance and fairness within the system. It's so easy to simply overlook possible trouble spots that asking programmers to handle all of this by themselves, even with helpful tools for synchronizing and monitoring their threads, is daunting. This is why we need simpler ways of understanding shared memory concurrency that solve and mitigate these problems with minimal added input from programmers.

Sunday, March 6, 2011

Backtracking: What is Concurrency?

Update: light edits as of 2/28/2011

I kind of jumped in on the middle of my work with my first couple of posts. I suspect that there are some individuals who are not experienced Computer Scientists may find interest in what I'm doing -- there's at least one already following me according to Blogger. I'm firmly of the belief that any academic who cannot explain more than the surface details of his/her work to those without domain specific knowledge is a blight upon the earth. I'd rather not be a blight, and I'd also rather not leave anybody entirely alienated, so I'm going to start trying to explain the basic ideas of what it is I'm dealing with and what my larger goals are over a series of posts in the not too distant future. Starting now.

When Computer Scientists discuss concurrency, they are talking about the "simultaneous" performance of multiple tasks at once. I use quotation marks there because such simultaneous performance requires hardware capable of processing multiple instructions at once -- i.e. hardware containing multiple processing units. Historically, this has been limited mostly to so-called "super computers" used in scientific computation. As a result, consumer computers (or, technically, operating systems) kept a schedule of all currently running tasks and rapidly alternated between them, giving each a regular chunk of computation time. Because of the simulated nature of the simultaneous operation, there were only a handful of especially useful applications for concurrency in most consumer level products:

Running multiple distinct programs at once.
Retrieving data from peripherals attached to the computer system.
Handling user interaction.

The first use and most instances of the second are handled almost entirely by the operating system, leaving programmers to only worry about specialized, application-specific uses of concurrency. Times have changed, though, and as we are all well aware, even the lowliest laptop currently in production runs with at least two "cores" or processing units. If you want to break the bank, Apple will sell you a tower containing as many as 16 such cores (or, alternatively, you can build one yourself for a fairly reasonable fee). Within a hardware generation or two, even smartphones will be multi-core.

While users certainly see some benefit from this development in the ability to more efficiently run multiple programs simultaneously, the generally lower clock-speeds of multi-core processors can actually hinder the performance of a single application if it isn't written with multiple cores in mind. Suddenly, we live in a world where developers of consumer sorts of applications can generate performance increases by utilizing multiple processing units in a computer system to "parallelize" their execution. As I will discuss in a future post, this is a complicated task for a number of reasons.

Thursday, March 3, 2011

First Up: Actors

As I begin my long trek towards becoming an expert in topics of concurrency, my first stop is going to be Actor Models. Erlang was one of the first languages to really make me consider the value of message passing outside of clunky models such as MPI and to think about how languages could build concurrency right in at a fundamental level. Unfortunately, I never had much excuse before now to sit and play with Erlang as much as I would have liked. So, starting slowly over the next couple of weeks (which are littered with distractions from coursework) and then picking up heavily over Spring Break, I'm going to be reading plenty of papers on Actors and The Inheritance Anomaly as well as trying to design some non-trivial, concurrent program in Erlang (perhaps some sort of YAWS style webserver). Expect a healthy decompressing of Carl Hewitt's foundational "A Universal Modular Actor Formalism for Artificial Intelligence" sometime in the coming week.

Historically, Actors have been used in distributed systems, but I feel like the semantics and the fundamental idea may have value elsewhere. We'll see.

Establishing Metaphors for Shared Memory Concurrency

Computer programmers like to use real-world analogies to simplify the task of explaining how a given idea works or to champion the notion that their theory is the most practical for modeling the real world.

This is a trend perhaps most famous in the Object Oriented paradigm, where you can't open a text-book without being told that an Employee IS A Person, and a SalariedEmployee IS AN Employee, as is an HourlyEmployee. These canonical examples aren't always perfect (I'd argue that a sub-class relationship between and Employee and a Person is inappropriate), but they get the point across clearly in describing the basics of inheritance.

Another paradigm that tends to champion its own set of metaphors is Concurrency Oriented Programming, perhaps most visibly represented by Joe Armstrong and his Erlang programming language. Armstrong is quite fond of the analogy of a program as a sort of highway, with individual cars autonomously sending messages to and responding to messages from other cars based on localized rules. It only makes sense, then, that those of us looking to improve the usability of constructs for shared memory concurrency would also like to have our own metaphors.

Consider, for instance, our initial plan to use trains as an analogy for shared memory concurrency. The obvious relationships are that trains are threads of execution and tracks are code space. We also have switches and signals that can provide direction the way simple synchronization constructs like locks do. Two trains can run on parallel tracks (executing the same, non-critical block of code simultaneously) or can be running in completely different regions of the track space (executing different regions of code). If two trains are running on parallel tracks that suddenly cut down to a single lane, then that would obviously seem to be a critical section governed by a switch/signal...but this is where the problems start to come in. Why does this section of track need to be only a single lane? What is being modified that would break if there were two trains continuing to run side by side?

"Perhaps", you say, "there's a bridge that can only support the weight of a single train." This works here, when the trains are executing the same chunk of code, but what about the example of, say, a mutator and an accessor for a single state variable being called at the same time? How do we explain the notion that Train A has to stop and wait for Train B, miles and miles away on a completely distinct length of track, to finish crossing some seemingly arbitrary boundary before Train A can continue along its route?

The problem is that, unlike distributed memory forms of parallelism such as actor models, where each process has its own unique hunk of memory that only it can touch and where explicit communication is required to share data with other processes, shared memory requires us to be sure that our analogy contains some explicit way of representing memory. Otherwise, we run into problems like those above. This forces us to separate the address space utilized by all threads in a given process and the "code space" they traverse over time as distinct (though adjacent) spaces. If we try to limit ourselves to the code space and the agents moving through it, then we run into a situation where we need our limited set of representatives must exhibit either multilocation -- that is to say, the ability for something to be in multiple "places" simultaneously -- or co-location -- the potential for two "different" things to occupy the same space -- depending on how we want to represent the sorts of simultaneous memory accesses we want to avoid. Any metaphysician worth their salt will tell you that this is absurd if you're looking to represent anything resembling reality.

There are certainly some metaphors for which this doesn't cause a problem, they just aren't as immediately attention grabbing as those that have already been suggested. I've long been fond of this example I came up with on 2 hours of sleep in the middle of a presentation years ago:

Assume you live with another person, and, being kind of stupid (like computers), you refuse to freely stock your fridge with whatever you buy at the store. Instead, you need to place your groceries in containers that always sit in your fridge and are marked for specific purposes, but not based on their current contents. Now, suppose you come home one day with a nice carton of orange juice. You take one of the containers marked "drinks" and you dump out its current contents, fill it with the orange juice you just bought, and wander off happily thinking of all the enjoyment you'll get from consuming it later.

Now, before you get back to your orange juice-y revelry, your roommate shows up with a nice bottle of freshly stilled moonshine. Unaware of your intentions, he dumps out your orange juice and fills up the container it once inhabited. You walk into the kitchen, take out the container now filled with potent, put-hair-on-your-chest booze, chug the whole container, and wind up in the hospital getting your stomach pumped.

Boom, race conditions.

Perhaps, then, a fitting analogy for shared memory concurrency is a busy kitchen with a multitude of chefs trying to all get their meals out on time and their ingredients in place. We have threads (the chefs themselves) executing code (going about various kitchen-y tasks) and accessing shared memory (refrigerators, counter tops, ovens, and whatever else) containing multiple possible values (the food items therein).

This analogy, unfortunately, doesn't lend itself as well to traditional, low level abstractions for thread scheduling, as there is nothing external to the actors that governs them with set policies. What use is it, then? Perhaps my biases are skewing my judgement, but, for one thing, it seems to have potential for describing the sorts of shared memory "message passing" in Object Oriented Programming that I've discussed implementing in the past.

The Purpose

I'm a person who needs a lot of time to ruminate and process ideas before taking action on them. Hammering out semantics, formulating relationships between concepts, and diving into the implications of any plan of action are all essential aspects of accomplishing any large goal for me. Unfortunately, I'm good at letting myself dwell on a single area of interest for longer than is productive as I try to unnecessarily refine my understanding beyond the point where it's useful or to a level that my current broader knowledge-base cannot support.

History has shown that writing about such things not only helps me connect ideas more quickly, but it helps me move past them and onto the next piece of the puzzle. That's the purpose of this blog. Here, I will hold myself accountable to write about my ongoing literature reviews on the topics of concurrency and of human comprehensibility of programming languages and constructs. I will keep an ongoing list of the topics I'm focusing on and the specific papers in the pipe. I will process what I am learning and how I respond to it. I will discuss how these new bits of knowledge relate to and cause my to alter my larger goals.

In an ideal world, this blog will serve as a resource for me in reducing the amount of time I spend on a single topic by allowing me to regularly revisit and refine my thoughts over time as I learn more. Further, it will keep my advisors in the loop as to my work and the progress that I am making, and, hopefully, it can eventually be a resource to others trying to get a grasp on some of these topics.