Monday, March 28, 2011

Backtracking: Difficulties in Concurrency

Sorry for the delay, folks. I had intended last week to be supremely productive, and while I made certain advances that I hope to discuss soon, none of them were directly in line with my initial intentions. That said, it's probably time to get back to explaining in more detail what concurrency and parallelism are and why this is an issue worth reading about. In my previous Backtracking post, I explained briefly what is meant by concurrency in the field of computation and why it's becoming a more prominent issue in recent years. Now it's time to cover why the problem is difficult.

In slightly simplified terms, concurrency comes chiefly in two flavors at its lowest level. There is distributed memory concurrency and there is shared memory concurrency. These are centrally understood by two similar but distinct concepts: processes and threads.

Both threads and processes represent executing portions of a computer program. In simple terms, though, a process is effectively "bigger". It gets its own unique chunk of memory on the system and it gets its own slot in the operating system's scheduling system for deciding when which programs run. A thread lives within a process, meaning all of the threads that a process creates share that process' share of the memory and its CPU time.

As you may suspect from the terms "shared memory" and "distributed memory" that bit about threads living inside a process and having access to the same bit of memory is important. It has an essential impact on the way information is communicated between the different running parts of a program.

You see, a process isn't even aware that any memory exists outside of the bit it has access to. If it was, then it would be able to write information there and there would be a very real chance that your computer would crash because your browser unwittingly overwrote a piece of data that your operating system was currently using. So, when two processes need to communicate information, the system needs to actively set up channels of communication through which one process can see some tiny reserved area of the memory reserved for the second process. The first process then places whatever data it is trying to send in that space from which the second process can retrieve it.

Setting up these communication channels is costly, and there are other difficulties that arise when distinct processes need to be synchronized (i.e. if there are points where one process may need to wait for another one before moving on). When threads need to "communicate", however, they don't have to go through any of that nonsense. Since all threads share the same memory space, each one can, given the particular location, see anything that its siblings can without having to request permission or have the desire to share the data explicitly communicated. Instead, one thread writes information and another thread reads it if necessary without any other threads needing to be aware.

This saves all sorts of time and system resources, but it also creates problems. Remember how I mentioned that one process being aware of the entire memory space of the computer could potentially make your web browser kill your operating system? Well, now we have a situation where one thread can kill an entire process (or program) by reading or writing data at the wrong time. This can occur in a number of ways, but we'll only worry about the two most common.

First is something called race conditions. This occurs when two threads are trying to modify the same memory location for different purposes. Say thread A needs to write to memory location X and then retrieve data from X a little bit later. In the mean time, though, thread B writes to X. Instead of getting the value it expected, A gets the value that B wrote, possibly corrupting the result it was trying to compute.

This sort of problem can be solved by what are called locks. The basic idea is that thread A places a restriction on a chunk of memory (or a segment of code), and then no other thread can touch it until A says that it's done with whatever sensitive computation it was doing. This, however, creates more problems. Suppose we have two sensitive memory areas, X and Y, and two threads, A and B. Now, suppose B has a lock on X, but needs access to Y to finish its computation while A has a lock on Y but needs access to X. Neither can finish their computation, so neither can let go of their current lock, and the program will halt. This is called deadlock.

All of this is tricky business, and it only becomes trickier as you seek to optimize performance and fairness within the system. It's so easy to simply overlook possible trouble spots that asking programmers to handle all of this by themselves, even with helpful tools for synchronizing and monitoring their threads, is daunting. This is why we need simpler ways of understanding shared memory concurrency that solve and mitigate these problems with minimal added input from programmers.

No comments:

Post a Comment