Concurrency in Go by Example

Even in Go, concurrency is hard. Even with Go’s well-behaved, scalable, cross-platform, simple concurrency primitives, writing correct concurrent code requires care and attention to detail. Over the past year, I’ve had the pleasure of being paid to teach concurrency to professional engineers in one-on-one pair programming sessions. To translate that synchronous teaching experience into the modern world of remote work and async communication, I’m planning a series of posts on how to use concurrency in Go. My goal is that a reader will be able to learn concurrency in Go by following along with these posts.

This post just introduces that series. It’s an outline to hold myself accountable today and, tomorrow, a terse, skimmable intro for those who have already shipped concurrent Go code to production.

I also plan to steal Bob Nystrom’s approach from Crafting Interpreters, building each program up from nothing via a series of incremental changes. Unlike Bob, I’ll be throwing in several wrong turns – that’s right, I’m going to write bugs! I could provide only correct examples to learn from, but concurrency is hard, and the reality is that mistakes are the norm. I’ve found that it’s more important to understand why nearly right answers are wrong than it is to be able to produce the correct solution. Of course, I’ll also explain what goes wrong and how to fix it along the way.

Below is a rough list of planned posts and a brief summary of what each one will cover. This list is subject to change until the entire series is completed – but probably not by much.

Thinking About Time

Concurrency is hard because thinking about concurrent threads of execution defies our human intuitions about how time works. In a single-threaded Go program, time flows linearly; as you read down the lines of a func, your eyes trace roughly the same path as the program counter stepping through the compiled instructions. This intuition breaks in Go whenever multiple goroutines are involved (and they usually are).

Concurrent code should be tested, but tests alone cannot establish its correctness: a passing run exercises only one of the many possible interleavings. The only way to know it works is to prove it. That said, nobody goes around providing formal proofs of correctness for their code. They learn how to take mental shortcuts. Leslie Lamport invented the ‘happened-before’ relationship as a shortcut to make concurrency easier to reason about. It maps nicely to Go’s programming primitives and its memory model, so I’ll introduce how to use this idea to make your own concurrent code easier to think about.
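As a small illustration of the idea, here is a minimal sketch of a happened-before chain built from one channel. The function name publish and the variable names are hypothetical, chosen only for this example. It relies on the Go memory model's rule that a send on a channel is synchronized before the corresponding receive completes.

```go
package main

import "fmt"

// publish (a hypothetical name) writes to a plain variable in one goroutine
// and reads it in another. The channel operations order the write before the
// read, so the read is guaranteed to observe the write.
func publish() string {
	var msg string
	done := make(chan struct{})

	go func() {
		msg = "hello"      // (1) write
		done <- struct{}{} // (2) send: ordered after (1) within this goroutine
	}()

	<-done     // (3) receive: the send (2) happens before this completes
	return msg // (4) read: ordered after (3), so it must see "hello"
}

func main() {
	fmt.Println(publish()) // hello
}
```

Remove the channel and steps (1) and (4) are no longer ordered with respect to each other, which is exactly the kind of nearly right program worth studying.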

Goroutines

Go’s runtime provides a lightweight implementation of green threading. When a func is called with the go keyword, the Go runtime allocates a new stack for it and manages its scheduling onto OS-level threads. I’ll spend a post explaining what this does and the guarantees that the language and runtime provide.
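For a first taste, a minimal sketch of starting goroutines with the go keyword might look like the following. The names square and sumOfSquares are hypothetical. A channel is used only to collect results; the goroutines may finish in any order, which does not matter here because addition is commutative.

```go
package main

import "fmt"

// square computes n*n and reports the result on out.
func square(n int, out chan<- int) {
	out <- n * n
}

// sumOfSquares starts one goroutine per input and adds up their results.
func sumOfSquares(nums []int) int {
	out := make(chan int)
	for _, n := range nums {
		go square(n, out) // returns immediately; square runs concurrently
	}

	total := 0
	for range nums {
		total += <-out // one receive per goroutine started above
	}
	return total
}

func main() {
	fmt.Println(sumOfSquares([]int{1, 2, 3})) // 14
}
```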

Unbuffered Channels

Go provides unbuffered channels as a simple mechanism for communicating between two running goroutines. The Go runtime is optimized for the use of channels; every channel operation is a point at which the runtime can switch to another running goroutine.
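A minimal sketch of two goroutines handing values back and forth over unbuffered channels follows. The function name echoPlusOne and the channel names ping and pong are hypothetical. Because neither channel has a buffer, every send blocks until the other side is ready to receive, so the two goroutines proceed in lockstep.

```go
package main

import "fmt"

// echoPlusOne sends each input to a helper goroutine and waits for the
// helper to send back the input plus one.
func echoPlusOne(inputs []int) []int {
	ping := make(chan int) // caller -> helper
	pong := make(chan int) // helper -> caller

	go func() {
		defer close(pong) // the helper is the only sender on pong
		for v := range ping {
			pong <- v + 1
		}
	}()

	results := make([]int, 0, len(inputs))
	for _, v := range inputs {
		ping <- v                         // blocks until the helper receives
		results = append(results, <-pong) // blocks until the helper replies
	}
	close(ping) // the caller is the only sender on ping
	return results
}

func main() {
	fmt.Println(echoPlusOne([]int{1, 2, 3})) // [2 3 4]
}
```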

Buffered Channels and Low-level Concurrency

On occasion, the performance of a queuing system can be improved through the use of buffers. The implementation of buffered channels is also simple and elegant. A toy buffered channel can be implemented from scratch using primitives from the sync package. Along the way we’ll also get to demonstrate the low-level concurrency primitives Go provides.
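To give a flavor of what such a toy might look like, here is a minimal sketch built on sync.Mutex and sync.Cond. The type toyChan and its methods are hypothetical, and this is not how the Go runtime implements channels. It assumes a capacity of at least 1 and leaves out closing entirely.

```go
package main

import (
	"fmt"
	"sync"
)

// toyChan is a hypothetical bounded FIFO queue of ints that blocks senders
// when full and receivers when empty. Capacity must be at least 1.
type toyChan struct {
	mu       sync.Mutex
	notEmpty *sync.Cond // signaled when a value is added
	notFull  *sync.Cond // signaled when a value is removed
	buf      []int
	capacity int
}

func newToyChan(capacity int) *toyChan {
	c := &toyChan{capacity: capacity}
	c.notEmpty = sync.NewCond(&c.mu)
	c.notFull = sync.NewCond(&c.mu)
	return c
}

// Send blocks while the buffer is full, then appends v.
func (c *toyChan) Send(v int) {
	c.mu.Lock()
	defer c.mu.Unlock()
	for len(c.buf) == c.capacity { // re-check the condition after every wake-up
		c.notFull.Wait()
	}
	c.buf = append(c.buf, v)
	c.notEmpty.Signal()
}

// Recv blocks while the buffer is empty, then removes the oldest value.
func (c *toyChan) Recv() int {
	c.mu.Lock()
	defer c.mu.Unlock()
	for len(c.buf) == 0 {
		c.notEmpty.Wait()
	}
	v := c.buf[0]
	c.buf = c.buf[1:]
	c.notFull.Signal()
	return v
}

func main() {
	c := newToyChan(2)
	go func() {
		for i := 1; i <= 5; i++ {
			c.Send(i)
		}
	}()

	sum := 0
	for i := 0; i < 5; i++ {
		sum += c.Recv()
	}
	fmt.Println(sum) // 15
}
```

The for loops around Wait matter: a woken goroutine has to re-check its condition, because another goroutine may have taken the slot or the value in the meantime.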

Data Races and select

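Parallelism involves the potential for data races: two goroutines access the same memory location without synchronization, and at least one of the accesses is a write. Go’s select statement lets a goroutine wait on several channel operations at once, which makes it practical to avoid data races by giving each piece of state to a single goroutine and communicating with it over channels. Go also provides runtime testing for data races via its race detector, which we’ll show how to use.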
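Here is a minimal sketch of that style: a counter whose state is touched by only one goroutine, with select multiplexing the requests. The type counter and the function countConcurrently are hypothetical names. Running the program with go run -race should report no races, since total is never shared.

```go
package main

import (
	"fmt"
	"sync"
)

// counter is a hypothetical type whose state lives inside one goroutine.
type counter struct {
	inc  chan int      // requests to add to the total
	get  chan int      // the owner sends the current total here
	quit chan struct{} // closed to stop the owner goroutine
}

func newCounter() *counter {
	c := &counter{
		inc:  make(chan int),
		get:  make(chan int),
		quit: make(chan struct{}),
	}
	go func() {
		total := 0 // only this goroutine ever reads or writes total
		for {
			select {
			case n := <-c.inc:
				total += n
			case c.get <- total:
			case <-c.quit:
				return
			}
		}
	}()
	return c
}

// countConcurrently increments the counter from n goroutines at once and
// returns the final total.
func countConcurrently(n int) int {
	c := newCounter()
	defer close(c.quit)

	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.inc <- 1
		}()
	}
	wg.Wait()
	return <-c.get
}

func main() {
	fmt.Println(countConcurrently(100)) // 100
}
```

Replacing the owner goroutine with a shared variable that each goroutine increments directly would give the race detector something to report.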

Fan-out and sync.WaitGroup

One fundamental concurrency pattern is “fanning-out”: taking a stream of work and sending it to multiple goroutines. Correctly shutting down a pool of running goroutines is typically done with a sync.WaitGroup.
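A minimal sketch of the pattern, assuming at least one worker, could look like this. The function name doubleAll is hypothetical. To keep the focus on fan-out, each worker writes its result to a distinct slice element instead of sending it back over a channel.

```go
package main

import (
	"fmt"
	"sync"
)

// doubleAll fans the inputs out to a pool of workers and waits for all of
// them to finish. workers must be at least 1.
func doubleAll(inputs []int, workers int) []int {
	results := make([]int, len(inputs))
	jobs := make(chan int) // carries indices into inputs

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1) // register the worker before starting it
		go func() {
			defer wg.Done()
			for i := range jobs {
				// Each index is delivered to exactly one worker, so no two
				// goroutines write the same element.
				results[i] = inputs[i] * 2
			}
		}()
	}

	for i := range inputs {
		jobs <- i
	}
	close(jobs) // lets every worker's range loop end
	wg.Wait()   // all writes to results happen before Wait returns
	return results
}

func main() {
	fmt.Println(doubleAll([]int{1, 2, 3, 4}, 3)) // [2 4 6 8]
}
```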

Timeouts and context.Context

Every network RPC in Go is ultimately performed using low-level socket primitives. If a remote server fails, an unwitting goroutine might be left waiting forever for a response that will never arrive. Go added context.Context as a uniform, API-agnostic way to carry cancellation signals, timeouts, and deadlines across API boundaries. I’ll show how to use this interface to handle cancellation and timeouts properly.
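As a minimal sketch, here is a timeout applied to a simulated slow call. The functions slowCall and callWithTimeout are hypothetical; slowCall stands in for a real network RPC and simply waits on a timer.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// slowCall is a hypothetical stand-in for a network RPC that takes d to
// complete. It gives up early if ctx is cancelled or its deadline passes.
func slowCall(ctx context.Context, d time.Duration) (string, error) {
	t := time.NewTimer(d)
	defer t.Stop()

	select {
	case <-t.C:
		return "ok", nil
	case <-ctx.Done():
		return "", ctx.Err()
	}
}

// callWithTimeout runs slowCall under a deadline of timeout from now.
func callWithTimeout(work, timeout time.Duration) (string, error) {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel() // always release the context's resources
	return slowCall(ctx, work)
}

func main() {
	_, err := callWithTimeout(time.Second, 10*time.Millisecond)
	fmt.Println(errors.Is(err, context.DeadlineExceeded)) // true

	res, err := callWithTimeout(time.Millisecond, time.Second)
	fmt.Println(res, err) // ok <nil>
}
```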

Fan-in

After fan-out has been demonstrated, I’ve usually asked mentees to attempt to write a fan-in program as an exercise. I’ll ask the reader to do that as well, then walk through a correct solution, step-by-step.

Map-Reduce

Combine the fan-out pattern with the fan-in pattern, and you have all the concurrency needed to write an in-memory, toy simulation of a “real” distributed system based on the ‘Map-Reduce’ algorithm, popular in big data processing. We’ll step through an example and call out the big reasons “why” the code works, with emphasis on analyzing the code using the happened-before relationship.

I’ll also emphasize a few Go best practices in this section: 1) I’ll explain why the sending goroutine is always responsible for closing a channel, 2) I’ll demonstrate how channel close events can be chained together to gracefully tear down a large system of interconnected goroutines.
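To make the two practices concrete, here is a minimal sketch of a toy word count, assuming at least one mapper. The function name wordCount and the channel names are hypothetical. Each channel is closed by its sending side, and closing the first channel sets off a chain that shuts down every stage.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// wordCount is a toy in-memory map-reduce: mappers split lines into words,
// and a single reducer tallies them. mappers must be at least 1.
func wordCount(lines []string, mappers int) map[string]int {
	in := make(chan string)
	words := make(chan string)

	// Source: the only sender on in, so it is the one to close in.
	go func() {
		defer close(in)
		for _, l := range lines {
			in <- l
		}
	}()

	// Map stage: closing in ends each mapper's range loop.
	var wg sync.WaitGroup
	for m := 0; m < mappers; m++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for l := range in {
				for _, w := range strings.Fields(l) {
					words <- w
				}
			}
		}()
	}

	// words has several senders, so none of them can safely close it alone.
	// This goroutine closes it on their behalf once all of them are done.
	go func() {
		wg.Wait()
		close(words)
	}()

	// Reduce stage: closing words ends this loop, completing the teardown.
	counts := make(map[string]int)
	for w := range words {
		counts[w]++
	}
	return counts
}

func main() {
	counts := wordCount([]string{"a b a", "b c"}, 3)
	fmt.Println(counts["a"], counts["b"], counts["c"]) // 2 2 1
}
```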

Pipelining I/O-bound Workloads

In practice, most Go programs in a distributed system at scale are I/O bound – most time is spent waiting for a remote server to finish performing some work or receive some message. Pipelining combines with functional programming to create a nice pattern for such workloads. It’s been applied over and over again in stream processing frameworks found in other languages.

We’ll motivate and implement one of our own from scratch by completing the post series that I already started. At this point we’ll focus on applying concepts we’ve already learned and mixing in the use of Go generics for additional fun.
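As a rough preview of the shape this can take, here is a minimal sketch of generic pipeline stages, assuming Go 1.18 or later. The names Source, Map, and Collect are hypothetical and are not taken from any existing framework.

```go
package main

import (
	"fmt"
	"strconv"
)

// Source emits vals on a new channel and closes it when done.
func Source[T any](vals ...T) <-chan T {
	out := make(chan T)
	go func() {
		defer close(out)
		for _, v := range vals {
			out <- v
		}
	}()
	return out
}

// Map applies f to every value from in and emits the results in order.
// It closes its output once in is closed and drained.
func Map[T, U any](in <-chan T, f func(T) U) <-chan U {
	out := make(chan U)
	go func() {
		defer close(out)
		for v := range in {
			out <- f(v)
		}
	}()
	return out
}

// Collect drains in into a slice.
func Collect[T any](in <-chan T) []T {
	var all []T
	for v := range in {
		all = append(all, v)
	}
	return all
}

func main() {
	squares := Map(Source(1, 2, 3), func(n int) int { return n * n })
	labels := Map(squares, func(n int) string { return "#" + strconv.Itoa(n) })
	fmt.Println(Collect(labels)) // [#1 #4 #9]
}
```

Each stage runs in its own goroutine, so a slow stage can work on one value while its neighbors handle the next and the previous ones.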

Conclusion

As of this writing, I plan each of the above topics as a separate post – but that could change. I’ve taken several mentees through the above course of exercises successfully. Once it’s written, I hope that anyone who takes the time to grok the material above will be well-equipped to write their own concurrent Go programs while avoiding major footguns found along the way.
