Timer Routines And Graceful Shutdowns In Go

Sep 4, 2013


In my Outcast data server I have several data retrieval jobs that run using different go routines. Each routine wakes up on a set interval. The most complex job is the downloading of radar images. What makes this complex is that there are 155 radar stations throughout the United States that take a new picture every 120 seconds. All these radar images can be put together to create a mosaic. When the go routine wakes up to pull down the new images, it must do this as quickly as possible for all 155 stations. If it doesn’t, the mosaics will be out of sync and any overlays across station boundaries will look off.


The radar image on the left is for Tampa Bay at 4:51 PM EST. You can see the coverage of that radar station crosses over a large area of the state of Florida. This radar image actually cuts into several other radar stations including Miami.

The radar image on the right is for Miami at 4:53 PM EST. There is a two minute difference, or what I call glare, between these two radar images. When we overlay these two images on a map you would not notice any difference. However, if the glare between these images gets any greater than a couple of minutes, it can become obvious to the naked eye.

The blue colors are radar noise that gets filtered out, so we are left with greens, reds and yellows that represent real weather. These images were downloaded and cleaned at 4:46 PM EST. You can see they are pretty close and would overlay well.

The first implementation of the code used a single go routine on a 10 minute interval. When the go routine woke up it would take 3 to 4 minutes to download, process, store and write a record to mongo for all 155 stations. Even though I would process each region as close together as possible, the glare between the images was too great. The radar stations already contain a glare of one to two minutes so adding another one to two minutes more presented a problem.

I always try to use a single routine if I can for any work that needs to be performed, just to keep things simple. In this case one routine didn’t work. I needed to process multiple stations at the same time and reduce the amount of glare between the images. After adding a work pool to process multiple stations at once, I was able to process all 155 stations in under a minute. So far I have received no complaints from the client team.

In this post we are going to concentrate on the timer routine and shutdown code. In the next post I will show you how to add a work pool to the solution.

I have attempted to provide a complete working code sample. It should work as a good template for your own implementations. To download and run the code, open a terminal session and issue the following commands:

cd $HOME
export GOPATH=$HOME/example
go get github.com/goinggo/timerdesignpattern
cd example/bin
./timerdesignpattern

The Outcast data server is a single application that is started and hopefully runs for a long period of time. Occasionally these types of applications do have to be shut down. It is important that you can always shut down your application gracefully on demand. When I am developing these types of applications, I always make sure, right from the beginning, that I can signal the application to terminate and that it does so without hanging. There is nothing worse than an application that you need to kill by force.

The sample program creates a single go routine and tells the routine to wake up every 15 seconds. When the routine wakes up, it performs 10 seconds of work. When the work is over, it calculates the amount of time it needs to sleep so it can wake up on that 15 second cycle again.

Let’s run the application and shut it down while it is running. Then we can learn how it all works. We can shut down the program by hitting the enter key at any time.

Here is the program running and being shut down 7 seconds later:

2013-09-04T18:58:45.505 : main : main : Starting Program
2013-09-04T18:58:45.505 : main : workmanager.Startup : Started
2013-09-04T18:58:45.505 : main : workmanager.Startup : Completed
2013-09-04T18:58:45.505 : WorkTimer : _WorkManager.GoRoutine_WorkTimer : Started
2013-09-04T18:58:45.505 : WorkTimer : _WorkManager.GoRoutine_WorkTimer : Info : Wait To Run : Seconds[15]

2013-09-04T18:58:52.666 : main : workmanager.Shutdown : Started
2013-09-04T18:58:52.666 : main : workmanager.Shutdown : Info : Shutting Down
2013-09-04T18:58:52.666 : main : workmanager.Shutdown : Info : Shutting Down Work Timer
2013-09-04T18:58:52.666 : WorkTimer : _WorkManager.GoRoutine_WorkTimer : Shutting Down
2013-09-04T18:58:52.666 : main : workmanager.Shutdown : Completed
2013-09-04T18:58:52.666 : main : main : Program Complete

This is a great first test. As soon as we tell the program to shut down, it does so gracefully. Next let’s have the program start its work and try to shut it down:

2013-09-04T19:14:21.312 : main : main : Starting Program
2013-09-04T19:14:21.312 : main : workmanager.Startup : Started
2013-09-04T19:14:21.312 : main : workmanager.Startup : Completed
2013-09-04T19:14:21.312 : WorkTimer : _WorkManager.GoRoutine_WorkTimer : Started
2013-09-04T19:14:21.313 : WorkTimer : _WorkManager.GoRoutine_WorkTimer : Info : Wait To Run : Seconds[15]
2013-09-04T19:14:36.313 : WorkTimer : _WorkManager.GoRoutine_WorkTimer : Woke Up
2013-09-04T19:14:36.313 : WorkTimer : _WorkManager.GoRoutine_WorkTimer : Started
2013-09-04T19:14:36.313 : WorkTimer : _WorkManager.GoRoutine_WorkTimer : Processing Images For Station : 0
2013-09-04T19:14:36.564 : WorkTimer : _WorkManager.GoRoutine_WorkTimer : Processing Images For Station : 1
2013-09-04T19:14:36.815 : WorkTimer : _WorkManager.GoRoutine_WorkTimer : Processing Images For Station : 2
2013-09-04T19:14:37.065 : WorkTimer : _WorkManager.GoRoutine_WorkTimer : Processing Images For Station : 3

2013-09-04T19:14:37.129 : main : workmanager.Shutdown : Started
2013-09-04T19:14:37.129 : main : workmanager.Shutdown : Info : Shutting Down
2013-09-04T19:14:37.129 : main : workmanager.Shutdown : Info : Shutting Down Work Timer
2013-09-04T19:14:37.315 : WorkTimer : _WorkManager.GoRoutine_WorkTimer : Info : Request To Shutdown
2013-09-04T19:14:37.315 : WorkTimer : _WorkManager.GoRoutine_WorkTimer : Info : Wait To Run : Seconds[14]
2013-09-04T19:14:37.315 : WorkTimer : _WorkManager.GoRoutine_WorkTimer : Shutting Down
2013-09-04T19:14:37.316 : main : workmanager.Shutdown : Completed
2013-09-04T19:14:37.316 : main : main : Program Complete

This time I waited the 15 seconds and let the work begin. After it finished processing the fourth image, I told the program to shut down. It did so immediately and gracefully.

Let’s look at the core piece of the code that implements the timer routine and the graceful shutdown:

func (wm *WorkManager) WorkTimer() {
    wait := TimerPeriod

    for {
        select {
        case <-wm.ShutdownChannel:
            wm.ShutdownChannel <- "Down"
            return

        case <-time.After(wait):
            break
        }

        startTime := time.Now()
        wm.PerformTheWork()
        endTime := time.Now()

        duration := endTime.Sub(startTime)
        wait = TimerPeriod - duration
    }
}

I have removed all the comments and logging to make it easier to read. This is classic channels at work, and the solution is really elegant, especially compared to how something like this needs to be implemented in C#.

The WorkTimer function runs as a Go Routine and is started with the keyword go:

func Startup() {
    wm = WorkManager{
        Shutdown: false,
        ShutdownChannel: make(chan string),
    }

    go wm.WorkTimer()
}

The WorkManager is created as a singleton and then the timer routine is started. There is a single channel for shutting down the timer routine and a flag to denote when the system is shutting down.

The timer routine runs inside an endless for loop so it does not terminate until we ask it to. Let’s look at the channel related code inside the for loop:

select {
case <-wm.ShutdownChannel:
    wm.ShutdownChannel <- "Down"
    return

case <-time.After(wait):
    break
}

wm.PerformTheWork()

We are using a special keyword called select. Here is Go documentation on the keyword select:

http://golang.org/ref/spec#Select_statements

We are using the select statement to keep the timer routine asleep until it is time to perform work or time to shut down. The select puts the timer routine to sleep until one of the channels is signaled. Only one case will execute at a time, making the code synchronous. This really helps keep things simple and allows us to run atomic, "routine safe", operations across the multiple channels cased inside the select.

The select in the timer routine contains two channels, one for shutting down the routine and one for performing the work. Shutting down the routine is performed by the following code:

func Shutdown() {
    wm.Shutdown = true

    wm.ShutdownChannel <- "Down"
    <-wm.ShutdownChannel

    close(wm.ShutdownChannel)
}

When it is time to shut down, we set the Shutdown flag to true and then signal the ShutdownChannel by passing the string "Down" through the channel. Then we wait for a response back from the timer routine. This communication of data synchronizes the entire shutdown process between the main routine and the timer routine. Really nice, simple and elegant.
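The request and response handshake can be tried out on its own. The sketch below assumes an unbuffered channel, as in the Startup code, and uses the hypothetical names worker and stop in place of the timer routine and Shutdown.

```go
package main

import "fmt"

// worker stands in for the timer routine. It waits for a shutdown request
// and then acknowledges it on the same unbuffered channel.
func worker(shutdown chan string) {
    <-shutdown         // block until a shutdown request arrives
    shutdown <- "Down" // tell the caller we are done
}

// stop sends the shutdown request, waits for the acknowledgement, closes
// the channel, and returns the reply it received.
func stop(shutdown chan string) string {
    shutdown <- "Down"
    reply := <-shutdown
    close(shutdown)
    return reply
}

func main() {
    shutdown := make(chan string)
    go worker(shutdown)

    fmt.Println("worker replied:", stop(shutdown)) // prints worker replied: Down
}
```

Because the channel is unbuffered, each send blocks until the other side receives, so stop cannot return before the worker has acknowledged the request.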

To wake up on an interval using the select statement, I use a function called time.After. It returns a channel right away, and once the specified duration has elapsed, the current time is sent on that channel. This wakes up the select, allowing the PerformTheWork function to be executed. Once the PerformTheWork function returns, the timer routine is put back to sleep by the select statement again, unless another channel is in the signaled state.

Let’s look at the PerformTheWork function:

func (wm *WorkManager) PerformTheWork() {
    for count := 0; count < 40; count++ {
        if wm.Shutdown {
            return
        }

        fmt.Println("Processing Images For Station:", count)
        time.Sleep(time.Millisecond * 250)
    }
}

The function prints a message to the console window 40 times, sleeping for 250 milliseconds after each one, so it takes 10 seconds to complete. Within the loop the code checks the Shutdown flag. It is really important for this function to terminate quickly if the program is shutting down. We don’t want the admin who is shutting the program down to think the program has hung.

Once the function terminates, the timer routine can execute the select statement again. If the program is shutting down, the select will immediately wake up again to process the signaled Shutdown channel. From there the timer routine signals back to the main routine that it is shutting down and the program terminates gracefully.

This is my timer and graceful shutdown code pattern that you can also use in your applications. If you download the full example from the GoingGo repository, you can see the code in action and a few more goodies.

Read this post to learn how to implement a work pool to process work across multiple go routines, like in the radar image processing I described above.

http://www.goinggo.net/2013/09/pool-go-routines-to-process-task.html