Running MongoDB Queries Concurrently With Go

Feb 25, 2014


If you are attending GopherCon 2014 or plan to watch the videos once they are released, this article will prepare you for the talk by Gustavo Niemeyer and Steve Francia. It provides a beginners view for using the Go mgo driver against a MongoDB database. 

Introduction
MongoDB supports many different programming languages thanks to a great set of drivers. One such driver is the MongoDB Go driver which is called mgo. This driver has been externally developed by Gustavo Niemeyer from Canonical, and eventually Steve Francia, the head of the drivers team at MongoDB Inc, took notice and offered support. Both Gustavo and Steve will be talking at GopherCon 2014 in April about "Painless Data Storage With MongoDB and Go". The talk centers around the mgo driver and how MongoDB and Go really work well together to build highly scalable and concurrent software.

MongoDB and Go let us build scalable software on many different operating systems and architectures, without the need to install any frameworks or runtime environments. Go programs are native binaries and the Go tooling is constantly improving to create binaries that run as fast as equivalent C programs. That wouldn’t mean anything if writing code in Go was complicated and as tedious as writing programs in C. This is where Go really shines because once you get up to speed, writing programs in Go is fast and fun.

In this post I am going to show you how to write a Go program using the mgo driver to connect and run queries concurrently against a MongoDB database. I will break down the sample code and explain a few things that seem to be always be a bit confusing to those new to MongoDB and Go.

Sample Program
The sample program connects to a public MongoDB database I have hosted with MongoLab. If you have Go and Bazaar installed on your machine, you can run the program. The program launches ten goroutines that individually query all the records from the buoy_stations collection inside the goinggo database. The records are unmarshaled into native Go types and each goroutine logs the number of documents returned:

Now that you have seen the entire program, we can break it down. Let’s start with the type structures that are defined in the beginning:

The structures represent the data that we are going to retrieve and unmarshal from our query. BuoyStation represents the main document and BuoyCondition and BuoyLocation are embedded documents. The mgo driver makes it easy to use native types that represent the documents stored in our collections by using tags. With the tags, we can control how the mgo driver unmarshals the returned documents into our native Go structures.

Now let’s look at how we connect to a MongoDB database using mgo:

We start with creating a mgo.DialInfo object. We can connect to a single MongoDB database or a replica set. Connecting to a replica set can be accomplished by providing multiple addresses in the Addrs field or with a single address. If we are using a single host address to connect to a replice set, the mgo driver will learn about any remaining hosts from the replica set member we connect to. In our case we are connecting to a single host.

After providing the host, we specify the database, username and password we need for authentication. One thing to note is that the database we authenticate against may not necessarily be the database our application needs to access. Some applications authenticate against the admin database and then use other databases depending on their configuration. The mgo driver supports these types of configurations very well.

Next we use the mgo.DialWithInfo method to create a mgo.Session object. The mgo.Session object maintains a pool of connections to the MongoDB host. We can create multiple sessions with different modes and settings to support different aspects of our applications. We can specify if the session is to use a Strong or Monotonic mode, and we can set the safe level as well as other settings.

The next line of code sets the mode for the session. There are three modes that can be set, Strong, Monotonic and Eventual. Each mode sets a specific consistency for how reads and writes are performed. For more information on the differences between each mode, check out the documentation for the mgo driver.

We are using Monotonic mode which provides reads that may not entirely be up to date, but the reads will always see the history of changes moving forward. In this mode reads occur against secondary members of our replica sets until a write happens. Once a write happens, the primary member is used. The benefit is some distribution of the reading load can take place against the secondaries when possible.


With the session set and ready to go, next we execute multiple queries concurrently:

This code is classic Go concurrency in action. First we create a sync.WaitGroup object so we can keep track of all the goroutines we are going to launch as they complete their work. Then we immediately set the count of the sync.WaitGroup object to ten and use a for loop to launch ten goroutines using the RunQuery function. The keyword go is used to launch a function or method to run concurrently. The final line of code calls the Wait method on the sync.WaitGroup object which locks the main goroutine until everything is done processing.

To learn more about Go concurrency and better understand how this particular piece of code works, check out these posts on concurrency and channels.

Now let’s look at the RunQuery function and see how to properly use the mgo.Session object to acquire a connection and execute a query:

The very first thing we do inside of the RunQuery function is to defer the execution of the Done method on the sync.WaitGroup object. The defer keyword will postpone the execution of the Done method, to take place once the RunQuery function returns. This will guarantee that the sync.WaitGroup objects count will decrement even if an unhandled exception occurs.

Next we make a copy of the session we created in the main goroutine. Each goroutine needs to create a copy of the session so they each obtain their own socket without serializing their calls with the other goroutines. Again, we use the defer keyword to postpone and guarantee the execution of the Close method on the session once the RunQuery function returns. Closing the session returns the socket back to the main pool, so this is very important.

To execute a query we need a mgo.Collection object. We can get a mgo.Collection object through the mgo.Session object by specifying the name of the database and then the collection. Using the mgo.Collection object, we can perform a Find and retrieve all the documents from the collection. The All function will unmarshal the response into our slice of BuoyStation objects. A slice is a dynamic array in Go. Be aware that the All method will load all the data in memory at once. For large collections it is better to use the Iter method instead. Finally, we just log the number of BuoyStation objects that are returned.

Conclusion
The example shows how to use Go concurrency to launch multiple goroutines that can execute queries against a MongoDB database independently. Once a session is established, the mgo driver exposes all of the MongoDB functionality and handles the unmarshaling of BSON documents into Go native types.

MongoDB can handle a large number of concurrent requests when you architect your MongoDB databases and collections with concurrency in mind. Go and the mgo driver are perfectly aligned to push MongoDB to its limits and build software that can take advantage of all the computing power that is available.

The mgo driver can help you distribute your queries across a MongoDB replica set. The mgo driver gives you the ability to create and configure your sessions and take advantage of MongoDB’s mode and configuration options. The mode you use for your session, how and where the cluster and load balancer is setup, and the type of work being processed by MongoDB at the time of those queries, plays an important role in the actual distribution.

The mgo driver provides a safe way to leverage Go’s concurrency support and you have the flexibility to execute queries concurrently and in parallel. It is best to take the time to learn a bit about MongoDB replica sets and load balancer configuration. Then make sure the load balancer is behaving as expected under the different types of load your application can produce.

Now is a great time to see what MongoDB and Go can do for your software applications, web services and service platforms. Both technologies are being battle tested everyday by all types of companies, solving all types of business and computing problems.

<!– package main

import ( "labix.org/v2/mgo" "labix.org/v2/mgo/bson" "log" "sync" "time" )

const ( MongoDBHosts = "ds035428.mongolab.com:35428" AuthDatabase = "goinggo" AuthUserName = "guest" AuthPassword = "welcome" TestDatabase = "goinggo" )

type ( // BuoyCondition contains information for an individual station BuoyCondition struct { WindSpeed float64 bson:&quot;wind_speed_milehour&quot; WindDirection int bson:&quot;wind_direction_degnorth&quot; WindGust float64 bson:&quot;gust_wind_speed_milehour&quot; }

// BuoyLocation contains the buoy’s location BuoyLocation struct { Type string bson:&quot;type&quot; Coordinates []float64 bson:&quot;coordinates&quot; }

// BuoyStation contains information for an individual station BuoyStation struct { ID bson.ObjectId bson:&quot;_id,omitempty&quot; StationId string bson:&quot;station_id&quot; Name string bson:&quot;name&quot; LocDesc string bson:&quot;location_desc&quot; Condition BuoyCondition bson:&quot;condition&quot; Location BuoyLocation bson:&quot;location&quot; } )

func main() { // We need this object to establish a session to our MongoDB mongoDBDialInfo := &mgo.DialInfo{ Addrs: []string{MongoDBHosts}, Timeout: 60 * time.Second, Database: AuthDatabase, Username: AuthUserName, Password: AuthPassword, }

// Create a session which maintains a pool of socket connections // to our MongoDB mongoSession, err := mgo.DialWithInfo(mongoDBDialInfo) if err != nil { log.Fatalf("CreateSession: %s\n", err) }

// Reads may not be entirely up-to-date, but they will always see the // history of changes moving forward, the data read will be consistent // across sequential queries in the same session, and modifications made // within the session will be observed in following queries (read-your-writes). // http://godoc.org/labix.org/v2/mgo#Session.SetMode mongoSession.SetMode(mgo.Monotonic, true)

// Create a wait group to manage the goroutines var waitGroup sync.WaitGroup

// Perform 10 concurrent queries against the database waitGroup.Add(10) for query := 0; query < 10; query++ { go RunQuery(query, &waitGroup, mongoSession) }

// Wait for all the queries to complete waitGroup.Wait() log.Println("All Queries Completed") }

func RunQuery(query int, waitGroup *sync.WaitGroup, mongoSession *mgo.Session) { // Decrement the wait group count so the program knows this // has been completed once the goroutine exits defer waitGroup.Done()

// Request a socket connection from the session to process our query. // Close the session when the goroutine exits and put the connection back // into the pool sessionCopy := mongoSession.Copy() defer sessionCopy.Close()

// Get a collection to execute the query against collection := sessionCopy.DB(TestDatabase).C("buoy_stations")

log.Printf("RunQuery : %d : Executing\n", query)

// Retrieve the list of stations var buoyStations []BuoyStation err := collection.Find(nil).All(&buoyStations) if err != nil { log.Printf("RunQuery : ERROR : %s\n", err) return }

log.Printf("RunQuery : %d : Count[%d]\n", query, len(buoyStations)) } –>


Go Training

We have taught Go to thousands of developers all around the world since 2014. There is no other company that has been doing it longer and our material has proven to help jump start developers 6 to 12 months ahead of their knowledge of Go. We know what knowledge developers need in order to be productive and efficient when writing software in Go.

Our Go, Web and Data Science classes are perfect for both experienced and beginning engineers. We start every class from the beginning and get very detailed about the internals, mechanics, specification, guidelines, best practices and design philosophies. We cover a lot about "if performance matters" with a focus on mechanical sympathy, data oriented design, decoupling and writing production software.

Learn More

To learn about Corporate training events, options and special pricing please contact:

William Kennedy
ArdanLabs (www.ardanlabs.com)
bill@ardanlabs.com