Monday, September 23, 2013

Iterating Over Slices In Go

Slices are used everywhere in my code. If I am working with data from MongoDB, it is stored in a slice. If I need to keep track of a collection of problems after running an operation, it is stored in a slice. If you don't understand how slices work yet or have been avoiding them like I did when I started, read these two posts to learn more.

http://www.goinggo.net/2013/08/understanding-slices-in-go-programming.html
http://www.goinggo.net/2013/08/collections-of-unknown-length-in-go.html

A question that I am constantly asking myself when coding is, “Do I want to use a pointer to this object or do I want to make a copy?” Though Go can be used as a functional programming language, it is an imperative programming language at heart. What's the difference?

A functional programming language does not allow you to change the state of a variable or an object once it has been created and initialized. This means variables and objects are immutable, they can't be changed. If you want to change the state of a variable or an object, you must make a copy and initialize the copy with the changes. Functions are always passed copies and return values are always copies too.

In an imperative programming language we can create variables and objects that are mutable, or can be changed. We can pass a pointer for any variable or object to a function, which in turn can change the state as necessary. A functional programming language wants you to think in terms of mathematical functions that take input and produce a result. In an imperative programming language we can build similar functions, but we can also build functions that perform operations on state that can exist anywhere in memory.

Being able to use a pointer has advantages but can also get you in trouble. Using pointers can alleviate memory constraints and possibly improve performance. It can also create synchronization issues such as shared access to objects and resources. Find the solution that works best for each individual use case. For your Go programs I recommend using pointers when it is safe and practical. Go is an imperative programming language so take advantage of that.

In Go everything is pass by value and it is really important to remember that. We can pass by value the address of an object or pass by value a copy of an object. When we use a pointer in Go it can sometime be confusing because Go handles all the dereferencing for us. Don’t get me wrong, its great that Go does this, but sometime you can forget what the value of your variable actually is.

At some point in every program, I have the need to iterate over a slice to perform some work. In Go we use the keyword range within a for loop construct to iterate over a slice. In the beginning I made some very bad mistakes iterating over slices because I misunderstood how the range keyword worked. I will show you a nasty bug I created iterating over a slice that puzzled me for a bit. Now it is obvious to me why the code misbehaved, but at the time I was shaking my head.

Let's create some simple objects and place them inside of a slice. Then we will iterate over the slice and see what happens.

package main

import (
    "fmt"
)

type Dog struct {
    Name string
    Age int
}

func main() {
    jackie := Dog{
        Name: "Jackie",
        Age: 19,
    }

    fmt.Printf("Jackie Addr: %p\n", &jackie)

    sammy := Dog{
        Name: "Sammy",
        Age: 10,
    }

    fmt.Printf("Sammy Addr: %p\n", &sammy)

    dogs := []Dog{jackie, sammy}

    fmt.Println("")

    for _, dog := range dogs {
        fmt.Printf("Name: %s Age: %d\n", dog.Name, dog.Age)
        fmt.Printf("Addr: %p\n", &dog)

        fmt.Println("")
    }
}

The program creates two dog objects and puts them into a slice of dogs. We display the address of each dog object. Then we iterate over the slice displaying the name, age and address of each Dog. Here is the output for the program:

Jackie Addr: 0x2101bc000
Sammy Addr: 0x2101bc040

Name: Jackie Age: 19
Addr: 0x2101bc060

Name: Sammy Age: 10
Addr: 0x2101bc060

So why is the address of the dog object different inside the range loop and why does the same address appear twice? This all has to do with the fact that everything is pass by value in Go. In this code example we actually create 2 extra copies of each Dog object in memory.


The initial existence of each Dog object is created with a composite literal:

jackie := Dog{
    Name: "Jackie",
    Age: 19,
}

The first copies of the objects are created when the objects are placed into the slice: 

dogs := []Dog{jackie, sammy}

The second copies of the objects are created when we iterate over the slice:

dog := range dogs

Now we can see why the address of the dog variable inside range loop is always the same. We are displaying the address of the dog variable, which happens to be a local variable of type Dog and contains a copy of the Dog object for each index of the slice. With each iteration of the slice, the location of the dog variable is the same. The value of the dog variable is changing.

That nasty bug I was talking about earlier had to do with me thinking the address of the dog variable could be used as a pointer to each individual Dog object inside the slice. Something like this:

allDogs := []*Dog{}

for _, dog := range dogs {
    allDogs = append(allDogs, &dog)
}

for _, dog := range allDogs {
    fmt.Printf("Name: %s Age: %d\n", dog.Name, dog.Age)
}

I create a new slice that can hold pointers to Dog objects. Then I range over the slice of dogs storing the address of each Dog object into the new slice. Or at least I think I am storing the address of each Dog object.

If I add this code to the program and run it, this is the output:

Name: Sammy Age: 10
Name: Sammy Age: 10

I end up with a slice where every element has the same address. This address is pointing to a copy of the last object that we iterated over. Yikes!!

If making all these copies is not what you want, you could use pointers. Here is the example program using pointers:

package main

import (
    "fmt"
)

type Dog struct {
    Name string
    Age int
}

func main() {
    jackie := &Dog{
        Name: "Jackie",
        Age: 19,
    }

    fmt.Printf("Jackie Addr: %p\n", jackie)

    sammy := &Dog{
        Name: "Sammy",
        Age: 10,
    }

    fmt.Printf("Sammy Addr: %p\n\n", sammy)

    dogs := []*Dog{jackie, sammy}

    for _, dog := range dogs {
        fmt.Printf("Name: %s Age: %d\n", dog.Name, dog.Age)
        fmt.Printf("Addr: %p\n\n", dog)
    }
}

Here is the output:

Jackie Addr: 0x2101bb000
Sammy Addr: 0x2101bb040

Name: Jackie Age: 19
Addr: 0x2101bb000

Name: Sammy Age: 10
Addr: 0x2101bb040

This time we create a slice of pointers to Dog objects. When we iterate over this slice, the value of the dog variable is the address of each Dog object we stored in the slice. Instead of creating two extra copies of each Dog object, we are using the same initial Dog object we created with the composite literal.

When the slice is a collection of Dog objects or a collection of pointers to Dog objects, the range loop is the same.

for _, dog := range dogs {
    fmt.Printf("Name: %s Age: %d\n", dog.Name, dog.Age)
}

Go handles access to the Dog object regardless of whether we are using a pointer or not. This is awesome but can sometimes lead to a bit of confusion. At least it was for me in the beginning.

I can't tell you when you should use a pointer or when you should use a copy. Just remember that Go is going to pass everything by value. That includes function parameters, return values and when iterating over a slice, map or channel.

Yes, you can also range over a channel. Take a look at this sample code I altered from a blog post written by Ewen Cheslack-Postava:

http://ewencp.org/blog/golang-iterators/

package main

import (
    "fmt"
)

type Dog struct {
    Name string
    Age int
}

type DogCollection struct {
    Data []*Dog
}

func (this *DogCollection) Init() {
    cloey := &Dog{"Cloey", 1}
    ralph := &Dog{"Ralph", 5}
    jackie := &Dog{"Jackie", 10}
    bella := &Dog{"Bella", 2}
    jamie := &Dog{"Jamie", 6}

    this.Data = []*Dog{cloey, ralph, jackie, bella, jamie}
}

func (this *DogCollection) CollectionChannel() chan *Dog {
    dataChannel := make(chan *Dog, len(this.Data))

    for _, dog := range this.Data {
        dataChannel <- dog
    }

    close(dataChannel)

    return dataChannel
}

func main() {
    dc := DogCollection{}
    dc.Init()

    for dog := range dc.CollectionChannel() {
        fmt.Printf("Channel Name: %s\n", dog.Name)
    }
}

If you run the program you will get the following output:

Channel Name: Cloey
Channel Name: Ralph
Channel Name: Jackie
Channel Name: Bella
Channel Name: Jamie

I really love this sample code because it shows the beauty of a closed channel. The key to making this program work is the fact that a closed channel is always in the signaled state. That means any read on the channel will return immediately. If the channel is empty a default value is returned. This is what allows the range to iterate over all the data that was passed into the channel and complete when the channel is empty. Once the channel is empty, the next read on the channel will return nil. This causes the loop to terminate.

Slices are great, lightweight and powerful. You should be using them and gaining the benefits they provide. Just remember that when you are iterating over a slice, you are getting a copy of each element of the slice. If that happens to be an object, you are getting a copy of that object. Don’t ever use the address of the local variable in the range loop. That is a local variable that contains a copy of the slice element and only has local context. Don’t make the same mistake I made.

9 comments:

  1. Thanks - pretty sure I would have fallen into this eventually. Just a matter of time.

    ReplyDelete
  2. another great article william! i pondered making extensive use of pointers as a general practice in Go but i think in the end the best advice came from dmitry vyukov, who offered the straightforward advice to not use pointers unless you have to...namely, when you want to mutate the referenced data.

    ReplyDelete
  3. Coming from C-land... my first reaction was "that's not safe!"; if dc fell out of scope after the collection was put together your pointers may become invalid. But I tried to test this by clobbering dc.Data...

    func main() {
    dc := DogCollection{}
    dc.Init()
    dcc := dc.CollectionChannel()
    dc.Data = []*Dog{&Dog{"bill", 99}}
    for dog := range dcc {
    fmt.Printf("Channel Name: %s\n", dog.Name)
    }
    }

    ... to no avail. Does it usually work this way? Could this result from refcounted pointers or a lazy garbage collector?

    ReplyDelete
    Replies
    1. Hi John, I have not studied the garbage collector in any detail. My very newbie understanding is that Go identifies when objects need to be managed by the garbage collector. So you can't look at this code like you would a C program. I don't fully understand you sample. Put it in the Go Playground and we can get some answers.

      Delete
  4. Thank you so much for this article! I think I finally understand why my latest Go program runs almost as slow as the Ruby version, and five times slower then the Elixir version. You also made me think that maybe I really don't want to write Go code. For a language not optimized for pass-by-value, like functional languages are, it sure seems strange that pass by value is the default behavior.

    ReplyDelete
    Replies
    1. It is not my goal to make people leave the language so that makes me sad :( Why not put some of your code up on the playground or in github so we can see it. Maybe there is something you are doing that can be improved on.

      Delete
  5. I think there is a bit of a problem with this simple chan use approach. It looks nice. But if loop that iterates over performs break and leaves iteration loop then your coroutine with channel leaks (does not close and keeps open channel that ways for data to be read on the other end).

    for dog := range dc.CollectionChannel() {

    fmt.Printf("Channel Name: %s\n", dog.Name)
    if _something_ {
    break // <-- let say this triggers sometimes.
    }
    }

    The approach and solution is a bit more complex if you want to be able to break out of iteration loops.

    ReplyDelete
    Replies
    1. Now since I had a second look maybe it is ok. Since you close the channel when you exit CollectionChannel() func.

      Delete
  6. Good article! I wondered whether Go copied the entire slice BEFORE iterating or if Go copied each item of the slice AS it iterated. It turns out it was the former.

    http://play.golang.org/p/b39rSgAMWn

    ReplyDelete