Using XSLT With Go

Nov 3, 2013


I am working on a project that requires pulling and processing different XML feeds from the web and storing the data in MongoDB as JSON. Since new feeds come up every day, changing the Go program each time to process and publish a new feed is out of the question. A second constraint is that processing has to work in Iron.io or any other Linux cloud-based environment.

What I needed was a Go program that could take an XML document and an XSLT stylesheet at runtime, transform the XML into JSON and then store the JSON in MongoDB. The JSON documents need specific field names and have to meet a few other requirements. XSLT makes this really easy to support.

At first I looked at the different C libraries that exist. I figured I could integrate a library using cgo, but after a few hours I realized this was not going to work. The libraries I found were huge and complex. Then by chance I found a reference to a program called xsltproc. The program exists for both the Mac and Linux operating systems. In fact, it comes pre-installed on the Mac, and an apt-get will get you a copy of the program on your Linux operating system.

I have built a sample program that shows how to use xsltproc in your Go programs. Before we download the sample code we need to make sure you have xsltproc installed.

If you are running on a Mac, xsltproc should already exist under /usr/bin:

which xsltproc

/usr/bin/xsltproc

On your Linux operating system, just run apt-get if you don’t already have xsltproc installed:

sudo apt-get install xsltproc

The xsltproc program will be installed in the same place under /usr/bin. To make sure everything is good, run the xsltproc program requesting the version:

xsltproc --version

xsltproc was compiled against libxml 20708, libxslt 10126 and libexslt 815
libxslt 10126 was compiled against libxml 20708
libexslt 815 was compiled against libxml 20708
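
The same check can be made from inside a Go program before it tries to run a transformation. The following is a minimal sketch, not part of the sample project: findTool is a hypothetical helper name, and it relies only on exec.LookPath from the standard library, which searches the directories listed in PATH.

```go
package main

import (
	"fmt"
	"os/exec"
)

// findTool (hypothetical helper) returns the full path of an executable
// found through the PATH environment variable, or an error when the
// program is not installed.
func findTool(name string) (string, error) {
	path, err := exec.LookPath(name)
	if err != nil {
		return "", fmt.Errorf("%s not found in PATH: %w", name, err)
	}
	return path, nil
}

func main() {
	path, err := findTool("xsltproc")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("xsltproc found at", path)
}
```

Failing early with a clear message is usually easier to diagnose in a cloud container than a failed transformation later on.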

To download and try the sample program, open a terminal session and run the following commands:

export GOPATH=$HOME/example

go get github.com/goinggo/xslt
cd $GOPATH/src/github.com/goinggo/xslt
go build

If you want to install the code under your normal GOPATH, start with the ‘go get’ line. Here are the files that should exist after the build:

main.go            – Source code for test program
deals.xml          – Sample XML document from Yipit
stylesheet.xslt    – Stylesheet to transform the Yipit XML feed to JSON
xslt               – Test program

Let’s look at a portion of the XML document the sample program will transform:

<deals>
  <list-item>
    <yipit_url>http://yipit.com/business/rondeaus-kickboxing/</yipit_url>
    <end_date>2014-01-27 16:00:03</end_date>
    <title>Let a Former Pro Teach You a Few Kicks of the Trade Month…</title>
    <tags>
        <list-item>
            <url />
            <name>Fitness Classes</name>
            <slug>fitness-classes</slug>
        </list-item>
    </tags>
    …
  </list-item>
</deals>

The XML can be found in the deals.xml file. It is an extensive XML document and too large to show in its entirety.

Let’s look at a portion of the XSLT stylesheet:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:str="http://exslt.org/strings"
    version="1.0"
    extension-element-prefixes="str">
    <xsl:output method="text" />
    <xsl:template name="cleanText">
        <xsl:param name="pText" />
        <xsl:variable name="cleaned1" select="str:replace($pText, '&quot;', '')" />
        <xsl:variable name="cleaned2" select="str:replace($cleaned1, '\', '')" />
        <xsl:variable name="cleaned3" select="str:replace($cleaned2, '&#xA;', '')" />
        <xsl:value-of select="$cleaned3" />
    </xsl:template>
    …
    <xsl:template match="/">{"deals": [
    <xsl:for-each select="root/response/deals/list-item">{
        "dealid": <xsl:value-of select="id" />,
        "feed": "Yipit",
        "date_added": "<xsl:value-of select="date_added" />",
        "end_date": "<xsl:value-of select="end_date" />",
        …
        "categories": [<xsl:for-each select="tags/list-item">"<xsl:value-of select="slug"/>"<xsl:choose><xsl:when test="position() != last()">,</xsl:when></xsl:choose></xsl:for-each>],
        …
    }<xsl:choose><xsl:when test="position() != last()">,
    </xsl:when></xsl:choose>
</xsl:for-each>
]}
    </xsl:template>
</xsl:stylesheet>

This XSLT can be found in the stylesheet.xslt file. It is an extensive XSLT stylesheet with templates to help clean up the XML data. Something really great about xsltproc is that it already contains a bunch of great extensions:

./xsltproc_darwin -dumpextensions

Registered XSLT Extensions
--------------------------
Registered Extension Functions:
{http://exslt.org/math}lowest
{http://exslt.org/math}power
{http://exslt.org/strings}concat
{http://exslt.org/dates-and-times}date
{http://exslt.org/dates-and-times}day-name
{http://exslt.org/common}object-type
{http://exslt.org/math}atan
{http://exslt.org/strings}encode-uri
{http://exslt.org/strings}decode-uri
{http://exslt.org/dates-and-times}add-duration
{http://exslt.org/dates-and-times}difference
{http://exslt.org/dates-and-times}leap-year
{http://exslt.org/dates-and-times}month-abbreviation
{http://exslt.org/dynamic}map
{http://exslt.org/math}tan
{http://exslt.org/math}exp
{http://exslt.org/dates-and-times}date-time
{http://exslt.org/dates-and-times}day-in-week
{http://exslt.org/dates-and-times}second-in-minute
{http://exslt.org/dates-and-times}year
{http://icl.com/saxon}evaluate
{http://exslt.org/math}log
{http://exslt.org/dates-and-times}add
{http://exslt.org/dates-and-times}day-abbreviation
{http://icl.com/saxon}line-number
{http://exslt.org/math}constant
{http://exslt.org/sets}difference
{http://exslt.org/dates-and-times}duration
{http://exslt.org/dates-and-times}minute-in-hour
{http://icl.com/saxon}eval
{http://exslt.org/math}min
{http://exslt.org/math}max
{http://exslt.org/math}highest
{http://exslt.org/math}random
{http://exslt.org/math}sqrt
{http://exslt.org/math}cos
{http://exslt.org/sets}has-same-node
{http://exslt.org/strings}tokenize
{http://exslt.org/dates-and-times}seconds
{http://exslt.org/dates-and-times}time
{http://exslt.org/dynamic}evaluate
{http://exslt.org/common}node-set
{http://exslt.org/dates-and-times}month-name
{http://exslt.org/dates-and-times}week-in-year
{http://exslt.org/math}acos
{http://exslt.org/sets}intersection
{http://exslt.org/sets}leading
{http://exslt.org/sets}trailing
{http://exslt.org/strings}replace
{http://exslt.org/dates-and-times}day-in-year
{http://icl.com/saxon}expression
{http://exslt.org/math}abs
{http://exslt.org/math}sin
{http://exslt.org/math}asin
{http://exslt.org/math}atan2
{http://exslt.org/sets}distinct
{http://exslt.org/dates-and-times}hour-in-day
{http://exslt.org/dates-and-times}sum
{http://exslt.org/dates-and-times}week-in-month
{http://exslt.org/strings}split
{http://exslt.org/strings}padding
{http://exslt.org/strings}align
{http://exslt.org/dates-and-times}day-in-month
{http://exslt.org/dates-and-times}day-of-week-in-month
{http://exslt.org/dates-and-times}month-in-year
{http://xmlsoft.org/XSLT/}test

Registered Extension Elements:
{http://exslt.org/common}document
{http://exslt.org/functions}result
{http://xmlsoft.org/XSLT/}test

Registered Extension Modules:
http://exslt.org/functions
http://icl.com/saxon
http://xmlsoft.org/XSLT/

Look at the stylesheet to see how to access these extensions. I am using the strings extension to help replace characters that are not JSON compliant.

Now let’s look at the sample code that uses xsltproc to process the XML against the XSLT stylesheet:

package main

import (
    "encoding/json"
    "fmt"
    "os"
    "os/exec"
)

type document map[string]interface{}

func main() {
    jsonData, err := processXslt("stylesheet.xslt", "deals.xml")
    if err != nil {
        fmt.Printf("ProcessXslt: %s\n", err)
        os.Exit(1)
    }

    documents := struct {
        Deals []document `json:"deals"`
    }{}

    err = json.Unmarshal(jsonData, &documents)
    if err != nil {
        fmt.Printf("Unmarshal: %s\n", err)
        os.Exit(1)
    }

    fmt.Printf("Deals: %d\n\n", len(documents.Deals))

    for _, deal := range documents.Deals {
        fmt.Printf("DealId: %d\n", int(deal["dealid"].(float64)))
        fmt.Printf("Title: %s\n\n", deal["title"].(string))
    }
}

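func processXslt(xslFile string, xmlFile string) (jsonData []byte, err error) {
    // exec.Command looks up xsltproc through the PATH environment variable.
    // Setting only the Path field of exec.Cmd to a bare name skips that
    // lookup, so the program would have to sit in the working directory.
    cmd := exec.Command("xsltproc", xslFile, xmlFile)
    cmd.Env = os.Environ()

    // Output runs the command and returns everything it wrote to stdout.
    jsonData, err = cmd.Output()
    if err != nil {
        return nil, err
    }

    fmt.Printf("%s\n", jsonData)

    return jsonData, nil
}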

The processXslt function uses an exec.Cmd object to run the xsltproc program as a child process. The key to making this work is the cmd.Output function. The xsltproc program writes the result of the transformation to stdout, and cmd.Output returns whatever the program wrote there as a slice of bytes. This means we only need to write the XML and XSLT files to disk before running xsltproc.

Once the processXslt function has the resulting JSON from xsltproc, the JSON is displayed on the screen and returned to the caller as a slice of bytes for further processing.

In main, after the call to the processXslt function, the slice of bytes containing the JSON is unmarshalled into a struct holding a slice of maps, one map per deal, so it can be consumed by our Go program and displayed on the screen. In the future those maps can be stored in MongoDB via the mgo MongoDB driver.
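
Because each deal is decoded into a map of interface{} values, JSON numbers arrive as float64, and a plain type assertion panics if a field is missing or has another type. Here is a small sketch of the comma-ok form of the assertion. The dealID helper and the inline JSON are made up for illustration and are not part of the sample project.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type document map[string]interface{}

// dealID (hypothetical helper) reads the "dealid" field. encoding/json
// decodes JSON numbers into float64 when the target is an interface{}.
func dealID(d document) (int, bool) {
	id, ok := d["dealid"].(float64)
	return int(id), ok
}

func main() {
	// Made-up data in the shape the stylesheet produces.
	data := []byte(`{"deals": [{"dealid": 42, "title": "Sample"}, {"title": "No id"}]}`)

	var documents struct {
		Deals []document `json:"deals"`
	}
	if err := json.Unmarshal(data, &documents); err != nil {
		fmt.Println("Unmarshal:", err)
		return
	}

	for _, deal := range documents.Deals {
		if id, ok := dealID(deal); ok {
			fmt.Println("DealId:", id)
		} else {
			fmt.Println("DealId: missing or not a number")
		}
	}
}
```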

The xsltproc program can be uploaded to any cloud environment that will allow you to write the XML and XSLT to disk. I have been successful in using xsltproc inside an Iron.io IronWorker container.

If you have the need to process XSLT in your Go programs, give this a try.

