Debugging your Go App with AWS X-Ray

So you received an alert that your application’s performance is degraded or that one of the calls in your application is failing … what is your first reaction? In the absence of any obvious cause, your first instinct might be to take a look at the logs, since everything breaks for a reason. Provided you are logging at the critical points in your logic, the application logs will typically help you quickly understand a local problem.

But what if your application is part of a distributed or micro-services architecture? A log-centric approach becomes a challenge when you need to inspect several logs in different formats, and debugging a system-level issue in this kind of architecture can be much more difficult.

As we have built more and more of our services on top of AWS, resiliency and recovery are important considerations. At Enova we strive to deliver beyond customer expectations — minimizing the time from issue discovery to diagnosis is essential. To achieve this, we need a way to correlate the events across different services quickly and accurately. That is where AWS X-Ray comes to the rescue.

What is AWS X-Ray and how does it work?

AWS X-Ray is a debugging and tracing tool that collates and correlates events across the services of a distributed or micro-services architecture and presents them as a connected graph. It collects data about the requests your application serves, assigns each request a globally unique TraceID built around a 96-bit identifier, and sends the data to a centralized API endpoint where all traces emitted by your application are collated to form that graph.

For a Go application to start capturing data, the X-Ray SDK needs to be added to the Go code. The SDK captures the custom segments in the app and sends them to a daemon service. The daemon is a lightweight application that runs in the background, listening for traffic on UDP port 2000. It works with the SDK to gather raw segment data and then relays it to the AWS X-Ray API.
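
To make the hand-off between the SDK and the daemon more concrete, here is a minimal sketch that uses only the Go standard library. It is not how the SDK is implemented. It assumes the documented trace ID layout (a version digit, the epoch time in hex, and a 96-bit random part) and the one-line JSON header that the daemon expects before each segment document. The names segmentDoc, newTraceID, and buildDatagram are made up for this illustration, and the real SDK emits many more fields.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"encoding/json"
	"fmt"
	"time"
)

// daemonHeader is the line the daemon expects before each JSON segment document.
const daemonHeader = `{"format": "json", "version": 1}`

// segmentDoc holds a minimal set of segment fields (illustrative, not exhaustive).
type segmentDoc struct {
	Name      string  `json:"name"`
	ID        string  `json:"id"`
	TraceID   string  `json:"trace_id"`
	StartTime float64 `json:"start_time"`
	EndTime   float64 `json:"end_time"`
}

// randomHex returns n random bytes encoded as 2n hexadecimal digits.
func randomHex(n int) string {
	b := make([]byte, n)
	if _, err := rand.Read(b); err != nil {
		panic(err)
	}
	return hex.EncodeToString(b)
}

// newTraceID formats a trace ID as version 1, the epoch seconds in hex,
// and a 96-bit (12-byte) random identifier.
func newTraceID(epochSeconds int64) string {
	return fmt.Sprintf("1-%08x-%s", epochSeconds, randomHex(12))
}

// buildDatagram frames one segment document as header, newline, JSON body.
func buildDatagram(seg segmentDoc) ([]byte, error) {
	body, err := json.Marshal(seg)
	if err != nil {
		return nil, err
	}
	return append([]byte(daemonHeader+"\n"), body...), nil
}

func main() {
	start := time.Now()
	seg := segmentDoc{
		Name:      "service-A",
		ID:        randomHex(8), // segment IDs are 16 hex digits
		TraceID:   newTraceID(start.Unix()),
		StartTime: float64(start.UnixNano()) / 1e9,
		EndTime:   float64(time.Now().UnixNano()) / 1e9,
	}
	payload, err := buildDatagram(seg)
	if err != nil {
		panic(err)
	}
	// A sender would write this payload to the daemon's UDP port (2000 by default).
	fmt.Printf("%s\n", payload)
}
```

In practice the SDK does all of this for you; the sketch is only meant to show what kind of data travels from the application to the daemon.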

Let’s look at how the SDK creates a TraceID and how it is propagated in a user request.

Trace ID Propagation

The diagram above shows how an ID and a Parent ID are generated by the SDK before starting any segment. The tracing information that travels with a request combines the trace ID, the parent ID, and the sampling decision for the request. The parent IDs help in tracing the request back to the originator. Note that the SDK takes care of creating a TraceID (if one is not already present) and propagating it to all downstream calls via an HTTP header (X-Amzn-Trace-Id) or via metadata or attributes for supported AWS services (e.g. S3, DynamoDB, etc.).
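
As a rough illustration of what propagation looks like on the wire, the sketch below parses and rebuilds an X-Amzn-Trace-Id header value with the standard library. It assumes the commonly documented Root, Parent, and Sampled fields separated by semicolons. The traceHeader type and its helpers are hypothetical names for this example, not part of the SDK, and the IDs are sample values.

```go
package main

import (
	"fmt"
	"strings"
)

// traceHeader holds the three fields commonly carried in X-Amzn-Trace-Id.
type traceHeader struct {
	Root    string // trace ID shared by every segment of the request
	Parent  string // ID of the segment that made the call
	Sampled string // "1" if the request is sampled, "0" if not
}

// String renders the header value as semicolon-separated Key=Value pairs.
func (h traceHeader) String() string {
	parts := []string{"Root=" + h.Root}
	if h.Parent != "" {
		parts = append(parts, "Parent="+h.Parent)
	}
	if h.Sampled != "" {
		parts = append(parts, "Sampled="+h.Sampled)
	}
	return strings.Join(parts, ";")
}

// parseTraceHeader reads the fields it knows and ignores everything else.
func parseTraceHeader(value string) traceHeader {
	var h traceHeader
	for _, part := range strings.Split(value, ";") {
		kv := strings.SplitN(strings.TrimSpace(part), "=", 2)
		if len(kv) != 2 {
			continue
		}
		switch kv[0] {
		case "Root":
			h.Root = kv[1]
		case "Parent":
			h.Parent = kv[1]
		case "Sampled":
			h.Sampled = kv[1]
		}
	}
	return h
}

func main() {
	incoming := parseTraceHeader("Root=1-5759e988-bd862e3fe1be46a994272793;Parent=53995c3f42cd8ad8;Sampled=1")

	// A downstream call keeps the same Root and sampling decision but names
	// the current segment as the new parent (sample ID for illustration).
	outgoing := incoming
	outgoing.Parent = "70de5b6f19ff9a0a"
	fmt.Println(outgoing)
}
```

Keeping Root unchanged while replacing Parent at each hop is what lets X-Ray stitch the individual segments back into a single trace.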

What does AWS X-Ray offer?

  • It gives an excellent performance view of the connected AWS services and HTTP/Database calls your application makes. It helps visualize the services in a distributed system and gives a user-centric view of the requests.
  • It provides the ability to quickly find and address performance concerns.
  • It provides insight into faults (HTTP 5XX) and errors (HTTP 4XX).

  • It provides the ability to annotate your subsegment with custom key-value pairs, in case you ever wanted to slice and dice the calls based on some identifiers.
    • segment.AddAnnotation("clientID", "Enova")
  • AWS X-Ray Analytics provides the ability to dynamically analyze application performance and error rates by comparing trace responses and trends across a holistic, filterable listing of attributes. For example, you can compare traces corresponding to separate response time peaks, occurring at different time intervals or compare the application behavior of a single user to the rest of the users.
  • It provides a service map view of all the connected services. The traces give a good view of average latency and response distribution.
  • The traces offer deep insight into the time taken by each operation, which helps you make informed decisions about the performance profile of your application.

AWS X-Ray is instrumental in diagnosing issues when the application load is highly dynamic, meaning the request count scales up or down frequently, since this changes the performance profile of the application. At Enova, we use tools like New Relic along with AWS X-Ray to monitor performance, which helps us keep confidence in our apps high.

How to instrument AWS X-Ray in an application

An example snippet that shows how to start a custom segment (named service-A) in Go:

awsContext, segment := xray.BeginSegment(ctx, "service-A")

The awsContext contains information about the TraceID which must be passed on to other context-aware calls. The snippet below shows how to start capturing a critical call in a segment:

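xray.Capture(awsContext, "service-A.capture",
     func(ctx1 context.Context) error {
         // Start the subsegment from the context that Capture passes in.
         subCtx, subSegment := xray.BeginSubsegment(ctx1, "service-A.critical-call-1")
         // Close the subsegment on return so that its end_time is recorded.
         defer subSegment.Close(nil)
         // App logic; criticalCall is a hypothetical placeholder for it.
         return criticalCall(subCtx)
     })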

The capture blocks collect information about the request and send it to the daemon, which forwards those segments to the X-Ray API at regular intervals, mostly when its buffer is full. It is always a good idea to close the segment and subsegment at the end of the block execution, both to avoid any overflows and to capture the end_time for the segment.

At Enova, we have instrumented X-Ray in containers with auto-scaling capabilities and AWS Lambdas to observe production traffic.

  • ECS: If you are running a Docker container on ECS, you need to run a daemon container from Amazon’s Docker image (amazon/aws-xray-daemon) alongside the app container and link the two. Once the X-Ray daemon is set up, the only thing left to do is add the custom capture segment code to the application. The daemon will keep sending data as long as AWS credentials are available or the ECS task role has permission to send segments to the X-Ray API endpoint.
  • AWS Lambdas: The Lambda service comes prepackaged with X-Ray, so you do not need to run a daemon explicitly to gather data. A configuration setting in the Lambda service lets you enable X-Ray tracing. This also means that a segment will be created for each request served by the Lambda service.
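
When the daemon runs in its own container, the application has to know where to send its segments. The X-Ray SDKs read the AWS_XRAY_DAEMON_ADDRESS environment variable for this and otherwise default to 127.0.0.1:2000. The sketch below mirrors that lookup with the standard library only. The helper name daemonAddress is hypothetical, and it assumes the variable holds a plain host:port value such as xray-daemon:2000 (the container name here is an example).

```go
package main

import (
	"fmt"
	"os"
)

// defaultDaemonAddr is where a locally running daemon listens by default.
const defaultDaemonAddr = "127.0.0.1:2000"

// daemonAddress returns the configured daemon address, or the default when
// the environment variable is unset. The lookup function is injected so the
// logic can be exercised without touching the real environment.
func daemonAddress(getenv func(string) string) string {
	if addr := getenv("AWS_XRAY_DAEMON_ADDRESS"); addr != "" {
		return addr
	}
	return defaultDaemonAddr
}

func main() {
	fmt.Println("sending segments to", daemonAddress(os.Getenv))
}
```

With linked containers on ECS, setting the variable on the app container to the daemon container's name and port is usually all the wiring the application side needs.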

The SDK also starts a new subsegment for any outgoing call originating from the application, provided the clients are wrapped with X-Ray. For example:

Wrapping a DynamoDB client:

DynamoDB = dynamodb.New(sess)
xray.AWS(DynamoDB.Client)

Wrapping an S3 client:

svc := s3.New(sess)
xray.AWS(svc.Client)

Wrapping an HTTP client:

client := xray.Client(&http.Client{})
response, err := ctxhttp.Do(ctx, client, request)
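
To give a feel for what wrapping an HTTP client achieves, here is a simplified standard-library sketch of a RoundTripper that attaches the tracing header to every outgoing request. It is an illustration of the idea, not the SDK's actual implementation, which also records a subsegment for each call. The tracingTransport and recordingTransport types, the headerSeenDownstream helper, and the service-b.invalid URL are all invented for this example.

```go
package main

import (
	"fmt"
	"net/http"
)

// tracingTransport adds a tracing header to each outgoing request.
type tracingTransport struct {
	base   http.RoundTripper
	header string
}

func (t tracingTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	// A RoundTripper must not modify the caller's request, so work on a clone.
	clone := req.Clone(req.Context())
	clone.Header.Set("X-Amzn-Trace-Id", t.header)
	return t.base.RoundTrip(clone)
}

// recordingTransport stands in for the network and remembers the header it saw.
type recordingTransport struct {
	seen string
}

func (r *recordingTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	r.seen = req.Header.Get("X-Amzn-Trace-Id")
	return &http.Response{
		StatusCode: http.StatusOK,
		Header:     make(http.Header),
		Body:       http.NoBody,
		Request:    req,
	}, nil
}

// headerSeenDownstream sends one request through the wrapped client and
// returns the tracing header that reached the fake downstream service.
func headerSeenDownstream(header string) (string, error) {
	downstream := &recordingTransport{}
	client := &http.Client{Transport: tracingTransport{base: downstream, header: header}}
	resp, err := client.Get("http://service-b.invalid/")
	if err != nil {
		return "", err
	}
	resp.Body.Close()
	return downstream.seen, nil
}

func main() {
	seen, err := headerSeenDownstream("Root=1-5759e988-bd862e3fe1be46a994272793;Parent=53995c3f42cd8ad8;Sampled=1")
	if err != nil {
		panic(err)
	}
	fmt.Println("downstream service received:", seen)
}
```

Because the header is added at the transport layer, application code that uses the wrapped client does not have to change for the trace to continue in the next service.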

FAQs on X-Ray: https://aws.amazon.com/xray/faqs/

Conclusion

AWS X-Ray has been very insightful and informative to our work for Enova Decisions. It helps our team quickly make key performance-related decisions as we add capabilities to the newly launched Enova Decisions Cloud, which is our decision management platform-as-a-service. For more information check out https://www.enovadecisions.com.