Well, the Telluride Film Festival is winding down today (Monday, 09/02/2013), and now it is time to mine the compute usage analytics that we have collected.
This is the second year that FatFractal has hosted the Telluride back-end, and things went very smoothly, due in large part to planning based on last year’s fuzzy compute usage analytics. Last year we had no idea what to expect in terms of load and sat on the edges of our collective devops seats as we watched the traffic increase each day. The Telluride Film Festival is a five-day event that builds momentum across the week as film enthusiasts arrive at the festival. FatFractal (specifically Dave Wells) worked with Pete Nies to develop clients for iOS, Android, and the browser that help the filmgoer get the most out of their Telluride experience.
Some example functionality is:
- Seat availability.
- Book signing schedule.
- Film schedule.
- Guest directors.
Below are some iOS and Android screenshots:
The Telluride back-end data must be updated periodically across the five days on live production systems, and that load must be factored into the planning. The data (like most data models) resides in multiple collections and consists of both objects (JSON) and blobs. The relationships between them are handled by some really cool FatFractal NoServer features. In addition, the Telluride back-end integrated easily with Salesforce (for seating availability) using the NoServer Server Extensions.
Last year we served the Telluride back-end off a heavily multi-tenanted EC2 m1.xlarge instance and it did the job. This year we served it off two heavily multi-tenanted EC2 m1.large instances for redundancy, with traffic load balanced across the instances by our directors. We have far more apps on the platform now than we did this time last year. So we figured that, given the normal loads on the two EC2 m1.large instances and last year’s Telluride loads (wish we had real analytics back then), we should be able to accommodate this year’s load with some headroom (fingers crossed).
Below are some screenshots of the instance loads. The two instances are represented by the green and blue lines. The Telluride film festival started 08/29/2013 and ended 09/02/2013.
It should be noted that at the start of the festival (see the spikes) we uncovered a bug that affected CPU utilization (that ever-elusive monster query). Our resident guru, Gary Casey, fixed it in about an hour.
As you can see the instances easily handled the load and our assumptions based on last year’s fuzzy compute usage analytics were somewhat validated. Unfortunately last year we were not collecting application-level metrics and relied heavily on information from our logs and extrapolated what we could. Given the graphs above we could have squeezed more out of the instances but without fine-grained analytics we did not want to take the risk.
What application compute usage analytics allow us to do is determine what percentage of an instance’s load is being consumed by each application on that instance. So if I wanted to scale an application to another instance, I would know approximately how much compute must be available on that instance, or I could take the simplistic route and spin up the appropriate EC2 instance type.
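To make that reasoning concrete, here is a minimal sketch of the arithmetic. Everything in it is an assumption for illustration: the function names, the "compute units", the 80% ceiling, and the numbers are made up and are not FatFractal's actual analytics schema or measured values.

```python
# Hypothetical sketch: names, units, and numbers are illustrative assumptions,
# not FatFractal's actual analytics schema or measured values.

def load_shares(compute_by_app):
    """Return each app's fraction of the total compute recorded on one instance."""
    total = sum(compute_by_app.values())
    if total == 0:
        return {app: 0.0 for app in compute_by_app}
    return {app: used / total for app, used in compute_by_app.items()}


def required_utilization(app_share, source_utilization, capacity_ratio=1.0):
    """Estimate the utilization an app would add to a target instance.

    app_share          -- app's fraction of the source instance's load (0..1)
    source_utilization -- average utilization of the source instance (0..1)
    capacity_ratio     -- target capacity / source capacity (1.0 = same size)
    """
    return app_share * source_utilization / capacity_ratio


def fits(app_share, source_utilization, target_utilization,
         capacity_ratio=1.0, ceiling=0.8):
    """True if the target stays at or below the ceiling after taking the app."""
    added = required_utilization(app_share, source_utilization, capacity_ratio)
    return target_utilization + added <= ceiling


# Made-up compute units for two apps sharing one instance.
shares = load_shares({"telluride": 300.0, "anon": 100.0})
can_move = fits(shares["telluride"], source_utilization=0.40,
                target_utilization=0.35)
```

The ceiling is there to leave headroom for spikes like the ones we saw at the start of the festival; where to set it is a judgment call.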
In the table below you can see two applications, telluride and an unnamed app we’ll call ‘anon’, that reside on the same instance. The metrics were collected from 08/29/2013 to 09/02/2013. So I have their relative compute usage and can determine how much each contributes to the total instance load.
I should note that the telluride aggregates were actually collected across two instances. The analytic records contain the instance id, so I can aggregate across one or more instances. The table is just an example of what application analytics get collected.
Below is a screenshot of the Telluride API calls and response times from 08/29/2013 to 09/02/2013 across both EC2 m1.large instances. Application analytics provides a fine-grained viewport that can be drilled down into to determine precisely what the compute usage of any application is.
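As a rough illustration of why carrying the instance id on each record matters, the sketch below groups records either per application (across all instances) or per application and instance. The record fields, instance ids, and counts are hypothetical and do not reflect the real record format or the festival's numbers.

```python
from collections import defaultdict

# Hypothetical analytic records; field names and values are made up.
records = [
    {"app": "telluride", "instance_id": "instance-a", "api_calls": 1200, "compute_ms": 5400},
    {"app": "telluride", "instance_id": "instance-b", "api_calls": 1100, "compute_ms": 5000},
    {"app": "anon", "instance_id": "instance-a", "api_calls": 400, "compute_ms": 1800},
]


def aggregate(records, keys):
    """Sum api_calls and compute_ms for each distinct combination of `keys`."""
    totals = defaultdict(lambda: {"api_calls": 0, "compute_ms": 0})
    for record in records:
        group = tuple(record[k] for k in keys)
        totals[group]["api_calls"] += record["api_calls"]
        totals[group]["compute_ms"] += record["compute_ms"]
    return dict(totals)


# One row per app, summed over every instance it ran on.
per_app = aggregate(records, ("app",))

# One row per (app, instance) pair, for per-instance load shares.
per_app_instance = aggregate(records, ("app", "instance_id"))
```

Grouping on `("app",)` gives the cross-instance totals, while `("app", "instance_id")` keeps the per-instance breakdown needed to work out each app's share of a single instance.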
Application analytics is critical to any multi-tenanted BaaS or PaaS environment. It is the information necessary to accurately profile an application’s compute usage such that it can be properly scaled in a predictive fashion. In addition application analytics is the tool by which the infrastructure can be utilized in the most efficient manner possible allowing for optimal multi-tenancy and ultimately a lower cost to the developer and enterprise.
We will be far more informed for next year’s Telluride Film Festival with the application analytics we captured this year and really look forward to next year’s event!