TechEmpower and Servlets

This blog article is Part One of a two-part set of articles that provides an overview of our Servlet module and its performance compared to a conventional Servlet container, Tomcat. This article focuses on the design differences between the containers and then provides an overview of the TechEmpower test suite that will be used to compare performance. The TechEmpower folks have done a bang-up job of benchmarking different frameworks and in the process have created a standard test suite that is comprehensive and exercises functionality you would typically find in a production environment. Rather than reinvent the wheel I have decided to leverage the TechEmpower test suite with a couple of minor changes. Part Two of this set of articles (which will be available in a couple of weeks) will focus on the results and hopefully :-) provide some insight that explains them.

FatFractal (FF) Application Container aka Engine

Before jumping directly into the FF Servlet module I thought it appropriate to provide some background on the FF Application Container or, as I commonly refer to it, the engine. The engine is really just an NIO server that has support for pluggable protocol handlers (and modules, which I'll cover next). The basic operations of the engine are very straightforward and boil down to reading (and writing) data as quickly as possible and chunking it to the protocol handler. The FF HTTP protocol handler is event based (think node.js) and will continue consuming data until it detects that it has received a full HTTP request, at which point it publishes the request to subscribers.
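
To make the flow concrete, below is a minimal sketch of the idea with hypothetical names and a drastically simplified parser (not the actual FF source): the engine feeds the handler chunks as they are read off the wire, and the handler publishes once it holds a complete request.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical sketch: the engine hands the handler raw chunks as they
// arrive off the wire; the handler buffers until a full request is present.
public class HttpProtocolHandler {

    public interface RequestSubscriber {
        void onRequest(String rawRequest); // e.g., the module delegator
    }

    private final StringBuilder buffer = new StringBuilder();
    private final List<RequestSubscriber> subscribers = new CopyOnWriteArrayList<>();

    public void subscribe(RequestSubscriber s) {
        subscribers.add(s);
    }

    // Called by the NIO engine every time a chunk of bytes is read.
    public void onData(ByteBuffer chunk) {
        buffer.append(StandardCharsets.US_ASCII.decode(chunk));
        // Simplification: treat end-of-headers as a complete request
        // (a real handler must also honor Content-Length / chunked bodies).
        int end = buffer.indexOf("\r\n\r\n");
        if (end >= 0) {
            String request = buffer.substring(0, end + 4);
            buffer.delete(0, end + 4);
            for (RequestSubscriber s : subscribers) {
                s.onRequest(request); // publish the complete request
            }
        }
    }
}
```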

Once the protocol handler detects that it has received a complete HTTP request, the request is published, and one of the subscribers is the module delegator. The module delegator is responsible for delegating the request to the module that is managing the application. A module is essentially a software stack (e.g., Servlet) that is responsible for executing the application, so there is a complete decoupling of the network I/O and application execution. A given engine can host multiple modules (e.g., NoServer, Ruby, Servlet) and a module can host multiple applications. For PaaS applications (e.g., Ruby, Servlets) an engine will typically host one module, which will host one application, and the engine will run within an LXC container for security reasons. Currently modules run in the same JVM as the engine; however, that may change in the future so that a single NIO engine can publish requests to multiple modules that reside in their own LXC containers on the same VM or on other VMs.
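
Building on the sketch above, the delegation step might look something like this (again, illustrative names rather than the FF API): the delegator subscribes to the protocol handler and routes each request to the module that manages the target application.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the delegation step. Names are illustrative.
public class ModuleDelegator implements HttpProtocolHandler.RequestSubscriber {

    public interface Module {
        void execute(String appName, String rawRequest); // Servlet, Ruby, NoServer, ...
    }

    private final Map<String, Module> moduleByApp = new ConcurrentHashMap<>();

    public void register(String appName, Module module) {
        moduleByApp.put(appName, module);
    }

    @Override
    public void onRequest(String rawRequest) {
        String appName = resolveApp(rawRequest);     // e.g., from Host header or path
        Module module = moduleByApp.get(appName);
        if (module != null) {
            module.execute(appName, rawRequest);     // application execution, decoupled from I/O
        }
    }

    private String resolveApp(String rawRequest) {
        // Simplification: the first path segment names the application.
        String path = rawRequest.split(" ")[1];
        String[] segments = path.split("/");
        return segments.length > 1 ? segments[1] : "default";
    }
}
```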

FatFractal (FF) Servlet Module vs Conventional Servlet Container

The FF Servlet module is a very lightweight Servlet container that supports the typical things you would expect, such as JSPs, listeners, and filters. It was purposely not designed to be a full-blown Servlet container like Tomcat and is available for developers that want to use the framework to implement their server-side functionality. The biggest difference between conventional Servlet containers and the FF Servlet module is that the network I/O has been decoupled from the framework. A typical Servlet application can get access to the socket streams through the HttpServletRequest and HttpServletResponse objects. The FF Servlet module provides access to those streams, but the streams are implemented as encapsulations around buffers. This decoupling of I/O from the frameworks is what allows a single engine to support a truly polyglot environment and will allow FF to extend its language/framework support using a single software stack.
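
As a rough illustration of the buffer-backed streams (a sketch of the concept, not the FF implementation), the Servlet can write to an ordinary OutputStream that fills a buffer, and the engine decides when the bytes actually hit the socket:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;

// Sketch: a response stream that is an encapsulation around a buffer.
// The Servlet writes to it as usual; the engine, not the module, decides
// when and how the bytes actually reach the socket.
public class BufferBackedOutputStream extends OutputStream {

    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();

    @Override
    public void write(int b) throws IOException {
        buffer.write(b);
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        buffer.write(b, off, len);
    }

    // Called by the engine once the Servlet has produced the response.
    public ByteBuffer drain() {
        return ByteBuffer.wrap(buffer.toByteArray());
    }
}
```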

TechEmpower

As previously mentioned, the TechEmpower folks have constructed a test suite whose tests exercise different aspects of the frameworks. This article will employ the following three tests:

  1. JSON serialization
  2. Database access (single query)
  3. Database access (multiple query)

JSON serialization

In this test, each HTTP response is a JSON serialization of a freshly instantiated object, resulting in {"message" : "Hello, World!"}.
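
For reference, a Servlet implementation of this test along the lines of TechEmpower's might look like the following (Jackson is used for serialization; the class names and mapping details are my own assumptions):

```java
import java.io.IOException;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import com.fasterxml.jackson.databind.ObjectMapper;

// Sketch of the JSON test: serialize a freshly instantiated object per request.
public class JsonServlet extends HttpServlet {

    private static final ObjectMapper MAPPER = new ObjectMapper();

    public static class Message {
        public final String message = "Hello, World!";
    }

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws ServletException, IOException {
        resp.setContentType("application/json");
        MAPPER.writeValue(resp.getOutputStream(), new Message());
    }
}
```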

Database access (single query)

How many requests can be handled per second if each request is fetching a random record from a data store?

Database access (multiple query)

The following tests are all run at 256 concurrency and vary the number of database queries per request. The tests are 1, 5, 10, 15, and 20 queries per request.
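
A single Servlet can cover both database tests. The sketch below is my own illustration against the TechEmpower World table (10,000 rows of id/randomNumber); the DataSource wiring is omitted, and the "queries" parameter defaults to 1 so it doubles as the single-query test:

```java
import java.io.IOException;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.concurrent.ThreadLocalRandom;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import javax.sql.DataSource;
import com.fasterxml.jackson.databind.ObjectMapper;

// Sketch of the database tests: fetch one or more random rows from the
// TechEmpower "World" table. The DataSource configuration is omitted.
public class DbServlet extends HttpServlet {

    private static final ObjectMapper MAPPER = new ObjectMapper();
    private DataSource dataSource; // injected/configured at deploy time

    public static class World {
        public int id;
        public int randomNumber;
    }

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws ServletException, IOException {
        int queries = parseQueries(req.getParameter("queries"));
        World[] worlds = new World[queries];
        try (Connection conn = dataSource.getConnection();
             PreparedStatement stmt = conn.prepareStatement(
                     "SELECT id, randomNumber FROM World WHERE id = ?")) {
            for (int i = 0; i < queries; i++) {
                stmt.setInt(1, ThreadLocalRandom.current().nextInt(1, 10001));
                try (ResultSet rs = stmt.executeQuery()) {
                    rs.next();
                    World w = new World();
                    w.id = rs.getInt("id");
                    w.randomNumber = rs.getInt("randomNumber");
                    worlds[i] = w;
                }
            }
        } catch (SQLException e) {
            throw new ServletException(e);
        }
        resp.setContentType("application/json");
        MAPPER.writeValue(resp.getOutputStream(), queries == 1 ? worlds[0] : worlds);
    }

    private static int parseQueries(String param) {
        try {
            // Clamp to the 1..20 range exercised by this article's tests.
            return Math.max(1, Math.min(20, Integer.parseInt(param)));
        } catch (NumberFormatException e) {
            return 1; // single-query test
        }
    }
}
```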

This article will use the same client (weighttp) that TechEmpower used and the same EC2 configuration. TechEmpower typically tests on both EC2 and dedicated hardware; unfortunately :-( I don't have the latter and will only perform the tests on EC2.

While this article will only be comparing the FF Servlet module and Tomcat, the results can also be compared to the TechEmpower framework results since the same tests, client, and EC2 configuration are being duplicated.

Okay, over and out, and see you soon with the results.

Big Data, Hubi, and Beer

Okay, this blog has nothing to do with beer. But hey! I had to get you here somehow.

This blog article focuses on how FatFractal (FF) uses big data and why it is important to the developer. There are lots of ways to collect this kind of data and extrapolate meaning from it. At FF we designed analytics into the platform from day one so that we could generate, store, and mine data using common big data technologies such as Hadoop, MapReduce, Hive, Pig, Flume, and Cassandra. The data that is ultimately stored comes from conventional sources such as logs and infrastructure services such as CloudWatch, but also from our instrumented application container, which provides a real-time view into what is really happening with the applications. Our goal is to ultimately provide developers with the tools and information they need to effectively manage and monitor their application's compute usage.

At FatFractal (FF) we use big data primarily for:

  • Billing - FF charges developers for their compute consumption. All usage metrics are ultimately stored in the FF Hadoop cluster, and at the end of each developer's monthly billing cycle the data is MapReduced into billing records that are stored in Cassandra (a minimal sketch of that aggregation step follows this list).
  • Usage Profile - All applications that are deployed to FF have Usage Profiles (UP) constructed for them. The UP represents a set of compute constraints based upon either a subscription (BaaS) or a number of FatFractal Virtual Spaces (FFVS, a custom LXC container) and services (e.g., database) (PaaS). If the application's compute usage consistently approaches or exceeds the thresholds of the UP, the developer is notified so that they have an opportunity to upgrade their subscription or allocate additional FFVSs.
  • Application Analytics Service - FF provides analytic reporting for all applications that can be accessed from the FF console. This allows the developer to track their application’s compute usage. Ultimately the goal of this service is to provide the developer with the information and tools to truly monitor and manage their application’s compute usage.
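
As a concrete (and heavily simplified) illustration of the billing aggregation mentioned in the first bullet, the sketch below assumes a hypothetical input layout of "appId,cpuSeconds" lines and totals CPU seconds per application; the real FF schema and the Cassandra write path are of course more involved.

```java
import java.io.IOException;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Minimal sketch (hypothetical record layout, not the FF schema) of the
// billing aggregation: each input line is "appId,cpuSeconds", and the job
// totals CPU seconds per application for the billing cycle.
public class BillingJob {

    public static class UsageMapper
            extends Mapper<LongWritable, Text, Text, DoubleWritable> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split(",");
            context.write(new Text(fields[0]),
                          new DoubleWritable(Double.parseDouble(fields[1])));
        }
    }

    public static class UsageReducer
            extends Reducer<Text, DoubleWritable, Text, DoubleWritable> {
        @Override
        protected void reduce(Text appId, Iterable<DoubleWritable> values,
                              Context context)
                throws IOException, InterruptedException {
            double total = 0.0;
            for (DoubleWritable v : values) {
                total += v.get();
            }
            // In production the totals would be written to Cassandra as
            // billing records rather than to HDFS.
            context.write(appId, new DoubleWritable(total));
        }
    }
}
```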

This blog article will focus primarily on the UP and analytics in the context of provisioning properly for an application. In addition, it will cover scheduled scaling based on a real-world application (Hubi) that is currently deployed on the FF infrastructure.

Application Compute Usage Metrics

This section is included to provide the reader with background information on why and how FF collects application compute usage metrics.

Planning and scaling in multi-tenanted environments is challenging because you don't know which applications are actually consuming the resources unless you have baked in the necessary instrumentation. When an instance hits, say, 80% CPU utilization, the simplest thing to do is clone all the applications onto a newly minted instance and then add it to the load-balanced mix (which is what the FF traffic directors do). However, if you can identify the pertinent application(s), you simply need to clone that/those application(s) onto existing underutilized instances or spin up a new one (matching the compute needs, e.g., EC2 m1.small) and let the FF traffic directors do their job.

The type of data you would need to assess each application's compute usage includes things like: 1) CPU milliseconds consumed per unit time, 2) request and response counts/sizes per unit time, 3) memory consumption per unit time (this one is kind of hazy but a relative number can be arrived at), and so on. You then compare the numbers and zero in on which applications are consuming the most compute. It may well be a situation where the instance is oversubscribed and the applications need to be segmented onto different instances.

At FF we collect instance-level compute usage from the infrastructure services (e.g., CloudWatch), which tells us what is going on with the instance. For application-level compute usage we rely on metrics that are generated by the FF application container. FF uses a custom application container (think Google App Engine) to facilitate the deployment and execution of all applications independent of their type (e.g., NoServer, R-o-R, Servlets). The FF application container has been instrumented to generate compute usage metrics in real time and ultimately propagates them to a Hadoop cluster. It should also be mentioned that the application containers reside in customized paravirtualized containers (LXC/FFVS) that are each assigned a slice (e.g., 1 proc) of the instance's compute resources. The diagram below provides a high-level view of the application container.
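
To illustrate the kind of triage this enables, here is a small sketch (the record shape is illustrative, not the FF wire format) that picks out the heaviest CPU consumer on a hot instance so it can be cloned onto an underutilized one:

```java
import java.util.Comparator;
import java.util.List;

// Sketch of the triage step described above: given per-application usage
// emitted by the container, find the heaviest consumer on a hot instance.
public class UsageTriage {

    // Illustrative record shape for a sample window of container metrics.
    public record AppUsage(String appId, long cpuMillis,
                           long requestCount, long responseBytes) {}

    // Returns the application consuming the most CPU in the sample window.
    public static AppUsage heaviestConsumer(List<AppUsage> window) {
        return window.stream()
                     .max(Comparator.comparingLong(AppUsage::cpuMillis))
                     .orElseThrow();
    }
}
```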

Usage Profile (UP)

OK, we now know how application compute usage metrics are generated; next let's look at how those analytics can be leveraged.

When an application is deployed to the FF infrastructure, nothing is known about its compute usage requirements. The UP dictates the compute thresholds, which may be an indicator (e.g., the developer signs up for a bronze subscription knowing the associated compute quotas closely match the application's compute usage); however, most green-field applications are undersubscribed with significant headroom to grow.

Compute provisioning for BaaS and PaaS applications is defined differently. BaaS developers sign up for a specific subscription, which defines the quotas for the UP. PaaS developers explicitly choose how many FFVSs their application should be deployed to and which services it will be using.

While the FFVS compute quotas are published, it remains difficult to specify precisely how much compute a PaaS application is going to need, especially if it starts out as a high-volume application (e.g., a migration from another service). If the PaaS application is oversubscribed, auto-scaling will mitigate the situation; however, this use case is not what auto-scaling was designed for and is not optimal from a cost or provisioning perspective.

Most applications that are deployed to FatFractal are green-field apps that typically have little to no load to start with. With these types of applications there is sufficient lead time during which analytics can be collected and reconciled against the UP. Once compute usage has hit certain thresholds, the developer is notified that they should upgrade their subscription or allocate additional FFVSs.

There is another class of application that is not green field but rather a migration from another platform (e.g., Google App Engine) that may generate huge amounts of load once the switch is fully flipped. If the developer knows the compute characteristics of the application (e.g., the app requires 2 x 2.6 GHz worth of CPU), then it is relatively straightforward to formulate a reasonable UP; however, this is generally not the case. The challenge in this situation is to define a UP with a sufficient amount of compute up front to accommodate the application's compute usage needs without overcharging the developer or impacting the users of the application.

This can be done three ways:

  1. By over provisioning, collecting application compute usage metrics over some period of time, and later making deployment adjustments and redefining the UP.
  2. By under provisioning, collecting the usage analytics over some period of time, relying on auto-scaling to mitigate spikes in load, and later making deployment adjustments and redefining the UP.
  3. By provisioning minimal compute (i.e., a single FFVS), having the developer partially open the spigot, collecting usage analytics over some period of time, and later making deployment adjustments and redefining the UP based on some multiple of the number of requests for a certain corpus of users.

With all three options it is preferable to work closely with the developer, which FF recommends and generally does. Ultimately the goal is to provide the developer with the information and tools they need to do it themselves.

All three methodologies will work, and each is optimal for certain use cases. IMHO option 3 is the preferred way to do it, but unfortunately very few migration scenarios are actually in a position to take advantage of this approach. Independent of which methodology is employed, application compute usage metrics are critical to scoping the final UP and making adjustments to it over time.

Next I will cover a real use case where option 3 was employed.

Introducing Hubi

Hubi is a very cool mobile application developed by Megadevs. It is available on both Android and iOS and has 500,000+ users all over the globe. The application was recently named the best movie-streaming Android app by heavy.com.

The Servlet back-end for the application was originally deployed onto a hosting provider. FF was approached by one of the Megadevs developers (Dario Marcato ...a great guy BTW) at AppsWorld 2013 to discuss migrating Hubi from the hosting provider to FF, and a couple of months later the journey began.

Hubi generates significant request load but is unique in that it spikes every day at about the same time and is CPU bound for certain request types. Below is a table that shows the number of requests, users, and CPU seconds (which won't mean too much yet) per month from 05/12/2013 through 09/17/2013 to give you an idea of what its load is.

| Month     | Users (unique) | Requests  | CPU (seconds) |
| --------- | -------------- | --------- | ------------- |
| May       | 20,842         | 343,994   | 307,839.533   |
| June      | 165,781        | 2,706,665 | 1,841,253.982 |
| July      | 412,126        | 7,381,020 | 5,008,210.832 |
| August    | 491,571        | 9,644,024 | 8,915,138.303 |
| September | 205,124        | 5,571,600 | 3,975,262.658 |

Hubi was originally provisioned onto one FFVS, and for the month of May, when there were a limited number of users, things worked out fine. We profiled the compute usage with the analytics we had collected up to this point and formulated a UP based on the full corpus of users (approximately 500,000). We provisioned for that UP, and for twenty-one (21) hours a day things went smoothly, but between the hours of 1pm PST and 3pm PST we would experience load issues where the instance CPU usage would hit 80%+ and result in request timeouts.

We then profiled Hubi on an hourly basis across the month of June with the analytics we had collected and observed a spike that occurred every day between the hours of 1pm PST and 3pm PST. At this point we could simply have adjusted the UP and added n-number of additional FFVSs whose compute would be used approximately three (3) hours a day. While that plan was simple, it was not palatable from a cost perspective since there is an incremental cost associated with each additional FFVS. So we decided to leverage an FF scaling feature where we predictively spin up n-number of FFVSs at a scheduled time and then tear them down once the time allotment has been hit. We then charge for the cumulative hours, which effectively amounted to the addition of one (1) FFVS. Hubi has been running with this UP for approximately 1.5 months and there have been no load issues.
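
To sketch the scheduled-scaling idea (with a hypothetical provisioner interface, not the actual FF API), extra FFVSs are spun up just before the daily 1pm PST spike and torn down after the 3pm PST window:

```java
import java.time.Duration;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Sketch of scheduled scaling: add capacity at 1pm PST every day and
// remove it two hours later, so the developer pays for roughly two extra
// FFVS-hours a day instead of a whole extra always-on FFVS.
public class ScheduledScaler {

    public interface Provisioner {           // hypothetical provisioning API
        void addCapacity(int ffvsCount);
        void removeCapacity(int ffvsCount);
    }

    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    public void schedule(Provisioner provisioner, int extraFfvs) {
        ZoneId pst = ZoneId.of("America/Los_Angeles");
        ZonedDateTime now = ZonedDateTime.now(pst);
        ZonedDateTime spikeStart = now.withHour(13).withMinute(0).withSecond(0);
        if (spikeStart.isBefore(now)) {
            spikeStart = spikeStart.plusDays(1); // next occurrence of 1pm PST
        }
        long initialDelay = Duration.between(now, spikeStart).toMinutes();
        long day = TimeUnit.DAYS.toMinutes(1);

        // Scale up at 1pm PST every day...
        scheduler.scheduleAtFixedRate(
                () -> provisioner.addCapacity(extraFfvs),
                initialDelay, day, TimeUnit.MINUTES);
        // ...and scale back down two hours later.
        scheduler.scheduleAtFixedRate(
                () -> provisioner.removeCapacity(extraFfvs),
                initialDelay + 120, day, TimeUnit.MINUTES);
    }
}
```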

I apologize in advance for the diagram below; I am still ramping up on the nuances of RRDTool (which I do like). The Y-axis is the number of CPU seconds and request counts. The X-axis is the hours on 7/31/2013, in UTC (8 hours ahead of PST). If I were to provide a diagram for every day, it would be a carbon copy of what you see below. You'll observe that at about 19:00 (1pm PST) things really start to ramp up. The traffic is effectively a combination of two request types, one of which is extremely CPU intensive. This specific request type ultimately drives the CPU seconds above the request count, which can be attributed to the request type distribution. You'll notice that in most of the hours the CPU seconds consumed are well below the request count. That is because very few users are actually invoking the culpable request type.

The spikes in CPU seconds at 7:00 and 9:00 are not normal, and I am still running MapReduce jobs to try to understand that data as of this writing. The bad news is I am unsure what they represent, but the good news is I would not even be aware of them if we did not have application compute usage metrics.

So at the end of the day we were able to minimize Megadevs' cost by formulating a UP that represents the load for twenty-one (21) hours plus the addition of one (1) FFVS to accommodate the load between 1pm PST and 3pm PST.

Conclusion

At FF we knew application compute usage metrics would be necessary for bookkeeping activities such as billing, and intuitively we believed that these analytics would be critical to scaling and managing applications. Hubi and a couple of other applications have validated those assumptions, and now the challenge is to deliver the information and tools to developers so that they have a highly granular viewport into their applications and can scale and manage them in the most informed manner possible.

Telluride and Application Analytics

Well, the Telluride Film Festival is winding down today (Monday 09/02/2013), and now it is time to mine the compute usage analytics that we have collected.

Background

This is the second year that FatFractal has hosted the Telluride back-end, and things went very smoothly, due in large part to planning based on last year's fuzzy compute usage analytics. Last year we had no idea what to expect in terms of load and sat on the edges of our collective devops seats as we watched the traffic increase each day. The Telluride Film Festival is a five-day event that builds momentum across the week as film enthusiasts arrive at the festival. FatFractal (specifically Dave Wells) worked in conjunction with Pete Nies to develop clients for iOS, Android, and the browser that provide functionality that truly helps the film goer optimize their Telluride experience.

Some example functionality includes:

  • Seat availability.
  • Book signing schedule.
  • Film schedule.
  • Guest directors.

Below are some iOS and Android screenshots:

[Screenshots: Android and iOS clients]

The Telluride back-end data must be updated periodically across the five days on live production systems, and that load must be factored into the planning. The data (like most data models) resides in multiple collections and consists of both objects (JSON) and blobs that are related in some manner, which is facilitated through really cool FatFractal NoServer features. In addition, the Telluride back-end easily integrated with Salesforce (for seating availability) using the NoServer Server Extensions.

Planning

Last year we served the Telluride back-end off a heavily multi-tenanted EC2 m1.xlarge instance and it did the job. This year we served the Telluride back-end off two heavily multi-tenanted EC2 m1.large instances for redundancy purposes, and the traffic was load balanced to the instances by our directors. We have far more apps on the platform now than we did this time last year, so we figured that, given the normal loads on the two EC2 m1.large instances and last year's Telluride loads (wish we had real analytics back then), we should be able to accommodate this year's load with some headroom (fingers crossed).

Below are some screenshots of the instance loads. The two instances are represented by the green and blue lines. The Telluride Film Festival started 08/29/2013 and ended 09/02/2013.

It should be noted that at the start of the festival (see the spikes) we uncovered a bug that affected CPU utilization (that ever elusive monster query) that was fixed in about an hour by our resident guru, Gary Casey.

[Instance load graphs: CPU, Network-In, Network-Out]

As you can see, the instances easily handled the load, and our assumptions based on last year's fuzzy compute usage analytics were somewhat validated. Unfortunately, last year we were not collecting application-level metrics and relied heavily on information from our logs, extrapolating what we could. Given the graphs above we could have squeezed more out of the instances, but without fine-grained analytics we did not want to take the risk.

Application Analytics

What application compute usage analytics allow us to do is determine what percentage of an instance's load is being consumed by each application on that instance. So if I wanted to scale an application to another instance, I would know approximately how much compute must be available on that instance, or I could take the simplistic route and spin up the appropriate EC2 instance type.
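
As a trivial illustration of that calculation (my own sketch, assuming CPU seconds as the load measure), an application's share of an instance is just its aggregate over the instance total:

```java
import java.util.Map;

// Sketch: given aggregated CPU seconds per application on an instance,
// compute one application's percentage of the instance total so capacity
// on a target instance can be sized.
public class LoadShare {

    public static double percentOfInstance(Map<String, Double> cpuSecondsByApp,
                                           String appId) {
        double total = cpuSecondsByApp.values().stream()
                                      .mapToDouble(Double::doubleValue)
                                      .sum();
        return total == 0 ? 0 : 100.0 * cpuSecondsByApp.getOrDefault(appId, 0.0) / total;
    }
}
```

For example, with telluride at 5,000 CPU seconds and anon at 15,000 on the same instance, percentOfInstance(Map.of("telluride", 5000.0, "anon", 15000.0), "telluride") returns 25.0.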

In the table below you can see two applications, telluride and an unnamed app we'll call 'anon', that both reside on the same instance. The metrics were collected across the dates 08/29/2013-09/02/2013, so I have their relative compute usage and can determine how much each contributes to the total instance load.

I should note that the telluride aggregates have actually been collected across two instances, but the analytic records contain the instance id so that I can aggregate across one or more instances; the table is just an example of what application analytics get collected.

Below is a screenshot of the Telluride API calls and response times from 08/29/2013-09/02/2013 across both EC2 m1.large instances. Application analytics provides a fine-grained viewport that can be drilled into to determine precisely what the compute usage of any application is.

Conclusion

Application analytics is critical to any multi-tenanted BaaS or PaaS environment. It provides the information necessary to accurately profile an application's compute usage so that it can be properly scaled in a predictive fashion. In addition, application analytics is the tool by which the infrastructure can be utilized in the most efficient manner possible, allowing for optimal multi-tenancy and ultimately a lower cost to the developer and enterprise.

We will be far more informed for next year's Telluride Film Festival with the application analytics we captured this year and really look forward to next year's event!