OpenStack Benchmarking on SoftLayer with Rally

Kirill Ishanov - December 10, 2013

As we recently posted, Mirantis and IBM SoftLayer collaborated to benchmark OpenStack. Shortly before the Icehouse Summit, Mirantis released Rally, a tool for benchmarking OpenStack. For the last three to four months, Mirantis has been working with SoftLayer to use Rally to benchmark OpenStack at a larger scale. As a result of this collaboration, we’ve prepared a solution that lets us specify real-world workloads and run them against existing OpenStack clouds.

This article takes a look at the setup for this benchmarking, as well as specifics of the actual use case.

IBM SoftLayer is a global hosting leader and has a large number of servers that can be used for different kinds of workloads. As you know, Mirantis has been around for a while, concentrating primarily on OpenStack deployment and operations for the last three years.

Who is Rally for?

Rally is a solution designed to benchmark and profile OpenStack-based clouds, and it’s made for both developers and operators. Developers can use it to specify a synthetic workload, stress-test an OpenStack cloud, and get low-level profiling results. In that case, Rally collects all of the nitty-gritty details about running a specific operation, such as provisioning a thousand VMs, and shows how the cloud performs on average under that load.
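To make the developer case concrete, here is a sketch of a synthetic “provision a thousand VMs” workload written as a Rally task. It is modeled on the JSON task format of later Rally releases (the `NovaServers.boot_and_delete_server` scenario, a `constant` runner, and a `users` context); the format in the version described in this post may differ, and the flavor name, image name, and concurrency are placeholders chosen for illustration.

```python
import json

# Sketch of a synthetic workload, modeled on later Rally task files.
# Placeholder values: flavor/image names and concurrency are assumptions.
task = {
    "NovaServers.boot_and_delete_server": [
        {
            "args": {
                "flavor": {"name": "m1.tiny"},   # assumed flavor name
                "image": {"name": "cirros"},     # assumed image name
            },
            # Boot and delete 1,000 servers, keeping 10 requests in flight.
            "runner": {"type": "constant", "times": 1000, "concurrency": 10},
            # Rally creates these temporary tenants and users for the run.
            "context": {"users": {"tenants": 5, "users_per_tenant": 3}},
        }
    ]
}

# Serialize to the JSON document Rally would read as its task file.
print(json.dumps(task, indent=2))
```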

Operators, on the other hand, typically don’t run workloads that simply provision and immediately destroy VMs. Their workloads are more complicated, so we’ve created an engine that lets us specify real-life workloads and run them against existing OpenStack clouds. The results of this kind of benchmark are higher level, but they help you identify bottlenecks in your cloud, for example, a problem with the database. Because Rally also saves historical data, you can see whether the changes you apply to your cloud make its performance better or worse.

The setup

The setup for the benchmark is very simple. We used DevStack to deploy the latest OpenStack code from trunk, and Mirantis Fuel for larger HA deployments. IBM SoftLayer provided us with 1,500 bare-metal servers. We split these servers into several pools and deployed several clouds to see how OpenStack behaves at different scales. The biggest cloud had almost one thousand nodes.

Before I dive into the demo we did, it’s important to note something about SoftLayer. Typically, when you deploy OpenStack clouds, you perform bare-metal provisioning with a tool such as Cobbler. Here, instead, we needed to integrate the SoftLayer API into our solution called Fuel so that we could provision hundreds or thousands of servers on demand through SoftLayer.

A real-world use case

In the real world, companies have several applications that they want to deploy with several different usage patterns. Suppose you have a complex web application with a web layer, a cache, a load balancer, and a database. If multiple teams are developing this application concurrently, they’d probably deploy different versions of it on the cloud several times a day. And if the cloud is used primarily for Dev/QA, these environments don’t live long: you deploy one, use it for several minutes or hours, get the results, and destroy it.

In a large environment, then, several teams work with a set of standard application stacks, and each application contains many VMs. To translate these requirements into an OpenStack workload, you need a number of tenants, a number of users per tenant, and the number of VMs to provision concurrently. In terms of steps, this use case goes through provisioning, using, and destroying VMs. OpenStack itself participates only in the first and last steps, so those are the ones to benchmark. For those two steps, you need to measure how long provisioning takes on average, as well as the success rate.
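The two numbers named above, average provisioning time and success rate, can be computed from per-VM results as in the following sketch. The result data is invented for illustration, and the choice to average only successful boots (so that timeouts don’t skew the mean) is an assumption, not necessarily how Rally reports it.

```python
from statistics import mean

# Hypothetical per-VM results from one benchmark run:
# (seconds until the VM became ACTIVE, whether provisioning succeeded).
results = [(42.0, True), (38.5, True), (55.2, True), (120.0, False), (40.3, True)]


def summarize(results):
    """Return (average provisioning time of successful boots, success rate)."""
    if not results:
        raise ValueError("no results to summarize")
    ok_times = [seconds for seconds, ok in results if ok]
    avg = mean(ok_times) if ok_times else float("nan")
    success_rate = len(ok_times) / len(results)
    return avg, success_rate


avg, success_rate = summarize(results)
# Prints: average provisioning time: 44.0s, success rate: 80%
print(f"average provisioning time: {avg:.1f}s, success rate: {success_rate:.0%}")
```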

Technically, this becomes the baseline for your SLA. However, OpenStack is not monolithic. It’s a distributed system made up of many daemons that talk to each other. In our case, provisioning a VM requires the request to go through the Nova API, then probably through Keystone for authentication, then to the Nova database, and so on.
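Because a single boot touches several services, the useful question is which one dominates. The sketch below ranks hypothetical per-component timings for one VM boot and picks out the largest. The component list and numbers are invented, and treating the stages as strictly sequential is a simplification, since real OpenStack requests are partly asynchronous.

```python
# Hypothetical timings (seconds) for one VM boot, split by the component
# that handled each stage. Assumes the stages run one after another.
stages = {
    "keystone (auth)": 0.4,
    "nova-api": 0.6,
    "nova database": 1.5,
    "nova-scheduler": 2.0,
    "glance (image fetch)": 18.0,
    "nova-compute (spawn)": 12.5,
}

total = sum(stages.values())

# Print components from slowest to fastest with their share of the total.
for name, secs in sorted(stages.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:22s} {secs:6.1f}s  {secs / total:6.1%}")

# The component where most of the provisioning time is spent.
bottleneck = max(stages, key=stages.get)
print("bottleneck:", bottleneck)
```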

You want to know where most of the provisioning time is spent, and you want a solution that collects that data. The first run gives you a baseline of how your cloud performs. Once you have the baseline, you want to keep historical data. For example, if you run the same benchmark after changing something in the cloud, such as the database configuration or a Glance caching option, you want to see whether your changes help bring the overall time below your SLA. That brings us to our demo.
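Comparing each rerun with the baseline and the SLA target could look like the sketch below. The SLA value, the run labels, and the timings are all hypothetical; the point is only the comparison: relative change against the first run, plus a pass or fail against the target.

```python
from dataclasses import dataclass


@dataclass
class Run:
    label: str          # what changed before this run
    avg_seconds: float  # average end-to-end provisioning time


SLA_SECONDS = 40.0  # hypothetical SLA target

# Hypothetical history of the same benchmark rerun after each change.
history = [
    Run("baseline", 44.0),
    Run("tuned database configuration", 41.2),
    Run("enabled Glance image caching", 33.7),
]


def compare(history, sla_seconds):
    """Return (label, relative change vs. the first run, meets SLA?) per run."""
    baseline = history[0].avg_seconds
    return [
        (run.label, (run.avg_seconds - baseline) / baseline, run.avg_seconds <= sla_seconds)
        for run in history
    ]


for label, change, ok in compare(history, SLA_SECONDS):
    print(f"{label:30s} {change:+7.1%}  {'meets' if ok else 'misses'} SLA")
```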

The application

Our application is the front end for Rally, deployed on a local server with access to the SoftLayer servers. We’ve deployed several environments, and each one contains information about the benchmarks that have been run there. As the demo shows, these benchmarks were specified for two workloads called “Creative Testing” and “Report Generation.”

The workflow

If you want to test different things in parallel, you need to provision your entire web application and specify a workflow that provisions it, uses it for some time, and destroys it. Report generation probably needs to run several times a day, and each run requires you to deploy a large cluster, run the workload to produce the results, and destroy everything.

Every company has its own workloads, and what makes them unique can be the concurrency pattern, the application itself, or the steps involved. We made a slightly more sophisticated workload called Dev/QA. It begins by provisioning the application and then assumes you use it for a while, say 20 minutes. Next, assuming you’ve made some changes in the database, it takes a snapshot for further verification. It then assumes you use the application for 5 more minutes to collect some data, and finally it destroys the VMs.
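One way to write the Dev/QA workflow down as data is sketched below. The Step structure is hypothetical, not Rally’s own format, and assumes the snapshot targets a database VM. It separates steps that call OpenStack (and so should be timed) from usage periods that only hold resources, which matches the earlier point that OpenStack participates only in some of the steps.

```python
from dataclasses import dataclass


@dataclass
class Step:
    name: str
    calls_openstack: bool  # True if the step hits OpenStack APIs and should be timed
    minutes: float = 0.0   # time the application is simply in use


# Hypothetical encoding of the Dev/QA workflow; not Rally's own format.
DEV_QA = [
    Step("provision application", calls_openstack=True),
    Step("use application", calls_openstack=False, minutes=20),
    Step("snapshot database VM", calls_openstack=True),
    Step("use application", calls_openstack=False, minutes=5),
    Step("destroy VMs", calls_openstack=True),
]


def timed_steps(workflow):
    """Names of the steps whose duration the benchmark should measure."""
    return [step.name for step in workflow if step.calls_openstack]


def usage_minutes(workflow):
    """Total time the workflow holds resources without calling OpenStack."""
    return sum(step.minutes for step in workflow)


print(timed_steps(DEV_QA))
print(usage_minutes(DEV_QA))
```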

The wizard

Most of the steps available in OpenStack can be specified through the Rally wizard. Once you specify the workflow, you can switch to the usage pattern and say, for example, that the cloud will be used by 5 tenants, that each tenant will have at least 3 users, and that each user will run the workflow twice per hour. After that, you move on to the usage topology.
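The usage pattern above translates into load with simple arithmetic. The sketch takes the minimum of 3 users per tenant and assumes runs arrive at a steady rate. Combined with the roughly 40-minute workflow duration mentioned below, Little’s law (average number in the system equals arrival rate times time in the system) estimates how many environments are live at once.

```python
def workflow_runs_per_hour(tenants, users_per_tenant, runs_per_user_per_hour):
    """Total workflow starts per hour across all users."""
    return tenants * users_per_tenant * runs_per_user_per_hour


def expected_concurrent_runs(runs_per_hour, workflow_minutes):
    """Little's law: average runs in flight = arrival rate * time per run."""
    return runs_per_hour * workflow_minutes / 60.0


rate = workflow_runs_per_hour(tenants=5, users_per_tenant=3, runs_per_user_per_hour=2)
live = expected_concurrent_runs(rate, workflow_minutes=40)
print(rate)  # 30 workflow runs per hour
print(live)  # 20.0 environments alive at the same time, on average
```

This is an average under a steady arrival assumption; bursts, such as every team deploying at the start of the working day, would push the peak higher.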

Initially, we wanted to rely on Heat templates for the application, but we decided instead to let users specify a very simple application topology by defining server roles. Once the workflow, usage pattern, and topology are specified, you can create the workload in the Rally wizard. After that, you go to the clusters, select a cluster, and run a benchmark on your workflow (in our example, a baseline for Dev/QA). A single pass through the workflow takes about 40 minutes, but in the real world you’d want to repeat it multiple times, so you specify how much time to spend on each step and then run the benchmark.
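A role-based topology can be as small as a mapping from role to server count. The sketch below uses the web application from the use case earlier in this post; the role names and counts are hypothetical. It shows how a topology determines the number of VMs each deployment needs and, given a number of simultaneous deployments (20 here, as an example), the peak VM count the cloud must hold.

```python
# Hypothetical role-based topology for the web application from the use case:
# each role maps to the number of servers that play it.
TOPOLOGY = {
    "load_balancer": 1,
    "web": 4,
    "cache": 2,
    "database": 1,
}


def vms_per_deployment(topology):
    """Number of VMs one deployment of the application needs."""
    return sum(topology.values())


def peak_vms(topology, concurrent_deployments):
    """VMs the cloud must hold when several deployments are live at once."""
    return vms_per_deployment(topology) * concurrent_deployments


print(vms_per_deployment(TOPOLOGY))  # 8
print(peak_vms(TOPOLOGY, 20))        # 160
```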

The benchmark

If you drill into the details of each benchmark in Rally, you can see how much time is spent in each OpenStack component, so you can make adjustments and rerun the benchmark for comparison. A future version of Rally will include SLA-related information about how your cloud should perform on average, and will also calculate the maximum time.
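The per-component comparison described above can be sketched as follows: compute the average and maximum duration per component for two runs, then report the change in the average. The component names and timings are invented, and this is not Rally’s report format; it only illustrates the kind of before-and-after view the paragraph describes.

```python
from statistics import mean

# Hypothetical per-component durations (seconds) collected over three
# iterations of the same benchmark, before and after a configuration change.
before = {"nova-api": [0.6, 0.7, 0.5], "glance": [18.0, 21.0, 17.5]}
after = {"nova-api": [0.6, 0.6, 0.6], "glance": [6.0, 7.5, 5.5]}


def component_stats(samples):
    """Map each component to (average, maximum) duration."""
    return {name: (mean(values), max(values)) for name, values in samples.items()}


def average_deltas(before, after):
    """Change in average duration per component present in both runs."""
    b, a = component_stats(before), component_stats(after)
    return {name: a[name][0] - b[name][0] for name in b if name in a}


after_stats = component_stats(after)
for name, delta in average_deltas(before, after).items():
    avg, worst = after_stats[name]
    print(f"{name:10s} avg {avg:5.2f}s (max {worst:5.2f}s), change {delta:+.2f}s")
```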

Want to join us?

Are you an OpenStacker intrigued by Rally? Then, we want you to join our team! Check out the following resources and post your comments. We’d love to hear your feedback.

Resources: