| By Application Security | Article Rating: |
|
| January 29, 2013 09:00 AM EST | Reads: |
3,064 |
As an Enterprise Architect for Intel IT, I worked with IT Engineering and our Software and Services group on the elastic scaling of the APIs that power the Intel AppUp® center. Our goal was to scale our APIs to at least 10x our baseline capacity (measured in transactions per second) by moving them to our private cloud, and ultimately to be able to connect to a public cloud provider for additional availability and scalability. Here’s a quick set of practices we used to achieve our goal:
- Virtualize everything. This may seem obvious and is probably a no-op for new APIs, but in our case we were using a bare-metal installs at our gateway and database layers (the API servers themselves were already running as VMs). While our gateway hardware appliance had very good scalability, we knew we were ultimately targeting the public cloud and that our need for dynamic scaling could exceed our ability to add new physical servers. Using a gateway that scales in pure software virtual machines without the need for special purpose-built hardware helped us achieve our goal here.
- Instrument everything. We needed to be able to correlate leading indicators like transactions per second to system load at each layer so we could begin to identify bottlenecks. We also needed to characterize our workload for testing – understanding a real-world sequence of API methods and mix/ordering of reads and writes. This allowed us to create a viable set of load tests.
- Identify bottlenecks. We used Apache jmeter to generate load and identify points where latency became an issue, correlating that against system loads to find out where we had reached saturation and needed to scale.
- Define a scaling unit. In our case, we were using dedicated DB instances rather than database-as-a-service, so we decided to scale all three layers together. We identified how many API servers would saturate the DB layer, and how many gateways we would need to manage the traffic. We then defined a collection of VMs that would provision all of these VMs together. We might have scaled each layer independently had our API been architected differently, or if we were building from scratch on database-as-a-service.
- Repeat. The above let us scale from 1x to about 5x or 6x without any problem. However, when we hit 6x scaling we discovered that a new bottleneck: the overhead of replicating commits across the database instances. We went back to the drawing board and redesigned the back end for eventual consistency so we could reduce database load.
- Automate everything. We use Nagios and Puppetto monitor and respond to health changes. A new scaling unit is provisioned when we hit predefined performance thresholds.
- Don’t forget to test scaling down. If you set a threshold for removing capacity, it’s important to make sure that your workflow allows for a graceful shutdown and doesn’t impact calls that are in progress.
The above approach got us to 10x our initial capacity in a single data center. Because of some of our architecture decisions (coarse-grained scaling units and eventual consistency) we were then able to add a GLB and scale out to multiple data centers – first to another internal private cloud and then to a public cloud provider.
Read the original blog entry...
Published January 29, 2013 Reads 3,064
Copyright © 2013 SYS-CON Media, Inc. — All Rights Reserved.
Syndicated stories and blog feeds, all rights reserved by the author.
More Stories By Application Security
This blog references our expert posts on application and web services security.
- Cloud People: A Who's Who of Cloud Computing
- NIST to Sponsor FFRDC Widespread Adoption of Integrated CyberSecurity
- Cloud Business Solutions, Social Media, and Platform Systems of Engagement Market Shares, Strategies, and Forecasts, Worldwide, 2013 to 2019
- MicroStrategy Announces General Availability of MicroStrategy 9.3.1
- MicroStrategy Announces General Availability of MicroStrategy 9.3.1
- Cloud Expo New York | Big Data: What It Means for Legal & Risk Management
- Altova Announces General Availability of RaptorXML
- Reflections on the Future of Platform as a Service (PaaS)
- Big Data Will Revolutionize Learning
- Cloud Expo New York: Getting to the Promise of Big Data
- 2013 - 2016 : solutions stabilisées, usages innovants généralisés
- Cloud Expo New York: Cloud Architecture and Engineering
- Cloud People: A Who's Who of Cloud Computing
- Portable Experimenter’s Platform, Powered by Raspberry Pi
- Predixion Software Announces General Availability of the Latest Version of its Predictive Analytics Platform
- Cloud Expo New York: Real-Time Analytics Using an In-Memory Data Grid
- Cloud Expo New York: The Big Challenge of Big Data & Hadoop Integration
- NIST to Sponsor FFRDC Widespread Adoption of Integrated CyberSecurity
- Cloud Business Solutions, Social Media, and Platform Systems of Engagement Market Shares, Strategies, and Forecasts, Worldwide, 2013 to 2019
- Agile Solutions for Cloud, Big Data, Mobility Services
- MicroStrategy Announces General Availability of MicroStrategy 9.3.1
- Cloud Computing: Cutting Costs, Boosting Profits
- AMAX Launches StorMax(TM) CFS, powered by IBM(R) General Parallel File System(TM) (GPFS(TM))
- Benefits of Cloud Computing
- The Top 250 Players in the Cloud Computing Ecosystem
- Web Services Using ColdFusion and Apache CXF
- Cloud People: A Who's Who of Cloud Computing
- Red Hat Named "Platinum Sponsor" of Virtualization Conference & Expo
- Cloud Expo New York Call for Papers Now Open
- Eclipse "Pollinate" Project to Integrate with Apache Beehive
- An Introduction to Ant
- Cloud Expo 2011 East To Attract 10,000 Delegates and 200 Exhibitors
- Beehive Code Now Available in Apache
- 4th International Cloud Computing Conference & Expo Starts Today
- Apache's Tomcat 5.5 is First Release Ever to Use Eclipse JDT Java Compiler
- "Beehive" Now Officially an Open Source Project: Apache Beehive


























