Apache Authors: Pat Romanski, Liz McMillan, John Mertic, Elizabeth White, Janakiram MSV

Related Topics: Cloud Security, Java IoT, Linux Containers, Containers Expo Blog, Agile Computing, @CloudExpo

Cloud Security: Blog Feed Post

Facebook Exploit Is Not Unique

Facebook isn't unique in the ability to use it to attack a third party, it's just more effective

This week's "bad news" with respect to information security centers on Facebook and the exploitation of HTTP caches to affect a DDoS attack. Reported as a 'vulnerability', this exploit takes advantage of the way the application protocol is designed to work. In fact, the same author who reports the Facebook 'vulnerability' has also shown you can use Google to do the same thing. Just about any site that enables you to submit content containing links and then retrieves those links for you (for caching purposes) could be used in this way. It's not unique to Facebook or Google, for that matter, they just have the perfect environment to make such an exploit highly effective.

The exploit works by using a site (in this case Facebook) to load content and takes advantage of the general principle of amplification to effectively DDoS a third-party site. This is a flood-based like attack, meaning it's attempting to overwhelm a server by flooding it with requests that voraciously consume server-side resources and slow everyone down - to the point of forcing it to appear "down" to legitimate users.

The requests brokered by Facebook are themselves 110% legitimate requests. The requests for an image (or PDF or large video file) are well-formed, and nothing about the requests on an individual basis could be detected as being an attack. This is, in part, why the exploit works: because the individual requests are wholly legitimate requests.

How it Works
The trigger for the "attack" is the caching service. Caches are generally excellent at, well, caching static objects with well-defined URIs. A cache doesn't have a problem finding /myimage.png. It's either there, or it's not and the cache has to go to origin to retrieve it. Where things get more difficult is when requests for content are dynamic; that is, they send parameters that the origin server interprets to determine which image to send, e.g. /myimage?id=30. This is much like an old developer trick to force the reload of dynamic content when browser or server caches indicate a match on the URL. By tacking on a random query parameter, you can "trick" the browser and the server into believing it's a brand new object, and it will go to origin to retrieve it - even though the query parameter is never used. That's where the exploit comes in.

HTTP servers accept as part of the definition of a URI any number of variable query parameters. Those parameters can be ignored or used at the discretion of the application. But when the HTTP server is looking to see if that content has been served already, it does look at those parameters. The reference for a given object is its URL, and thus tacking on a query parameter forces (or tricks if you prefer) the HTTP server to believe the object has never been served before and thus can't be retrieved from a cache.

Caches act on the same principles as an HTTP server because when you get down to brass tacks, a cache is a very specialized HTTP server, focused on mirroring content so it's closer to the user.

<img src=http://target.com/file?r=1>
<img src=http://target.com/file?r=2>
<img src=http://target.com/file?r=3>
<img src=http://target.com/file?r=1000>

Many, many, many, many (repeat as necessary) web applications are built using such models. Whether to retrieve text-based content or images is irrelevant to the cache. The cache looks at the request and, if it can't match it somehow, it's going to go to origin.

Which is what's possible with Facebook Notes and Google. By taking advantage of (exploiting) this design principle, if a note crafted with multiple image objects retrieved via a dynamic query is viewed by enough users at the same time, the origin can become overwhelmed or its network oversubscribed.

This is what makes it an exploit, not a vulnerability. There's nothing wrong with the behavior of these caches - they are working exactly as they were designed to act with respect to HTTP. The problem is that when the protocol and caching behavior was defined, such abusive behavior was not considered.

In other words, this is a protocol exploit not specific to Facebook (or Google). In fact, similar exploits have been used to launch attacks in the past. For example, consider some noise raised around WordPress in March 2014 that indicated it was being used to attack other sites by bypassing the cache and forcing a full reload from the origin server:

If you notice, all queries had a random value (like “?4137049=643182″) that bypassed their cache and force a full page reload every single time. It was killing their server pretty quickly.


But the most interesting part is that all the requests were coming from valid and legitimate WordPress sites. Yes, other WordPress sites were sending that random requests at a very large scale and bringing the site down.

The WordPress exploit was taking advantage of the way "pingbacks" work. Attackers were using sites to add pingbacks to amplify an attack on a third party site (also, ironically, a WordPress site).

It's not just Facebook, or Google - it's inherent in the way caching is designed to work.

Not Just HTTP
This isn't just an issue with HTTP. We can see similar behavior in a DNS exploit that renders DNS caching ineffective as protection against certain attack types. In the DNS case, querying a cache with a random host name results in a query to the authoritative (origin) DNS service. If you send enough random host names at the cache, eventually the DNS service is going to feel the impact and possibly choke.

In general, these types of exploits are based on protocol and well-defined system behavior. A cache is, by design, required to either return a matching object if found or go to the origin server if it is not. In both the HTTP and DNS case, the caching services are acting properly and as one would expect.

The problem is that this proper behavior can be exploited to affect a DDoS attack - against third-parties in the case of Facebook/Google and against the domain owner in the case of DNS.

These are not vulnerabilities, they are protocol exploits. This same "vulnerability" is probably present in most architectures that include caching. The difference is that Facebook's ginormous base of users allows for what is expected behavior to quickly turn into what looks like an attack.

The general consensus right now is the best way to mitigate this potential "attack" is to identify and either rate limit or disallow requests coming from Facebook's crawlers by IP address. In essence, the suggestion is to blacklist Facebook (and perhaps Google) to keep it from potentially overwhelming your site.

The author noted in his post regarding this exploit that:

Facebook crawler shows itself as facebookexternalhit. Right now it seems there is no other choice than to block it in order to avoid this nuisance.

The post was later updated to note that blocking by agent may not be enough, hence the consensus on IP-based blacklisting.

The problem is that attackers could simply find another site with a large user base (there are quite a few of them out there with the users to support a successful attack) and find the right mix of queries to bypass the cache (cause caches are a pretty standard part of a web-scale infrastructure) and voila! Instant attack.

Blocking Facebook isn't going to stop other potential attacks and it might seriously impede revenue generating strategies that rely on Facebook as a channel. Rate limiting based on inbound query volume for specific content will help mitigate the impact (and ensure legitimate requests continue to be served) but this requires some service to intermediate and monitor inbound requests and, upon seeing behavior indicative of a potential attack, the ability to intercede or apply the appropriate rate limiting policy. Such a policy could go further and blacklist IP addresses showing sudden increases in requests or simply blocking requests for the specified URI in question - returning instead some other content.

Another option would be to use a caching solution capable of managing dynamic content. For example, F5 Dynamic Caching includes the ability to designate parameters as either indicative of new content or not. That is, the caching service can be configured to ignore some (or all) parameters and serve content out of cache instead of hammering on the origin server.

Let's say the URI for an image was: /directory/images/dog.gif?ver=1;sz=728X90 where valid query parameters are "ver" (version) and "sz" (size). A policy can be configured to recognize "ver" as indicative of different content while all other query parameters indicate the same content and can be served out of cache. With this kind of policy an attacker could send any combination of the following and the same image would be served from cache, even though "sz" is different and there are random additional query parameters.

/directory/images/dog.gif?ver=1;sz=728X90; id=1234
/directory/images/dog.gif?ver=1;sz=728X900; id=123456
/directory/images/dog.gif?ver=1;sz=728X90; cid=1234 

By placing an application fluent cache service in front of your origin servers, when Facebook (or Google) comes knocking, you're able to handle the load.

Action Items
There have been no reports of an attack stemming from this exploitable condition in Facebook Notes or Google, so blacklisting crawlers from either Facebook or Google seems premature. Given that this condition is based on protocol behavior and system design and not a vulnerability unique to Facebook (or Google), though, it would be a good idea to have a plan in place to address, should such an attack actually occur - from there or some other site.

You should review your own architecture and evaluate its ability to withstand a sudden influx of dynamic requests for content like this, and put into place an operational plan for dealing with it should such an event occur.

For more information on protecting against all types of DDoS attacks, check out a new infographic we’ve put together here.

Read the original blog entry...

More Stories By Lori MacVittie

Lori MacVittie is responsible for education and evangelism of application services available across F5’s entire product suite. Her role includes authorship of technical materials and participation in a number of community-based forums and industry standards organizations, among other efforts. MacVittie has extensive programming experience as an application architect, as well as network and systems development and administration expertise. Prior to joining F5, MacVittie was an award-winning Senior Technology Editor at Network Computing Magazine, where she conducted product research and evaluation focused on integration with application and network architectures, and authored articles on a variety of topics aimed at IT professionals. Her most recent area of focus included SOA-related products and architectures. She holds a B.S. in Information and Computing Science from the University of Wisconsin at Green Bay, and an M.S. in Computer Science from Nova Southeastern University.

@ThingsExpo Stories
SYS-CON Events announced today that Pulzze Systems will exhibit at the 19th International Cloud Expo, which will take place on November 1–3, 2016, at the Santa Clara Convention Center in Santa Clara, CA. Pulzze Systems, Inc. provides infrastructure products for the Internet of Things to enable any connected device and system to carry out matched operations without programming. For more information, visit http://www.pulzzesystems.com.
One of biggest questions about Big Data is “How do we harness all that information for business use quickly and effectively?” Geographic Information Systems (GIS) or spatial technology is about more than making maps, but adding critical context and meaning to data of all types, coming from all different channels – even sensors. In his session at @ThingsExpo, William (Bill) Meehan, director of utility solutions for Esri, will take a closer look at the current state of spatial technology and ar...
Cloud based infrastructure deployment is becoming more and more appealing to customers, from Fortune 500 companies to SMEs due to its pay-as-you-go model. Enterprise storage vendors are able to reach out to these customers by integrating in cloud based deployments; this needs adaptability and interoperability of the products confirming to cloud standards such as OpenStack, CloudStack, or Azure. As compared to off the shelf commodity storage, enterprise storages by its reliability, high-availabil...
The IoT industry is now at a crossroads, between the fast-paced innovation of technologies and the pending mass adoption by global enterprises. The complexity of combining rapidly evolving technologies and the need to establish practices for market acceleration pose a strong challenge to global enterprises as well as IoT vendors. In his session at @ThingsExpo, Clark Smith, senior product manager for Numerex, will discuss how Numerex, as an experienced, established IoT provider, has embraced a ...
In past @ThingsExpo presentations, Joseph di Paolantonio has explored how various Internet of Things (IoT) and data management and analytics (DMA) solution spaces will come together as sensor analytics ecosystems. This year, in his session at @ThingsExpo, Joseph di Paolantonio from DataArchon, will be adding the numerous Transportation areas, from autonomous vehicles to “Uber for containers.” While IoT data in any one area of Transportation will have a huge impact in that area, combining sensor...
In the next forty months – just over three years – businesses will undergo extraordinary changes. The exponential growth of digitization and machine learning will see a step function change in how businesses create value, satisfy customers, and outperform their competition. In the next forty months companies will take the actions that will see them get to the next level of the game called Capitalism. Or they won’t – game over. The winners of today and tomorrow think differently, follow different...
“Media Sponsor” of SYS-CON's 19th International Cloud Expo, which will take place on November 1–3, 2016, at the Santa Clara Convention Center in Santa Clara, CA. CloudBerry Backup is a leading cross-platform cloud backup and disaster recovery solution integrated with major public cloud services, such as Amazon Web Services, Microsoft Azure and Google Cloud Platform.
The Internet of Things (IoT), in all its myriad manifestations, has great potential. Much of that potential comes from the evolving data management and analytic (DMA) technologies and processes that allow us to gain insight from all of the IoT data that can be generated and gathered. This potential may never be met as those data sets are tied to specific industry verticals and single markets, with no clear way to use IoT data and sensor analytics to fulfill the hype being given the IoT today.
Ask someone to architect an Internet of Things (IoT) solution and you are guaranteed to see a reference to the cloud. This would lead you to believe that IoT requires the cloud to exist. However, there are many IoT use cases where the cloud is not feasible or desirable. In his session at @ThingsExpo, Dave McCarthy, Director of Products at Bsquare Corporation, will discuss the strategies that exist to extend intelligence directly to IoT devices and sensors, freeing them from the constraints of ...
SYS-CON Events announced today that SoftNet Solutions will exhibit at the 19th International Cloud Expo, which will take place on November 1–3, 2016, at the Santa Clara Convention Center in Santa Clara, CA. SoftNet Solutions specializes in Enterprise Solutions for Hadoop and Big Data. It offers customers the most open, robust, and value-conscious portfolio of solutions, services, and tools for the shortest route to success with Big Data. The unique differentiator is the ability to architect and ...
The Internet of Things will challenge the status quo of how IT and development organizations operate. Or will it? Certainly the fog layer of IoT requires special insights about data ontology, security and transactional integrity. But the developmental challenges are the same: People, Process and Platform and how we integrate our thinking to solve complicated problems. In his session at 19th Cloud Expo, Craig Sproule, CEO of Metavine, will demonstrate how to move beyond today's coding paradigm ...
Fifty billion connected devices and still no winning protocols standards. HTTP, WebSockets, MQTT, and CoAP seem to be leading in the IoT protocol race at the moment but many more protocols are getting introduced on a regular basis. Each protocol has its pros and cons depending on the nature of the communications. Does there really need to be only one protocol to rule them all? Of course not. In his session at @ThingsExpo, Chris Matthieu, co-founder and CTO of Octoblu, walk you through how Oct...
A completely new computing platform is on the horizon. They’re called Microservers by some, ARM Servers by others, and sometimes even ARM-based Servers. No matter what you call them, Microservers will have a huge impact on the data center and on server computing in general. Although few people are familiar with Microservers today, their impact will be felt very soon. This is a new category of computing platform that is available today and is predicted to have triple-digit growth rates for some ...
Everyone knows that truly innovative companies learn as they go along, pushing boundaries in response to market changes and demands. What's more of a mystery is how to balance innovation on a fresh platform built from scratch with the legacy tech stack, product suite and customers that continue to serve as the business' foundation. In his General Session at 19th Cloud Expo, Michael Chambliss, Head of Engineering at ReadyTalk, will discuss why and how ReadyTalk diverted from healthy revenue an...
For basic one-to-one voice or video calling solutions, WebRTC has proven to be a very powerful technology. Although WebRTC’s core functionality is to provide secure, real-time p2p media streaming, leveraging native platform features and server-side components brings up new communication capabilities for web and native mobile applications, allowing for advanced multi-user use cases such as video broadcasting, conferencing, and media recording.
Established in 1998, Calsoft is a leading software product engineering Services Company specializing in Storage, Networking, Virtualization and Cloud business verticals. Calsoft provides End-to-End Product Development, Quality Assurance Sustenance, Solution Engineering and Professional Services expertise to assist customers in achieving their product development and business goals. The company's deep domain knowledge of Storage, Virtualization, Networking and Cloud verticals helps in delivering ...
SYS-CON Events announced today that Enzu will exhibit at the 19th International Cloud Expo, which will take place on November 1–3, 2016, at the Santa Clara Convention Center in Santa Clara, CA. Enzu’s mission is to be the leading provider of enterprise cloud solutions worldwide. Enzu enables online businesses to use its IT infrastructure to their competitive advantage. By offering a suite of proven hosting and management services, Enzu wants companies to focus on the core of their online busine...
In the next five to ten years, millions, if not billions of things will become smarter. This smartness goes beyond connected things in our homes like the fridge, thermostat and fancy lighting, and into heavily regulated industries including aerospace, pharmaceutical/medical devices and energy. “Smartness” will embed itself within individual products that are part of our daily lives. We will engage with smart products - learning from them, informing them, and communicating with them. Smart produc...
November 1–3, 2016, at the Santa Clara Convention Center in Santa Clara, CA. Penta Security is a leading vendor for data security solutions, including its encryption solution, D’Amo. By using FPE technology, D’Amo allows for the implementation of encryption technology to sensitive data fields without modification to schema in the database environment. With businesses having their data become increasingly more complicated in their mission-critical applications (such as ERP, CRM, HRM), continued ...
OnProcess Technology has announced it will be a featured speaker at @ThingsExpo, taking place November 1 - 3, 2016, in Santa Clara, California. Dan Gettens, OnProcess’ Chief Analytics Officer, will discuss how Internet of Things (IoT) data can be leveraged to predict product failures, improve uptime and slash costly inventory stock. @ThingsExpo is an annual gathering of IoT and cloud developers, practitioners and thought-leaders who exchange ideas and insights on topics ranging from Big Data in...