By Hovhannes Avoyan
October 25, 2012 08:55 AM EDT
There are a variety of ways to implement proxying for web servers. As Apache is the most popular web server, we will implement proxying on it. Anyone who knows Apache well probably knows that it supports proxying for AJP13, FTP, CONNECT, and HTTP/1.x.
The choice of reverse proxy depends entirely on what you are trying to hide behind it. Each proxy mechanism has its own benefits and bottlenecks. For Apache alone, there are several ways to hide application servers (mod_proxy, mod_passenger, mod_wsgi, mod_jk). mod_passenger and mod_wsgi are good choices for Ruby and Python applications respectively, but they fall a little outside the proxying idea. In this article I would like to discuss mod_proxy and mod_jk.
Now let's think about what we have and what we want to put behind the proxy. The most common case is a pool of Tomcat servers behind Apache. By default, Tomcat listens on port 8080 for HTTP and 8009 for AJP. We want Apache to listen on port 80 for incoming HTTP requests and 443 for HTTPS. People who have configured SSL on Tomcat will undoubtedly agree that it is quite annoying, so it's better to implement SSL on the Apache side than to play with Tomcat's keystores.
So now we have two Tomcat servers on two different hosts with our application installed, each listening on 8080 for HTTP and 8009 for AJP, and one Apache on a third host that serves HTTP on 80 and HTTPS on 443 and passes requests to the downstream Tomcat servers.
Situation 1 with mod_proxy and mod_proxy_http:
In this setup, opening the site works like this:
- User opens http://www.yourdomain.com in their browser
- Request comes to Apache
- Apache proxies it via HTTP to a downstream Tomcat on port 8080
- Tomcat sends the response to Apache via HTTP
- Apache delivers the content to the user's browser
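To make Situation 1 concrete, here is a small Python helper that prints the kind of httpd.conf fragment this setup needs. It is a sketch under stated assumptions, not the author's configuration: the backend addresses, the balancer name `tomcats`, and the use of a `balancer://` block (which needs mod_proxy_balancer loaded alongside mod_proxy and mod_proxy_http, plus mod_lbmethod_byrequests on Apache 2.4) are all our choices.

```python
"""Render an httpd.conf fragment for Situation 1 (HTTP proxying to Tomcat).

Hypothetical helper: backend hosts and the balancer name are placeholders.
"""


def render_mod_proxy_config(backends, balancer="tomcats", port=8080):
    """Return Apache directives that spread requests over plain-HTTP Tomcats."""
    lines = [f"<Proxy balancer://{balancer}>"]
    for host in backends:
        # Each Tomcat is reached over HTTP on its connector port (8080 by default).
        lines.append(f"    BalancerMember http://{host}:{port}")
    lines.append("</Proxy>")
    # Forward every request path to the balancer.
    lines.append(f"ProxyPass / balancer://{balancer}/")
    # Rewrite Location-style headers in Tomcat's responses back to Apache's URL.
    lines.append(f"ProxyPassReverse / balancer://{balancer}/")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_mod_proxy_config(["10.0.0.11", "10.0.0.12"]))
```

Nothing in this fragment ties a user to one backend, which is the stickiness problem discussed next.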
So what are the pros and cons of this setup? A comparison table follows later, but in general:
- Easy and quick to configure
- Works with any downstream application server that speaks HTTP
- There are no sticky sessions: if a user logs in on Tomcat1, their next request may well go to Tomcat2, and the user will get a session-expired error.
- Plain mod_proxy does not detect node failures (that requires mod_proxy_balancer), so it will keep sending requests to a downstream Tomcat even after it goes down.
- Some Java applications behave unpredictably behind a proxy. (In my experience, the progress bars of Atlassian Bamboo and FishEye stalled on several pages, which was fixed by moving to JK; I have heard about other strange problems as well.)
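The two missing features in the list above, sticky sessions and failure detection, come down to one routing decision. When a Tomcat has `jvmRoute` set, it appends a dot plus that route to the session ID (for example `ABC123.tomcat1`), so a balancer can read the owning node from the JSESSIONID cookie. The Python model below is a simplified illustration of that idea, not mod_jk's actual algorithm; the node names and the plain round-robin fallback are assumptions.

```python
import itertools


class StickyBalancer:
    """Toy sticky balancer with failover; NOT mod_jk's real algorithm."""

    def __init__(self, nodes):
        nodes = list(nodes)
        self.healthy = {name: True for name in nodes}
        self._round_robin = itertools.cycle(nodes)  # used for new sessions

    @staticmethod
    def route_of(jsessionid):
        """'ABC123.tomcat1' -> 'tomcat1'; None when there is no route suffix."""
        if jsessionid and "." in jsessionid:
            return jsessionid.split(".", 1)[1]
        return None

    def mark(self, node, healthy):
        """Record the result of a health check (e.g. a failed request)."""
        self.healthy[node] = healthy

    def _next_healthy(self):
        # Try each node at most once, skipping the ones marked as failed.
        for _ in range(len(self.healthy)):
            node = next(self._round_robin)
            if self.healthy[node]:
                return node
        raise RuntimeError("no healthy backend available")

    def pick(self, jsessionid=None):
        route = self.route_of(jsessionid)
        # Sticky case: keep the user on the node that owns the session while it is up.
        if route is not None and self.healthy.get(route, False):
            return route
        # No cookie, unknown route or failed node: balance across healthy nodes.
        return self._next_healthy()
```

Failing over like this keeps the site up, but the user still loses an in-memory session unless it is stored somewhere shared, which is what the Membase setup later in the article addresses.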
Now let’s see Situation 2, where we use JK for downstream servers:
At first glance nothing seems to have changed, but only at first glance. The main difference is that Apache now talks to the Tomcats over AJP13 instead of HTTP. So opening the web site works as follows:
- User opens http://www.yourdomain.com in their browser
- Request comes to Apache
- Apache proxies it via AJP13 to a downstream Tomcat on port 8009
- Tomcat sends the response to Apache via AJP13
- Apache receives the AJP13 response and delivers the content to the user's browser via HTTP
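AJP13 is a compact binary protocol. According to the AJP13 protocol reference in the Apache Tomcat Connectors documentation, packets from the web server to the container start with the bytes `0x12 0x34`, packets in the other direction start with `AB`, and both continue with a two-byte big-endian payload length. A CPing (type 10) answered by a CPong (type 9) is the lightweight probe mod_jk can use to check that a backend is alive. The sketch below frames those packets; the `probe` helper, its default host port and timeout are illustrative assumptions.

```python
import socket
import struct

CPING = 10  # AJP13 CPing request type
CPONG = 9   # AJP13 CPong reply type


def frame_to_container(payload: bytes) -> bytes:
    """Prefix a payload with the web-server-to-container AJP13 header."""
    return b"\x12\x34" + struct.pack(">H", len(payload)) + payload


def cping_packet() -> bytes:
    return frame_to_container(bytes([CPING]))


def is_cpong(reply: bytes) -> bool:
    """True if reply is a container-to-server packet carrying a CPong."""
    if len(reply) < 5 or reply[:2] != b"AB":
        return False
    (length,) = struct.unpack(">H", reply[2:4])
    return length >= 1 and reply[4] == CPONG


def probe(host: str, port: int = 8009, timeout: float = 2.0) -> bool:
    """Send one CPing to a Tomcat AJP connector and wait for the CPong."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(cping_packet())
            reply = b""
            while len(reply) < 5:  # a CPong is exactly 5 bytes: 'A' 'B' 0 1 9
                chunk = sock.recv(5 - len(reply))
                if not chunk:
                    break
                reply += chunk
            return is_cpong(reply)
    except OSError:  # refused, unreachable or timed out
        return False
```

For example, `probe("10.0.0.11")` (a placeholder address) returns True only if a Tomcat AJP connector answers the CPing within the timeout.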
Translating between HTTP and AJP may seem to add a little overhead, but there are benefits as well. Here are the good and bad sides of JK balancing:
- With a little tweaking we get sticky sessions, just by setting sticky_session=True for the load-balancer worker on the Apache side (in workers.properties) and jvmRoute="NODENAME" on each Tomcat. After this, users logged in to Tomcat1 will never be moved to Tomcat2 as long as Tomcat1 is alive. (You can also use Membase or Memcached as a session store, so users never lose their session until it expires normally.)
- We have node failure detection, so if Tomcat1 fails, Apache will not send requests to it until it detects that it is back.
- JK configuration is much more advanced than mod_proxy's and allows lots of tweaking, which can result in better performance and lets the environment work exactly as you need it to.
- JK has a web admin tool that allows you to decommission, suspend and play with the LB factor in real time.
- So far I have found only one bad thing: it is a little harder to configure, so it requires some administration skills.
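On the Apache side the JK settings live in workers.properties. The hypothetical helper below renders one for the two-Tomcat setup; the worker names, addresses and the `loadbalancer`/`jkstatus` names are placeholders. By default mod_jk uses each balanced worker's name as its route, so each Tomcat's `jvmRoute` should match its worker name.

```python
def render_workers_properties(nodes, lb_name="loadbalancer", status_name="jkstatus",
                              ajp_port=8009):
    """Render workers.properties for AJP13 backends.

    nodes: dict of worker name -> host. Worker names are hypothetical and
    should match each Tomcat's jvmRoute (mod_jk's default route is the name).
    """
    lines = [f"worker.list={lb_name},{status_name}"]
    for name, host in nodes.items():
        lines += [
            f"worker.{name}.type=ajp13",
            f"worker.{name}.host={host}",
            f"worker.{name}.port={ajp_port}",
            f"worker.{name}.lbfactor=1",  # equal share of the load
        ]
    lines += [
        f"worker.{lb_name}.type=lb",
        f"worker.{lb_name}.balance_workers={','.join(nodes)}",
        # Send requests whose JSESSIONID ends in ".<route>" back to that worker
        # (already mod_jk's default; stated explicitly here for clarity).
        f"worker.{lb_name}.sticky_session=true",
        # The status worker is JK's web admin page.
        f"worker.{status_name}.type=status",
    ]
    return "\n".join(lines)
```

On the Tomcat side this pairs with, for example, `<Engine name="Catalina" defaultHost="localhost" jvmRoute="tomcat1">` in server.xml. In httpd.conf, `JkWorkersFile` points at the generated file, and `JkMount /* loadbalancer` plus `JkMount /jkstatus jkstatus` route traffic and expose the admin page (which you will want to restrict).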
At this point you may be asking, "Why do I need this? I have a single Tomcat server and it's working fine." The point is that you need to build an infrastructure that can handle your current load and scale without affecting the normal behavior of your websites. From this point of view, choosing a reverse proxy solution is quite reasonable.
Here is a real-life example of one of our clients' server architectures, which I think is a good one:
In general, the process is as follows:
- The user makes a DNS request and gets the IP address of one of the Varnish servers and of the static content server(s) (NGINX).
- NGINX delivers static content directly.
- Varnish caches whatever needs to be cached and sends the other requests downstream to one of the Apache servers.
- Apache reads the JSESSIONID cookie and forwards the request via JK to the matching Tomcat server, or balances it if the user does not have the cookie yet.
- Tomcat servers keep sessions in local RAM and copy them to a Membase cluster (so even if one Tomcat fails, another can retrieve its sessions from Membase). Membase is a clustered memcached, so it is fault tolerant by nature (we will take a closer look at Membase in another article).
- Tomcat runs the application logic (retrieving data from Hadoop/HBase, etc.) and responds to Apache.
- Apache sends the response back to Varnish.
- Varnish updates its cache if needed and delivers the response to the client.
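The session handling in this architecture (local RAM first, Membase as a shared copy) can be modelled in a few lines. This is only an illustration of the lookup order: a plain dict stands in for the Membase cluster, and the `SessionManager` class is hypothetical, not Tomcat's session manager API.

```python
class SessionManager:
    """Hypothetical per-Tomcat session manager: local RAM plus a shared copy."""

    def __init__(self, shared_store):
        self.local = {}             # sessions held in this node's RAM
        self.shared = shared_store  # stand-in for the Membase cluster

    def save(self, session_id, data):
        self.local[session_id] = data
        self.shared[session_id] = dict(data)  # replicate a copy to the shared store

    def load(self, session_id):
        if session_id in self.local:
            return self.local[session_id]
        # Failover path: the session was created on another node, so pull it
        # from the shared store and keep a local copy for later requests.
        data = self.shared.get(session_id)
        if data is None:
            return None
        self.local[session_id] = dict(data)
        return self.local[session_id]


if __name__ == "__main__":
    store = {}
    tomcat1, tomcat2 = SessionManager(store), SessionManager(store)
    tomcat1.save("ABC123.tomcat1", {"user": "alice"})
    # Suppose tomcat1 dies and JK fails the user over to tomcat2:
    print(tomcat2.load("ABC123.tomcat1"))
```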
This is a real, live production scenario, and it has proven to be fault tolerant and extremely fast.
I know that after reading this article, many people will ask, "Why is Apache needed when Varnish can do session stickiness, etc.?"
But the idea here is to use the best possible software for each particular role: software with real, proven redundancy, arranged in sensible architectural layers that help us detect and fix problems quickly and easily as they appear. Also, keep in mind that the client uses not only HTTP but also HTTPS, and I have not seen any web server that handles SSL as smoothly as Apache does. Even if a project does not use SSL initially, it will need it soon; I do not believe any web project can go far without SSL.
Following is a short comparison of mod_proxy and JK, with a score for each feature (higher is better), so you can see more closely how these tools differ.
|Feature||mod_proxy||Score||mod_jk||Score|
|Node failure detection||mod_proxy_balancer has to be present in the server||7||Advanced||10|
|Backend SSL||Supported (mod_ssl required)||5||Not supported||0|
|Session stickiness||Not supported||0||Supported via jvmRoute||10|
|Protocols||HTTP, HTTPS||10||AJP13||8|
|Node decommissioning||Manual, needs Apache reload||3||Online via web admin||10|
|Web admin interface||Not present||0||Advanced, with RO and RW support||10|
|Large AJP packet sizes||8K||5||Larger than 8K||10|
|Compatibility with other app. servers||Works with all HTTP application servers||10||AJP-compatible only (Tomcat, GlassFish, etc.)||5|
|Configuration||Part of the Apache httpd configuration file||10||Needs a separate workers.properties file||8|
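The article does not total these scores, but summing them is a quick way to read the table. The sketch below copies the scores from the table and assumes every feature weighs the same unless you pass your own weights (for example, a larger weight for "Compatibility with other app. servers" if you run a mixed Tomcat/Rails/Django stack); the weighting scheme is our assumption.

```python
# Per-feature scores copied from the comparison table: (mod_proxy, mod_jk).
SCORES = {
    "Node failure detection": (7, 10),
    "Backend SSL": (5, 0),
    "Session stickiness": (0, 10),
    "Protocols": (10, 8),
    "Node decommissioning": (3, 10),
    "Web admin interface": (0, 10),
    "Large AJP packet sizes": (5, 10),
    "Compatibility with other app. servers": (10, 5),
    "Configuration": (10, 8),
}


def totals(scores, weights=None):
    """Weighted sum per module; features missing from `weights` count 1.0."""
    weights = weights or {}
    proxy = sum(p * weights.get(f, 1.0) for f, (p, _) in scores.items())
    jk = sum(j * weights.get(f, 1.0) for f, (_, j) in scores.items())
    return proxy, jk


print(totals(SCORES))  # (50.0, 71.0)
```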
So now let's run some stress tests on both mod_jk and mod_proxy. The installation layout is as described above (one load balancer, two application servers). On both Apache hosts, monitoring software from Monitis.com is installed to check the servers' health in real time.
We used Amazon EC2 medium instances for this test. Here are the load test results in both graphical and plain-text form.
Monitoring is implemented using Monitis M3 monitors.
Two monitors are used:
- apache_monitor: checks the Apache server's health.
- http_load monitor: measures the difference in load time during Apache benchmarking.
Together, these monitors help reveal relationships between the various metrics.
The graph below shows Apache's busy (upper line) and idle (lower line) worker processes while benchmarking with mod_proxy: of 150 enabled worker processes, almost all are busy during the stress test.
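The article does not say how apache_monitor collects its numbers. If you want the same busy/idle worker counts yourself, one option (an assumption, not necessarily what Monitis does) is Apache's mod_status machine-readable page, `/server-status?auto`, which reports `BusyWorkers` and `IdleWorkers` lines. A minimal parser:

```python
def parse_server_status(text):
    """Turn mod_status '?auto' output ('Key: value' lines) into a dict."""
    status = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines without a colon
            status[key.strip()] = value.strip()
    return status


def worker_utilization(text):
    """Return (busy, idle, busy share of the currently spawned workers)."""
    status = parse_server_status(text)
    busy = int(status["BusyWorkers"])
    idle = int(status["IdleWorkers"])
    return busy, idle, busy / (busy + idle)
```

The text can be fetched with `urllib.request.urlopen("http://localhost/server-status?auto").read().decode()` once mod_status is enabled. Note that the share is relative to the workers Apache has spawned at that moment, which can be fewer than the 150 enabled processes mentioned above.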
HTTP content load time (time connect, time transfer, time total)
The following data was reported by siege over seven benchmark runs with mod_proxy, increasing the number of concurrent users by 100 each time:
|Concurrent users||Transactions||Elapsed time (s)||Data transferred (MB)||Response time (s)||Transaction rate (trans/s)||Throughput (MB/s)||Concurrency||Failed transactions|
The graph below shows Apache's busy (upper line) and idle (lower line) worker processes while benchmarking with mod_jk: again, of 150 enabled worker processes, almost all are busy during the stress test.
HTTP content load time (time connect, time transfer, time total)
The following data was reported by siege over seven benchmark runs with mod_jk, increasing the number of concurrent users by 100 each time:
|Concurrent users||Transactions||Elapsed time (s)||Data transferred (MB)||Response time (s)||Transaction rate (trans/s)||Throughput (MB/s)||Concurrency||Failed transactions|
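Both benchmark series followed the same pattern: seven siege runs, adding 100 concurrent users each time. The sketch below generates that sequence of commands. The starting level of 100 users, the 60-second run length and the target URL are assumptions; the article gives only the step size and the number of runs.

```python
import shlex
import subprocess


def siege_commands(url, runs=7, step=100, start=100, duration="60S"):
    """Build one siege argv per run, raising concurrency by `step` each time."""
    commands = []
    for i in range(runs):
        users = start + i * step
        # -c: number of concurrent simulated users; -t: run length (S/M/H suffix).
        commands.append(["siege", "-c", str(users), "-t", duration, url])
    return commands


def run_all(url):
    """Run the series sequentially; siege prints its summary after each run."""
    for cmd in siege_commands(url):
        print("$", shlex.join(cmd))
        subprocess.run(cmd, check=False)
```

Calling `run_all` with your own load balancer URL (siege must be installed) reproduces the shape of the test, not its exact parameters.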
Both modules, mod_proxy and mod_jk, are used as load balancers for backend application servers such as Tomcat and GlassFish. What are the most important features in load balancing? At first I assumed node failure detection, plus easy session stickiness and load-balancing configuration without requiring extra tools or packages. Performance matters too, of course.
So what do we have? The resulting tables show that when advanced load balancing or node failure detection is needed, mod_jk is preferable. However, it cannot match mod_proxy's configuration flexibility (mod_proxy is configured just like the rest of Apache, with no separate files such as workers.properties), nor its compatibility with servers that do not speak AJP.
Now a little about performance. While the number of concurrent users is not very high (in our case, 400), both setups behave similarly, and mod_proxy even seems to perform better, but things change as the number of concurrent users grows.
Take a look at this table:
|Concurrent users||Failed requests (10-second timeout)|
As you can see, with an almost equal number of connections, mod_proxy fails approximately 59% more often.
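The "59% more often" figure is a relative comparison of failure counts. The raw counts from the table above did not survive, so they are not recomputed here, but the calculation is simple. The helper below (its name is ours) also normalises by the number of requests, since the two runs had only almost equal numbers of connections.

```python
def relative_failure_increase(failed_a, total_a, failed_b, total_b):
    """How many percent higher module A's failure rate is than module B's.

    Normalising by the request totals keeps the comparison fair when the two
    runs handled slightly different numbers of connections.
    """
    rate_a = failed_a / total_a
    rate_b = failed_b / total_b
    if rate_b == 0:
        raise ValueError("baseline run has no failures; ratio undefined")
    return (rate_a / rate_b - 1) * 100
```

Pass mod_proxy's counts as A and mod_jk's as B; a result near 59 corresponds to the article's claim.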
If you have a small project or need to hide a variety of application servers (Tomcat + Rails + Django), need an easily configurable and fast SSL solution, and your server load is not heavy, use mod_proxy.
But if your goal is to load-balance Java application servers, JK is definitely the better solution.