

@CloudExpo: Blog Feed Post

Solr Redis Plugin Use Cases By @Sematext | @DevOpsSummit [#DevOps]


Solr Redis Plugin Use Cases and Performance Tests

The Solr Redis Plugin is an extension for Solr that provides a query parser that uses data stored in Redis. It is open-sourced on GitHub by Sematext. The tool is basically a QParserPlugin that establishes a connection to Redis and fetches data stored in Redis sets, sorted sets, and other data structures in order to build a query. RedisQParser takes the data fetched from Redis and is responsible for building the query. Moreover, the plugin provides a highlighter extension which can be used to highlight parts of aliased Solr Redis queries (this will be described in a future post).

Use Case: Social Network
Imagine you have a social network and you want to implement a search solution that can search your own events, interests, and photos, as well as all your friends' events, interests, and photos. A naive, Solr-only implementation would search over all documents and filter by a "friends" field. This requires denormalization: the full list of friends has to be indexed into each document that belongs to a user. Building a query is then just a matter of searching over documents and adding a clause such as "friends:1234" to the query. It seems simple to implement, but it is a terrible solution when a list of friends changes, because every such change requires modifying each affected document. As the number of documents (e.g., photos, events, interests, friends and their items) connected with a user grows, the number of potential updates rises dramatically, and each modification of connections between users becomes a nightmare. Imagine a person with 10 photos and 100 friends (all of whom have their own photos, events, interests, etc.). When this person gets a 101st friend, the naive system with flattened data has to update a lot of documents. As we all know, connections between people in a social network are constantly being created and removed, so such a naive Solr-only system could not really scale.

Social networks also have one very important attribute: the number of connections of a single user is typically not expressed in millions. That number is usually relatively small: tens, hundreds, sometimes thousands. This raises the question: why not carry information about user connections in each query sent to the search engine? That way, instead of sending queries with a clause such as "friends:1234", we can simply send queries with multiple user IDs connected by an "OR" operator. When a query has all the information needed to search entities that belong to a user's friends, there is no need to store a list of friends in each user's document. However, carrying user connections in each query means sending rather large queries to the search engine, each containing many user ID terms (e.g., id:5 OR id:10 OR id:100 OR ...) connected by a disjunction operator. As the number of terms grows, the query requests become very big. And that's a problem, because preparing such a query and sending it to a search engine over the network becomes awkward and slow.
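To get a feel for how quickly such a query grows, here is a minimal Python sketch. The helper name `build_or_query` and the sequential IDs are illustrative assumptions, not part of any Solr or plugin API; the clause format follows the id:5 OR id:10 example above.

```python
def build_or_query(field, ids):
    """Join one clause per ID with OR, e.g. 'id:5 OR id:10 OR id:100'."""
    return " OR ".join(f"{field}:{i}" for i in ids)


if __name__ == "__main__":
    # Hypothetical friend lists of growing size (sequential IDs for simplicity).
    for friend_count in (10, 500, 5000):
        query = build_or_query("id", range(1, friend_count + 1))
        print(f"{friend_count} friends -> query of {len(query)} characters")
```

The query length grows linearly with the number of friends, and all of that text has to be built by the client, sent over the network, and parsed by Solr on every request.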

How Does It Work?
The diagram below shows how the Solr Redis Plugin works.

  1. The client application sends a simple and very small query to Solr. That query contains only a short fragment that calls the Solr Redis Plugin. You can read more about it in the "How to Use the Plugin?" section.
  2. The Solr Redis Plugin takes a connection from a connection pool and sends a command to Redis.
  3. Redis sends a response. The response format depends on the Redis command in the request. Redis can return just a set of records, or a set of records with scores. When the Solr Redis Plugin receives the Redis response, it takes the records and builds a Lucene boolean query. By default the OR operator is used, but it can be changed to the AND operator.
  4. The Solr Redis Plugin returns the Lucene query to Solr, which executes it and sends matches back to the client application.
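The steps above can be pictured with a small, self-contained Python sketch. It is only a conceptual model, not the plugin's actual Java implementation: the in-memory `FAKE_REDIS` dictionary stands in for a real Redis server, and the key name, function names, and the restriction to `smembers` are assumptions made for illustration.

```python
import re

# In-memory stand-in for Redis: key -> set of members (hypothetical data).
FAKE_REDIS = {"friends:1234": {"5", "10", "100"}}

# Matches the small query fragment, e.g. '{!redis command=smembers key=KEY}FIELD'.
LOCAL_PARAMS = re.compile(r"^\{!redis\s+(?P<params>[^}]*)\}(?P<field>\S+)$")


def parse_redis_query(query):
    """Split the fragment into a dict of parameters and the target field name."""
    match = LOCAL_PARAMS.match(query)
    if match is None:
        raise ValueError(f"not a redis query fragment: {query!r}")
    params = dict(pair.split("=", 1) for pair in match.group("params").split())
    return params, match.group("field")


def expand_query(query, store=FAKE_REDIS, operator="OR"):
    """Look up the key (steps 2-3) and build a boolean query string (step 3)."""
    params, field = parse_redis_query(query)
    if params.get("command") != "smembers":
        raise ValueError("this sketch only handles the smembers command")
    members = sorted(store.get(params["key"], ()))  # sorted for a stable result
    return f" {operator} ".join(f"{field}:{member}" for member in members)


if __name__ == "__main__":
    print(expand_query("{!redis command=smembers key=friends:1234}id"))
```

The point of the sketch is the asymmetry: the client sends only the short fragment, while the expansion into many clauses happens next to the search engine.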

Figure: Solr Redis Plugin data flow

Of course, we can achieve similar functionality by making the client application responsible for communication with Redis. This solution moves the entire responsibility for establishing and handling connections with Redis to the client application.

  1. The client application sends a Redis command.
  2. The client application receives a Redis response, parses it, and prepares a Solr query.
  3. A very big query is sent to Solr.
  4. Solr parses the big query, searches for results, and sends a response to the client application.

Figure: Data flow without the Solr Redis Plugin

As we can see, the Solr Redis Plugin takes over a lot of work that the client application would otherwise have to handle itself:

| Solr Redis Plugin | Client application approach |
| --- | --- |
| Establishing connections to Redis | Connection handled by the client app |
| Keeping a connection pool, which accelerates communication | No connection pool by default; it has to be created by the client app |
| Small Solr queries | Large Solr queries |

Project Location

The project is available on GitHub. You can clone the repository at https://github.com/sematext/solr-redis.git. Patches are welcome. There is also a package in the Maven Central repository: com.sematext:solr-redis.

How to Use the Plugin?

Configuration
The Solr Redis Plugin is simply a QParserPlugin that is very easy to deploy in a Solr instance. The plugin classes are packaged into a JAR file. You can also download a pre-built package from Maven. Either way, the JAR file has to be on the Solr classpath; for example, you can simply copy the solr-redis.jar file to the $SOLR_HOME/lib directory. To use the plugin, add the following snippet to solrconfig.xml and restart Solr:

<queryParser name="redis" class="com.sematext.solr.redis.RedisQParserPlugin">
  <str name="host">localhost</str>
  <str name="maxConnections">30</str>
  <str name="retries">2</str>
</queryParser>

Querying
The Solr Redis Plugin is a QParserPlugin, so its syntax is similar to that of other QParserPlugins such as frange, term or boost. To use the plugin, run a query such as:

{!redis command=smembers key=KEY}FIELD

The construction shown above can be used in both the q and fq Solr parameters. An example of a whole request with a filter query is shown below.

http://localhost:8983/solr/collection/select?q=*:*&fq={!redis command=smembers key=KEY}FIELD

KEY is the Redis key used to look up values.

FIELD is the name of the field which will be queried. The field has to exist in the index schema. You can use any field type: text fields, string fields or numeric fields.

In most cases you can use the Solr Redis Plugin in both q and fq. Often, it is better to use the plugin as a filter query. Using the plugin as a filter lets Solr cache the filter using FilterCache, which will speed up all searches by that user.
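As a small illustration of the request format, the following Python sketch assembles the filter-query request shown above and URL-encodes it. The helper names `redis_local_params` and `select_url` are made up for this example; only the `{!redis command=... key=...}FIELD` syntax and the example URL come from this article.

```python
from urllib.parse import urlencode


def redis_local_params(command, key, field, **extra):
    """Build a fragment such as '{!redis command=smembers key=KEY}FIELD'."""
    parts = [f"command={command}", f"key={key}"]
    parts += [f"{name}={value}" for name, value in extra.items()]
    return "{!redis " + " ".join(parts) + "}" + field


def select_url(base_url, q="*:*", fq=None):
    """Return a /select URL with properly encoded q and optional fq parameters."""
    params = [("q", q)]
    if fq is not None:
        params.append(("fq", fq))
    return f"{base_url}/select?{urlencode(params)}"


if __name__ == "__main__":
    fq = redis_local_params("smembers", "KEY", "FIELD")
    print(select_url("http://localhost:8983/solr/collection", fq=fq))
```

Encoding matters here because the fragment contains braces, spaces and an exclamation mark, which should not be sent raw in a URL.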

Let's go back to the social network example for a moment. User X may want to search not only documents related to his friends, but also documents related to friends of friends. Perhaps documents from one group of friends should be scored more highly than documents from other friends. To do this we can use the scores of a Redis sorted set, where each record can have a different score. The query to Solr that uses sorted set scoring is shown below. Please note that in this example the Solr Redis Plugin is used in q. We cannot use fq here because Solr doesn't calculate scores for filter queries, and we need Solr to do the scoring using the scores from the sorted set.

http://localhost:8983/solr/collection/select?q={!redis command=zrevrangebyscore key=KEY min=100 max=1000}FIELD
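One way to picture how sorted-set scores could influence ranking is to treat each score as a boost on the matching clause. The sketch below is an assumption made for illustration only: this article does not spell out how the plugin maps Redis scores to Lucene scoring, and the function name and sample data are hypothetical.

```python
def boosted_or_query(field, members_with_scores):
    """Turn (member, score) pairs into clauses boosted by their score.

    The pairs mimic what a sorted-set range command returns when scores
    are requested, e.g. [("5", 900.0), ("10", 250.0)].
    """
    return " OR ".join(
        f"{field}:{member}^{score:g}" for member, score in members_with_scores
    )


if __name__ == "__main__":
    # Hypothetical friends with scores between 100 and 1000, highest first.
    friends = [("5", 900.0), ("10", 250.0), ("100", 120.5)]
    print(boosted_or_query("id", friends))
```

With such a query, documents belonging to higher-scored friends contribute more to the relevance score, which is only possible in q and not in fq.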

Logging
The Solr Redis Plugin logs a few messages that help you understand when Redis is queried, what data is fetched from Redis, which Redis command was used, and when an error occurred. You can easily manage the logging level using the Solr Admin UI.

Performance Tests
At Sematext we care a lot about performance; in fact, we built SPM, a comprehensive performance monitoring, alerting and anomaly detection solution. Not only does it monitor Solr (here's a demo of Solr being monitored), but many other types of applications as well. We also frequently tune Solr performance for our clients in our Solr Consulting engagements. So, of course, we ran a performance test of the Solr Redis Plugin!

Dataset
In the first test, 1,000,000 simple documents were indexed in Solr. Each document represents a user with an ID from 1 to 1,000,000. The next step was to generate connections between users, which was done randomly with a simple Python script. Each user had between 1 and 1,000 connections (the average was 500). Once the data was prepared, it was time to run the performance tests. The tests compared the Solr Redis Plugin with an approach based on a simple filter query with multiple "OR" clauses. We generated a query set, and all tests were performed with Apache JMeter using 5 threads. JMeter was run on the same machine where Solr was deployed. This is not an ideal test environment because it hides the biggest advantage of the plugin, which is avoiding sending very big queries over the network, but we also wanted to show that even on the same machine the Solr Redis Plugin is more efficient than big filter queries.
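The original script is not included in this post, so the following is only a sketch of how such random connections could be generated. It assumes a uniform distribution of connection counts (which gives an average of about 500 for the range 1 to 1,000) and treats connections as one-directional; the function name and parameters are hypothetical.

```python
import random


def generate_connections(num_users, min_conns, max_conns, seed=0):
    """Map each user ID (1..num_users) to a random set of other user IDs.

    Requires max_conns <= num_users - 1 so that enough distinct users exist.
    """
    rng = random.Random(seed)  # fixed seed so the data set is reproducible
    connections = {}
    for user in range(1, num_users + 1):
        count = rng.randint(min_conns, max_conns)
        # Sample from num_users - 1 slots, then shift values at or above the
        # user's own ID up by one so a user is never connected to itself.
        picked = rng.sample(range(1, num_users), count)
        connections[user] = {p if p < user else p + 1 for p in picked}
    return connections


if __name__ == "__main__":
    # Scaled-down run; the test in this post used 1,000,000 users and 1-1,000 connections.
    data = generate_connections(num_users=10_000, min_conns=1, max_conns=1_000)
    average = sum(len(friends) for friends in data.values()) / len(data)
    print(f"average connections per user: {average:.1f}")
```

Each user's set could then be stored under its own Redis key (for example with SADD) so that the plugin can fetch it at query time.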

Tests ran on an Intel Core2 Duo machine with 8GB RAM. Solr used Oracle JDK 1.7 u51 with default GC settings. Because of the default GC settings, periodic latency peaks are visible in the diagrams.

Results

Test 1
The table and images below present the performance test results for both the Solr Redis Plugin and a filter query with multiple terms. The average number of connections between users was 500. We can see that using the Solr Redis Plugin is a bit faster than sending big queries to Solr. Note that the queries were prepared in advance, so the measured latency doesn't include the time needed to generate a query. In a real-world scenario the application would also have to generate the big queries, which takes additional time (getting data from Redis and constructing a query string before sending it to Solr).


| Approach | Samples | Avg. time [ms] | Median time [ms] | 90th percentile time [ms] | Requests per second |
| --- | --- | --- | --- | --- | --- |
| SolrRedisPlugin | 30000 | 60 | 52 | 111 | 78.3 |
| Big filter query | 30000 | 71 | 60 | 141 | 67.5 |

Figure: SolrRedisPlugin - 30000 queries - avg. 500 connections between users

Figure: Filter query - 30000 queries - avg. 500 connections between users (avg. 500 boolean clauses)

Test 2
Below are the results of a test that was very similar to the previous one, except that the average number of connections between users was 5,000.


| Approach | Samples | Avg. time [ms] | Median time [ms] | 90th percentile time [ms] | Requests per second |
| --- | --- | --- | --- | --- | --- |
| SolrRedisPlugin | 2000 | 587 | 522 | 1047 | 8.46 |
| Big filter query | 2000 | 766 | 649 | 1349 | 6.47 |

Figure: SolrRedisPlugin - 2000 queries - avg. 5000 connections between users

Figure: Filter query - 2000 queries - avg. 5000 connections between users (avg. 5000 boolean clauses)
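To put the two result tables side by side, the short Python sketch below computes the relative difference between the two approaches from the reported average latency and throughput. The numbers are copied from the tables above; the dictionary layout and function name are just illustrative choices.

```python
# Average latency [ms] and requests per second, copied from the result tables.
RESULTS = {
    "Test 1 (avg. 500 connections)": {
        "plugin": {"avg_ms": 60, "rps": 78.3},
        "filter": {"avg_ms": 71, "rps": 67.5},
    },
    "Test 2 (avg. 5000 connections)": {
        "plugin": {"avg_ms": 587, "rps": 8.46},
        "filter": {"avg_ms": 766, "rps": 6.47},
    },
}


def relative_gain(plugin, big_filter):
    """Return (latency reduction %, throughput gain %) of plugin vs. big filter query."""
    latency_reduction = (big_filter["avg_ms"] - plugin["avg_ms"]) / big_filter["avg_ms"] * 100
    throughput_gain = (plugin["rps"] - big_filter["rps"]) / big_filter["rps"] * 100
    return latency_reduction, throughput_gain


if __name__ == "__main__":
    for name, result in RESULTS.items():
        latency, throughput = relative_gain(result["plugin"], result["filter"])
        print(f"{name}: {latency:.1f}% lower avg. latency, {throughput:.1f}% higher throughput")
```

In these measurements the plugin's advantage is larger in the second test: roughly 15% lower average latency and 16% higher throughput with 500 connections, versus roughly 23% and 31% with 5,000 connections.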

Keep in mind that HTTP URL size is limited. Unless you change the setting in the container or use POST requests instead of GET, you will not be able to run queries with hundreds or thousands of clauses in a filter.

Summary
This post describes the Solr Redis Plugin. We showed an example of using the plugin to handle queries in a social network. Using such a plugin is much more convenient, efficient and scalable than generating filter queries with hundreds or thousands of terms. Performance tests show that it is more efficient to use the plugin than to send large queries to Solr. We should not forget that the tests were performed in an environment where JMeter was sending queries from the same machine where Solr was running. This means our tests involved almost no network traffic, so the results for a pure Solr approach without the Solr Redis Plugin would have been even worse had the queries been going over the network to a remote Solr instance.

More Stories By Sematext Blog

Sematext is a globally distributed organization that builds innovative Cloud and On Premises solutions for performance monitoring, alerting and anomaly detection (SPM), log management and analytics (Logsene), and search analytics (SSA). We also provide Search and Big Data consulting services and offer 24/7 production support for Solr and Elasticsearch.
