Click here to close now.

Welcome!

Apache Authors: Carmen Gonzalez, Ruxit Blog, Roger Strukhoff, Elizabeth White, Pat Romanski

Related Topics: XML

XML: Article

Integrating Enterprise Information on Demand with XQuery - Part 2

Integrating Enterprise Information on Demand with XQuery - Part 2

In Part I of this article (XML-J, Vol. 4, issue 6), we introduced the enterprise information integration (EII) problem and explained how the XML query language XQuery and related technologies - specifically XML, XML Schema, and Web services - are central to enabling this age-old problem to be successfully addressed at last.

We provided a technical overview of the XQuery language and presented a simple "single view of Customer" example to illustrate XQuery's role in the EII domain. The example was based on an electronics retailer that wanted to share customer information across three portals - portals for customer self-service, credit approval, and product service. The information to be integrated resided in a variety of back-end information sources, including two relational database management systems, an SAP system, and a Web service.

In this article, our XQuery/EII saga continues. In this installment, we look at how EII relates to two other technologies designed for integration tasks, namely enterprise application integration (EAI) and extract-transform-load (ETL) tools. We also take a brief look at BEA Liquid Data for WebLogic, an XQuery-based EII offering, and discuss how XQuery and Liquid Data were put to use recently in a telecommunications-related customer project.

What About EAI?
Given the industry buzz around EAI today, a natural question about EII is "so why bother?" That is, why isn't a modern EAI solution alone - for example, a workflow engine with XML-based data transformation capabilities - sufficient to solve the EII problem? The answer is, in principle, that EAI is in fact sufficient to solve the EII problem. A developer could always choose to hand-build a set of workflows, writing one workflow per application-level "query" to deliver the desired information back to the calling applications. In the example from Part I of this article, three hand-tailored workflows could instead be written to provide information retrieval capabilities comparable to our XQuery-based solution. But is that the best approach, in terms of development time and maintenance cost?

The basic question here is when to use a declarative query language (XQuery in the case of modern EII) versus constructing code in a procedural language (a workflow language in the case of EAI). The lessons from the relational database revolution are clear: When applicable, a declarative approach offers significant advantages. Instead of hand-constructing a "query plan" (EAI workflow) to extract the needed data from each of the data sources in some manually predefined order, the EII approach allows a single, smaller, and simpler declarative query to be written.

The resulting benefits should be obvious. First, the user does not need to build each query plan by hand, which could involve a considerable effort. Instead, the user specifies (when defining the core view) what data sources are relevant and what logical conditions relate and characterize the data to be retrieved. Second, queries can be optimized automatically by the EII middleware, resulting in an optimal query execution plan (order of accessing the sources, queries or methods to extract the data, etc.) for each different query. For example, using EAI, one central workflow could be written to retrieve all of the customer information in Part I's example, and then other workflows could be written to first call this workflow and then further filter the results. However, in the EII approach, the query processor will (for each query) prune out irrelevant data sources as well as push SQL selection conditions (such as only retrieving "Open" support cases in Listing 2 of Part I) down to any RDBMS data sources. Third, as the data sources change over time in terms of their schemas, statistics, or performance, the EII user will not be forced to rewrite all of his or her queries. Simply maintaining each base view query and re-optimizing the other queries will adapt their query execution plans to the new situation. In contrast, in the case of EAI, many workflows would have to be rewritten to handle most such changes.

There really isn't an either/or choice to be made between EAI and EII at all. Both technologies have critical roles to play in an overall enterprise integration solution. These technologies are complementary: EII provides ease of data integration, while EAI provides ease of process integration. EII is appropriate for composing integrated views and queries over enterprise data. EAI is the appropriate technology for creating composite applications that orchestrate the functional capabilities of a set of related but independent applications, Web services, etc. Moreover, EII can be used to handily augment EAI in scenarios where workflows need to access integrated data views. For example, if our electronics retailer wanted its order process to offer free shipping to customers who have ordered more than $1,000 of goods during the year and who have accumulated more than 5,000 reward points, the integrated view of customer from Part I could be used to easily access the relevant information from within the order entry workflow.

What About ETL?
Another technology related to EII is ETL. In fact, ETL tools are designed precisely for the purpose of integrating data from multiple sources. These tools are therefore another category of software that naturally leads to a "why bother with EII?" question - why isn't ETL technology the answer? As you'll see, the answer is again that both technologies have their place in modern IT architectures.

ETL tools are designed for use in moving data from a variety of sources into a data warehouse for offline analysis and reporting purposes. As the name suggests, ETL tools provide facilities for extracting data from a source; transforming that data into a more suitable form for inclusion in the data warehouse, possibly cleansing it in the process; and then loading the transformed data into the warehouse's database. Typical ETL tools are therefore focused on supporting the design and administration of data migration, cleansing, and transformation processes. These are often batch processes that occur on a daily or weekly basis.

Data warehouses and the ETL tools that feed them are invaluable for enabling businesses to aggregate and analyze historical information. For example, our electronics retailer might very well want to keep track of customer data, sales data, and product issue data over a period of years in order to analyze customer behavior by geographic region over time, improve their credit card risk model, and so on. A data warehouse is the appropriate place to retain such data and run large analytical queries against it, and ETL technology is the right technology today for creating, cleaning, and maintaining the data in the warehouse. However, ETL is not the right technology for building applications that need access to current operational data - it doesn't support the declarative creation of views or real-time access to operational data through queries.

For applications that need to integrate current information, Part I of this article showed how XQuery can be used to declaratively specify reusable views that aggregate data from multiple operational stores and how XQuery can be used to write XML queries over such integrated views. We also explained how standard database query processing techniques, including view expansion, predicate pushdown, and distributed query optimization, can be applied to XQuery, making XQuery-based EII an excellent technological fit for such applications.

Clearly, both ETL and EII technologies have important roles to play in today's enterprise. ETL serves to feed data warehouses, while EII is an enabler for applications that need timely access to current, integrated information from a variety of operational enterprise data sources. As with EAI, there are also cases where the two technologies come together. As one example, an ETL tool could be used to help create and maintain a cross-reference table to relate different notions of "customer id" for use in creating XQuery-based EII views across different back-end systems. As another example, an ETL-fed data warehouse could be used to build a portal for analyzing the historical behavior of a company's top customers, with an EII tool used to allow click-through inspection of the customers' purchases in the past 24 hours.

Putting XQuery-Based EII to Work
For the reasons discussed in this article, XQuery-based EII middleware is an emerging product segment that promises to deliver the tools and technology needed in this important space. One commercially available XQuery-based middleware product is BEA Liquid Data for WebLogic. Liquid Data is capable of accessing data from relational database management systems, Web services, packaged applications (through J2EE CA adapters and application views), XML files, XML messages, and, through a custom function mechanism, most any other data source as well. For illustration purposes, the architecture of Liquid Data is depicted in Figure 1. Liquid Data provides default XML views of all of its data sources and provides an XQuery-based graphical view and query editor for use in integrating and enhancing information drawn from one or more data sources. It includes a distributed query processing engine as well as providing advanced features such as support for query result caching and both data-source-level and stored query-level access control.

As a final example of the applicability of XQuery to enterprise information integration problems, we'll describe an actual customer integration exercise where Liquid Data was put to use. In that project, a large telecommunications vendor wanted to create a single view of order information for one of its business divisions. The goal of the project was to make integrated order information available to the division's customers (other businesses) through a Web portal, enabling their customers to log in and check on the status of their orders, as well as making information available to the division's own customer service representatives.

The division had data distributed across multiple systems, including a relational database containing order summary information and two different order management systems. Order details were kept in one or the other of the two order management systems, depending on the type of order. Functionality-wise, a limited view of order details was provided through the customer order status portal that the division built using Liquid Data, whereas customer service representatives were permitted to see all of the order data through their portal. In both cases, it was possible to search for order information by various combinations of purchase order number, date range, and order.

The use of XQuery-based EII technology enabled the customer to complete their portal project in much less time than they had expected it to take with traditional technologies, and their total cost of ownership was also lower due to the reusability of Liquid Data assets and the low cost of maintenance enabled by EII.

Summary
In this article, we have explained how XQuery is beginning to transform the integration world, making it possible to finally tackle the enterprise information integration problem where past attempts have failed. In Part I we provided an overview of XQuery and illustrated how it could be used to integrate the disparate information sources of a hypothetical electronics retailer. In Part II we discussed the relationship of EII to EAI and ETL technologies and then briefly presented BEA's XQuery-based EII product and described one of the customer projects in which it was used.

Comments (0)

Share your thoughts on this story.

Add your comment
You must be signed in to add a comment. Sign-in | Register

In accordance with our Comment Policy, we encourage comments that are on topic, relevant and to-the-point. We will remove comments that include profanity, personal attacks, racial slurs, threats of violence, or other inappropriate material that violates our Terms and Conditions, and will block users who make repeated violations. We ask all readers to expect diversity of opinion and to treat one another with dignity and respect.


@ThingsExpo Stories
The Open Compute Project is a collective effort by Facebook and a number of players in the datacenter industry to bring lessons learned from the social media giant's giant IT deployment to the rest of the world. Datacenters account for 3% of global electricity consumption – about the same as all of Switzerland or the Czech Republic -- according to people I met at the recent Open Compute Summit in San Jose. With increasing mobility at the edge of the cloud and vast new dataflows being predicted with the growth of the Internet of Things (and The Coming Age of Many Zettabytes) in the near...
GENBAND has announced that SageNet is leveraging the Nuvia platform to deliver Unified Communications as a Service (UCaaS) to its large base of retail and enterprise customers. Nuvia’s cloud-based solution provides SageNet’s customers with a full suite of business communications and collaboration tools. Two large national SageNet retail customers have recently signed up to deploy the Nuvia platform and the company will continue to sell the service to new and existing customers. Nuvia’s capabilities include HD voice, video, multimedia messaging, mobility, conferencing, Web collaboration, deskt...
Wearable technology was dominant at this year’s International Consumer Electronics Show (CES) , and MWC was no exception to this trend. New versions of favorites, such as the Samsung Gear (three new products were released: the Gear 2, the Gear 2 Neo and the Gear Fit), shared the limelight with new wearables like Pebble Time Steel (the new premium version of the company’s previously released smartwatch) and the LG Watch Urbane. The most dramatic difference at MWC was an emphasis on presenting wearables as fashion accessories and moving away from the original clunky technology associated with t...
The WebRTC Summit 2014 New York, to be held June 9-11, 2015, at the Javits Center in New York, NY, announces that its Call for Papers is open. Topics include all aspects of improving IT delivery by eliminating waste through automated business models leveraging cloud technologies. WebRTC Summit is co-located with 16th International Cloud Expo, @ThingsExpo, Big Data Expo, and DevOps Summit.
SYS-CON Events announced today that Cisco, the worldwide leader in IT that transforms how people connect, communicate and collaborate, has been named “Gold Sponsor” of SYS-CON's 16th International Cloud Expo®, which will take place on June 9-11, 2015, at the Javits Center in New York City, NY. Cisco makes amazing things happen by connecting the unconnected. Cisco has shaped the future of the Internet by becoming the worldwide leader in transforming how people connect, communicate and collaborate. Cisco and our partners are building the platform for the Internet of Everything by connecting the...
15th Cloud Expo, which took place Nov. 4-6, 2014, at the Santa Clara Convention Center in Santa Clara, CA, expanded the conference content of @ThingsExpo, Big Data Expo, and DevOps Summit to include two developer events. IBM held a Bluemix Developer Playground on November 5 and ElasticBox held a Hackathon on November 6. Both events took place on the expo floor. The Bluemix Developer Playground, for developers of all levels, highlighted the ease of use of Bluemix, its services and functionality and provide short-term introductory projects that developers can complete between sessions.
Temasys has announced senior management additions to its team. Joining are David Holloway as Vice President of Commercial and Nadine Yap as Vice President of Product. Over the past 12 months Temasys has doubled in size as it adds new customers and expands the development of its Skylink platform. Skylink leads the charge to move WebRTC, traditionally seen as a desktop, browser based technology, to become a ubiquitous web communications technology on web and mobile, as well as Internet of Things compatible devices.
SYS-CON Events announced today that robomq.io will exhibit at SYS-CON's @ThingsExpo, which will take place on June 9-11, 2015, at the Javits Center in New York City, NY. robomq.io is an interoperable and composable platform that connects any device to any application. It helps systems integrators and the solution providers build new and innovative products and service for industries requiring monitoring or intelligence from devices and sensors.
The list of ‘new paradigm’ technologies that now surrounds us appears to be at an all time high. From cloud computing and Big Data analytics to Bring Your Own Device (BYOD) and the Internet of Things (IoT), today we have to deal with what the industry likes to call ‘paradigm shifts’ at every level of IT. This is disruption; of course, we understand that – change is almost always disruptive.
WebRTC is an up-and-coming standard that enables real-time voice and video to be directly embedded into browsers making the browser a primary user interface for communications and collaboration. WebRTC runs in a number of browsers today and is currently supported in over a billion installed browsers globally, across a range of platform OS and devices. Today, organizations that choose to deploy WebRTC applications and use a host machine that supports audio through USB or Bluetooth can use Plantronics products to connect and transit or receive the audio associated with the WebRTC session.
Docker is an excellent platform for organizations interested in running microservices. It offers portability and consistency between development and production environments, quick provisioning times, and a simple way to isolate services. In his session at DevOps Summit at 16th Cloud Expo, Shannon Williams, co-founder of Rancher Labs, will walk through these and other benefits of using Docker to run microservices, and provide an overview of RancherOS, a minimalist distribution of Linux designed expressly to run Docker. He will also discuss Rancher, an orchestration and service discovery platf...
Sonus Networks introduced the Sonus WebRTC Services Solution, a virtualized Web Real-Time Communications (WebRTC) offer, purpose-built for the Cloud. The WebRTC Services Solution provides signaling from WebRTC-to-WebRTC applications and interworking from WebRTC-to-Session Initiation Protocol (SIP), delivering advanced real-time communications capabilities on mobile applications and on websites, which are accessible via a browser.
SYS-CON Events announced today that Aria Systems, the leading innovator in recurring revenue, has been named “Bronze Sponsor” of SYS-CON's @ThingsExpo, which will take place on June 9–11, 2015, at the Javits Center in New York, NY. Proven by the world’s most demanding enterprises, including AAA NCNU, Constant Contact, Falck, Hootsuite, Pitney Bowes, Telekom Denmark, and VMware, Aria helps enterprises grow their recurring revenue businesses. With Aria’s end-to-end active monetization platform, global brands can get to market faster with a wider variety of products and services, while maximizin...
SYS-CON Media announced today that @WebRTCSummit Blog, the largest WebRTC resource in the world, has been launched. @WebRTCSummit Blog offers top articles, news stories, and blog posts from the world's well-known experts and guarantees better exposure for its authors than any other publication. @WebRTCSummit Blog can be bookmarked ▸ Here @WebRTCSummit conference site can be bookmarked ▸ Here
SYS-CON Events announced today that Alert Logic, the leading provider of Security-as-a-Service solutions for the cloud, has been named “Bronze Sponsor” of SYS-CON's 16th International Cloud Expo® and DevOps Summit 2015 New York, which will take place June 9-11, 2015, at the Javits Center in New York City, NY, and the 17th International Cloud Expo® and DevOps Summit 2015 Silicon Valley, which will take place November 3–5, 2015, at the Santa Clara Convention Center in Santa Clara, CA.
SYS-CON Events announced today that Vitria Technology, Inc. will exhibit at SYS-CON’s @ThingsExpo, which will take place on June 9-11, 2015, at the Javits Center in New York City, NY. Vitria will showcase the company’s new IoT Analytics Platform through live demonstrations at booth #330. Vitria’s IoT Analytics Platform, fully integrated and powered by an operational intelligence engine, enables customers to rapidly build and operationalize advanced analytics to deliver timely business outcomes for use cases across the industrial, enterprise, and consumer segments.
SYS-CON Events announced today that Solgenia will exhibit at SYS-CON's 16th International Cloud Expo®, which will take place on June 9-11, 2015, at the Javits Center in New York City, NY, and the 17th International Cloud Expo®, which will take place on November 3–5, 2015, at the Santa Clara Convention Center in Santa Clara, CA. Solgenia is the global market leader in Cloud Collaboration and Cloud Infrastructure software solutions. Designed to “Bridge the Gap” between Personal and Professional Social, Mobile and Cloud user experiences, our solutions help large and medium-sized organizations dr...
SYS-CON Events announced today that Liaison Technologies, a leading provider of data management and integration cloud services and solutions, has been named "Silver Sponsor" of SYS-CON's 16th International Cloud Expo®, which will take place on June 9-11, 2015, at the Javits Center in New York, NY. Liaison Technologies is a recognized market leader in providing cloud-enabled data integration and data management solutions to break down complex information barriers, enabling enterprises to make smarter decisions, faster.
Connected devices and the Internet of Things are getting significant momentum in 2014. In his session at Internet of @ThingsExpo, Jim Hunter, Chief Scientist & Technology Evangelist at Greenwave Systems, examined three key elements that together will drive mass adoption of the IoT before the end of 2015. The first element is the recent advent of robust open source protocols (like AllJoyn and WebRTC) that facilitate M2M communication. The second is broad availability of flexible, cost-effective storage designed to handle the massive surge in back-end data in a world where timely analytics is e...
SYS-CON Events announced today that Akana, formerly SOA Software, has been named “Bronze Sponsor” of SYS-CON's 16th International Cloud Expo® New York, which will take place June 9-11, 2015, at the Javits Center in New York City, NY. Akana’s comprehensive suite of API Management, API Security, Integrated SOA Governance, and Cloud Integration solutions helps businesses accelerate digital transformation by securely extending their reach across multiple channels – mobile, cloud and Internet of Things. Akana enables enterprises to share data as APIs, connect and integrate applications, drive part...