@DXWorldExpo: Article

Five Big Data Features in Oracle

Traditional RDBMS and new data processing

Over the past two decades, relational databases have been highly successful in serving large-scale OLTP and OLAP applications across enterprises. In the past couple of years, however, the advent of Big Data, in particular the need to process unstructured data and massive data volumes, has led the industry to look at non-RDBMS solutions. This has driven the popularity of NoSQL databases and massively parallel processing frameworks.

Traditional RDBMS vendors have been quick to react, however, adding several Big Data features to their offerings so that enterprises with a heavy investment in traditional RDBMS can have the best of both worlds by leveraging these new features.

The following sections outline the Big Data features in the popular Oracle database. See also my earlier articles, Five Big Data Features in SQL Server and Five Big Data Features in DB2.

1. External Tables: As the name suggests, an external table accesses data in external sources as if this data were in a table in the database. In earlier releases, external tables were mainly used to access CSV files and Oracle Loader files. To support Big Data, however, Oracle has released a direct connector to the Hadoop Distributed File System (HDFS) on which an external table can be built. Using the familiar CREATE TABLE syntax, the external table feature allows easy access to data stored in HDFS. Oracle SQL Connector for HDFS creates the external table definition from a Hive table by contacting the Hive metastore client to retrieve information about the table columns and the location of the table data. In addition, the Hive table data paths are published to the location files of the Oracle external table.
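To make the idea concrete, here is a minimal conceptual sketch in Python. It is not Oracle syntax and not how Oracle implements the feature; it only illustrates the principle that the data stays in a file outside the database while a declared column list lets it be scanned like a table. The `ExternalTable` class, the file contents, and the column names are all hypothetical.

```python
import csv
import os
import tempfile


class ExternalTable:
    """Hypothetical stand-in for an external table: a column definition
    layered over a delimited file that stays outside the 'database'."""

    def __init__(self, location, columns):
        self.location = location  # path to the external file
        self.columns = columns    # list of (name, converter) pairs

    def scan(self):
        # Read the file lazily on every scan; nothing is copied into
        # the database ahead of time.
        with open(self.location, newline="") as handle:
            for fields in csv.reader(handle):
                yield {
                    name: convert(value)
                    for (name, convert), value in zip(self.columns, fields)
                }


# Illustrative data file, created here only so the sketch runs end to end.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False, newline="") as f:
    f.write("1,alice,120.50\n2,bob,75.00\n3,carol,310.25\n")
    data_file = f.name

orders = ExternalTable(
    data_file, [("order_id", int), ("customer", str), ("amount", float)]
)

# Roughly "SELECT customer FROM orders WHERE amount > 100".
big_spenders = [row["customer"] for row in orders.scan() if row["amount"] > 100]
print(big_spenders)

os.remove(data_file)
```

In the real feature the file location would be an HDFS path and the column list would come from the Hive metastore, as described above.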

Given the distinct advantages of columnar databases for certain types of workloads, Oracle also supports a columnar storage format. With Hybrid Columnar Compression, the database stores the same column for a group of rows together. The data block does not store data in row-major format, but uses a combination of both row and columnar methods.

Storing column data together, with the same data type and similar characteristics, dramatically increases the storage savings achieved from compression. The database compresses data manipulated by any SQL operation, although compression levels are higher for direct path loads. Database operations work transparently against compressed objects, so no application changes are required.
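The effect of storing a column's values together can be illustrated with a small Python sketch. Run-length encoding is used here purely as a stand-in for a compression scheme; it is an assumption for illustration and not the algorithm Oracle uses. The sample rows are hypothetical.

```python
from itertools import groupby


def run_length_encode(values):
    """Collapse consecutive equal values into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


# A hypothetical group of rows: (order_id, region, status).
rows = [(i, "EMEA" if i < 4 else "APAC", "OPEN") for i in range(8)]

# Row-major layout: the values of each row are stored next to each other.
row_major = [value for row in rows for value in row]

# Columnar layout: each column's values for the whole group are stored together.
columns = list(zip(*rows))

row_major_runs = len(run_length_encode(row_major))
columnar_runs = sum(len(run_length_encode(column)) for column in columns)

print(row_major_runs, columnar_runs)  # prints: 24 11
```

In the row-major layout no two neighbouring values are equal, so nothing collapses. Grouped by column, the repeated region and status values collapse into three runs in total.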

As a complementary option, Oracle also provides Oracle Loader for Hadoop, a MapReduce application that is invoked as a command-line utility. It provides an efficient, high-performance loader for fast movement of data from a Hadoop cluster into a table in an Oracle database.

2. Oracle Text: Oracle Text enables you to build text query applications and document classification applications. It provides indexing, word and theme searching, and viewing capabilities for text. Oracle Text indexes text by converting all words into tokens. The general structure of an Oracle Text CONTEXT index is an inverted index in which each token maps to the list of documents (rows) that contain that token. The lexer breaks the text into tokens according to your language; these tokens are usually words. Oracle Text can index most document formats, including HTML, PDF, Microsoft Word, and plain text, and you can load any supported type into the text column. Oracle Text can also index most languages. Use the BASIC_LEXER preference type to index whitespace-delimited languages such as English, French, German, and Spanish, and the MULTI_LEXER preference type to index tables containing documents in different languages such as English, German, and Japanese.
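The inverted index structure described above can be sketched in a few lines of Python. This is a conceptual illustration only, not Oracle Text's implementation: the `tokenize` function is a deliberately naive stand-in for a lexer, and the sample documents are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Very simple lexer for illustration: lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(documents):
    """Map each token to the set of document ids (rows) that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


documents = {
    1: "Oracle Text indexes text by converting words into tokens",
    2: "Partitioning splits large tables into smaller pieces",
    3: "Tokens are usually words in the indexed text",
}

index = build_inverted_index(documents)

print(sorted(index["tokens"]))        # prints: [1, 3]
print(sorted(index["partitioning"]))  # prints: [2]
```

A lookup for a word then becomes a single dictionary access rather than a scan of every document.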

The basic Oracle Text query takes a query expression, usually a word with or without operators, as input. Oracle Text returns all documents (previously indexed) that satisfy the expression along with a relevance score for each document. Defining a custom thesaurus enables you to process queries more intelligently. Because users of your application might not know which words represent a topic, you can define synonyms or narrower terms for likely query terms. You can use the thesaurus operators to expand your query into your thesaurus terms.
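As a rough illustration of thesaurus-based query expansion, the following Python sketch expands a query term into its synonyms before looking it up. The tiny index, the thesaurus entries, and the scoring rule (a simple count of matching terms) are all assumptions made for the example; Oracle Text's relevance scoring works differently.

```python
def expand_query(term, thesaurus):
    """Return the term followed by any synonyms defined for it."""
    return [term] + thesaurus.get(term, [])


def search(terms, index):
    """Score each document by how many of the given terms it contains."""
    scores = {}
    for term in terms:
        for doc_id in index.get(term, ()):
            scores[doc_id] = scores.get(doc_id, 0) + 1
    # Highest score first; ties broken by document id.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical pre-built index: token -> ids of documents containing it.
index = {
    "car": {1},
    "automobile": {2},
    "vehicle": {2, 3},
    "engine": {1, 3},
}

# Hypothetical custom thesaurus: term -> synonyms or narrower terms.
thesaurus = {"car": ["automobile", "vehicle"]}

plain = search(["car"], index)
expanded = search(expand_query("car", thesaurus), index)

print(plain)     # prints: [(1, 1)]
print(expanded)  # prints: [(2, 2), (1, 1), (3, 1)]
```

Without expansion only document 1 matches; with expansion, documents that use a different word for the same topic are also found.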

With its extensive support for processing unstructured text documents, Oracle Text can play a major role in Big Data processing.

3. VLDB Partitioning: One of the appealing features of Big Data frameworks is the ability to split large quantities of data across multiple nodes. Oracle's partitioning feature provides similar functionality and has existed for some time. Partitioning addresses key issues in supporting very large tables and indexes by decomposing them into smaller, more manageable pieces called partitions, which are entirely transparent to an application. SQL queries and Data Manipulation Language (DML) statements do not need to be modified to access partitioned tables.

Partitioning is a critical feature for managing very large databases. Growth is the basic challenge that partitioning addresses for very large databases, and partitioning enables a divide and conquer technique for managing the tables and indexes in the database, especially as those tables and indexes grow.

Oracle supports several partitioning types to suit the nature of the application:

  • Range Partitioning: Range partitioning maps data to partitions based on ranges of values of the partitioning key that you establish for each partition.
  • Hash Partitioning: Hash partitioning maps data to partitions based on a hashing algorithm that Oracle applies to the partitioning key that you identify.
  • List Partitioning: List partitioning enables you to explicitly control how rows map to partitions by specifying a list of discrete values for the partitioning key in the description for each partition.
  • Composite Partitioning: Composite partitioning is a combination of the basic data distribution methods; a table is partitioned by one data distribution method and then each partition is further subdivided into subpartitions using a second data distribution method.
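The four methods above differ only in how a partitioning key is mapped to a partition. The Python sketch below shows that mapping for each method. It is a conceptual illustration, not Oracle's implementation: the partition names, boundary values, and the use of CRC32 as the hash function are all assumptions chosen for the example.

```python
import bisect
import zlib

# Range partitioning: exclusive upper bounds on a hypothetical order year.
RANGE_BOUNDS = [2011, 2012, 2013]
RANGE_NAMES = ["p_2010", "p_2011", "p_2012", "p_max"]


def range_partition(year):
    """Pick the first partition whose upper bound is greater than the key."""
    return RANGE_NAMES[bisect.bisect_right(RANGE_BOUNDS, year)]


def hash_partition(key, partition_count=4):
    """Spread keys across partitions with a hash of the key.
    CRC32 is used only because it is deterministic between runs;
    it is not the hash function Oracle applies."""
    return "p_hash_%d" % (zlib.crc32(str(key).encode()) % partition_count)


# List partitioning: discrete key values are assigned explicitly.
LIST_MAP = {"UK": "p_europe", "DE": "p_europe", "US": "p_americas"}


def list_partition(country):
    return LIST_MAP.get(country, "p_default")


def composite_partition(year, customer_id):
    """Range partition first, then hash subpartition within that range."""
    return (range_partition(year), hash_partition(customer_id, 2))


print(range_partition(2010), range_partition(2012), range_partition(2020))
print(list_partition("DE"), list_partition("JP"))
print(composite_partition(2011, 42))
```

In each case the application only supplies the row; the choice of partition is derived from the key, which is why queries and DML need no changes.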

4. Native Parallelism & Grid Computing: Although MPP (massively parallel processing) forms the basis of Big Data processing, that does not mean the complementary SMP (symmetric multiprocessing) model cannot be used to process large quantities of data. Many of the Big Data features in Oracle, such as VLDB partitioning, fully utilize the SMP power of servers.

Parallel execution enables the application of multiple CPU and I/O resources to the execution of a single database operation. It dramatically reduces response time for data-intensive operations on large databases. You can use parallel queries and parallel subqueries in SELECT statements and execute in parallel the query portions of DDL statements and DML statements (INSERT, UPDATE, and DELETE). You can also query external tables in parallel.
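The general pattern behind parallel execution (split the work, process the slices concurrently, combine the partial results) can be sketched in Python as follows. This is only an analogy: the function names and the `degree_of_parallelism` parameter are hypothetical, and Python threads are used for simplicity even though they do not provide true CPU parallelism for this kind of work.

```python
from concurrent.futures import ThreadPoolExecutor


def scan_chunk(chunk):
    """Work done by one worker: filter and aggregate one slice of the data."""
    return sum(amount for amount in chunk if amount > 100)


def parallel_sum(amounts, degree_of_parallelism=4):
    # Split the data into roughly equal slices, one per worker.
    size = max(1, -(-len(amounts) // degree_of_parallelism))  # ceiling division
    chunks = [amounts[i:i + size] for i in range(0, len(amounts), size)]
    with ThreadPoolExecutor(max_workers=degree_of_parallelism) as pool:
        partial_sums = list(pool.map(scan_chunk, chunks))
    # A coordinator step combines the partial results.
    return sum(partial_sums)


amounts = list(range(1, 201))
serial_total = sum(a for a in amounts if a > 100)
parallel_total = parallel_sum(amounts)

print(serial_total, parallel_total)  # prints: 15050 15050
```

The result is the same as the serial computation; only the way the work is divided changes.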

Oracle also supports MPP-style parallelism through Oracle Real Application Clusters (Oracle RAC). Oracle RAC enables you to cluster an Oracle database, using Oracle Clusterware as the infrastructure that binds multiple servers so they operate as a single system. Although RAC alone may not be suited to analytical workloads, in conjunction with other features it can help with real-time analytics.

5. XML DB: Oracle XML DB is a set of Oracle Database technologies related to high-performance handling of XML data: storing, generating, accessing, searching, validating, transforming, evolving, and indexing. It provides native XML support by encompassing both the SQL and XML data models in an interoperable way. Oracle XML DB is included as part of Oracle Database.

XMLType is an abstract data type for native handling of XML data in the database. It is integrated with regular RDBMS tables, so an XML document can be just another column in a table. A table with an XMLType column can be partitioned using the VLDB partitioning techniques described above, making it a good candidate for Big Data processing. Another component of XML DB is the Oracle XML DB Repository, which can store any kind of document, including XML documents that are associated with an XML schema.
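The idea of an XML document living in an ordinary column and being queried alongside relational data can be illustrated with Python's standard XML library. This is a conceptual sketch, not the XMLType API; the table contents, element names, and the `extract` helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical table rows: an ordinary id column plus an XML document column.
purchase_orders = [
    (1, "<order><customer>Acme</customer><item qty='2'>Disk</item></order>"),
    (2, "<order><customer>Globex</customer><item qty='5'>Tape</item></order>"),
]


def extract(xml_text, path):
    """Return the text of the first node matching a simple path, or None."""
    node = ET.fromstring(xml_text).find(path)
    return node.text if node is not None else None


# Project a value out of the XML column next to the relational column.
customers = [(order_id, extract(doc, "customer")) for order_id, doc in purchase_orders]

# Filter rows using a value stored inside the XML document.
large_orders = [
    order_id
    for order_id, doc in purchase_orders
    if int(ET.fromstring(doc).find("item").get("qty")) >= 5
]

print(customers)     # prints: [(1, 'Acme'), (2, 'Globex')]
print(large_orders)  # prints: [2]
```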

Summary
Traditional high-performance RDBMS like Oracle have their strengths. They are very strong in maintaining data integrity and quality through constraints, foreign keys, and other validation mechanisms. They are also strong in transactional integrity, providing a superior locking model, automatic deadlock resolution, and so on. However, they were initially not perceived as able to adapt to the Big Data processing needs of enterprises.

With the enhancements made by their vendors, databases like Oracle now offer Big Data processing features that make them a strong candidate for enterprises looking for best-of-breed features from both traditional RDBMS and Big Data processing systems, while leveraging their existing investments.

More Stories By Srinivasan Sundara Rajan

Highly passionate about utilizing Digital Technologies to enable next generation enterprise. Believes in enterprise transformation through the Natives (Cloud Native & Mobile Native).
