This content has been archived and is no longer being updated.

Links may not function; however, this content may be relevant to outdated versions of the product.

Introduction to big data capabilities on the Pega 7 Platform

Updated on September 10, 2021

You can connect the Pega 7 Platform to an Apache Hadoop cluster to read data from various data sources and to move data between the cluster and the platform. The Pega 7 Platform can read data from both the HBase database and the Hadoop Distributed File System (HDFS).

Big data components

The big data capabilities on the Pega 7 Platform consist of the following components:

  • Hadoop record – Points to a Hadoop cluster and defines how to access it. By defining the server hosts and ports, you can access various services on the cluster.
  • HDFS data set – Accesses an external Apache Hadoop Distributed File System (HDFS) for both read and write operations.
  • HBase connector and HBase data set – Access the HBase server through the connector architecture and the data set framework, respectively.
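
To make the role of the Hadoop record more concrete, the following sketch models it as a single object that holds one host and port per service. This is an illustration only, not the Pega rule format or any Pega API: the class names, field names, host names, and port values are all hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ServiceEndpoint:
    """Host and port of one service on the cluster (hypothetical model)."""
    host: str
    port: int

    def address(self) -> str:
        return f"{self.host}:{self.port}"


@dataclass(frozen=True)
class HadoopRecord:
    """Hypothetical stand-in for a Hadoop record: one cluster, several services."""
    name: str
    hdfs: Optional[ServiceEndpoint] = None
    hbase: Optional[ServiceEndpoint] = None

    def endpoint_for(self, service: str) -> ServiceEndpoint:
        # Look up the endpoint configured for the requested service.
        endpoint = {"hdfs": self.hdfs, "hbase": self.hbase}.get(service)
        if endpoint is None:
            raise LookupError(f"{self.name} has no {service} endpoint configured")
        return endpoint


# Placeholder hosts and ports; use the values of your own cluster.
cluster = HadoopRecord(
    name="ExampleCluster",
    hdfs=ServiceEndpoint("namenode.example.com", 8020),
    hbase=ServiceEndpoint("zookeeper.example.com", 2181),
)
print(cluster.endpoint_for("hdfs").address())
```

In this model, data sets and connectors would refer to the record by name instead of repeating connection details, which is the design idea behind keeping cluster access in one place.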

HBase and HDFS data sets in data flows

In data flows, you can reference both HBase and HDFS data sets in the Source and Destination shapes.

Figure: HBase data set in a data flow (data flow dialog box)

If you define an HDFS or HBase data set as a source, it serves as the standard entry point of a data flow. A source defines the data that the data flow reads.

Figure: Source shape properties (Hadoop source properties dialog box)

An HDFS or HBase data set that you define as a destination is the data set that the data flow writes to and is the standard output point of the data flow. Every data flow defines one or more destinations, each of which outputs either all results or only the results that meet a specific condition.

Figure: Destination shape properties (Hadoop destination properties dialog box)
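
The source and destination roles described above can be sketched in a few lines of Python. This is a conceptual model only, assuming in-memory lists in place of real HBase or HDFS data sets; `Destination`, `run_data_flow`, and the sample records are hypothetical names introduced for illustration and are not part of the Pega 7 Platform.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Optional

Record = Dict[str, object]


@dataclass
class Destination:
    """Hypothetical stand-in for a Destination shape backed by a data set."""
    name: str
    # None means "output all results"; otherwise only matching records are written.
    condition: Optional[Callable[[Record], bool]] = None
    written: List[Record] = field(default_factory=list)

    def accepts(self, record: Record) -> bool:
        return self.condition is None or self.condition(record)


def run_data_flow(source: Iterable[Record], destinations: List[Destination]) -> None:
    """Read each record from the source and write it to every accepting destination."""
    if not destinations:
        raise ValueError("a data flow needs at least one destination")
    for record in source:
        for destination in destinations:
            if destination.accepts(record):
                destination.written.append(record)


# In-memory rows standing in for records read from an HBase or HDFS data set.
source_rows = [
    {"customer": "A", "score": 0.9},
    {"customer": "B", "score": 0.2},
]
all_results = Destination("hdfs_archive")
high_scores = Destination("hbase_high_scores", condition=lambda r: r["score"] > 0.5)
run_data_flow(source_rows, [all_results, high_scores])
print(len(all_results.written), len(high_scores.written))
```

Here the unconditional destination receives every record, while the conditional destination receives only the records whose score exceeds the threshold, mirroring the "all results or results that are based on a specific condition" behavior.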

After you select an HBase or HDFS data set for the source or destination, you can either select an existing data set or create one. The autocomplete feature lists all the existing data sets in the current class instance. The Data set type property indicates the type of the selected data set.
