You can connect to an Apache Hadoop cluster to read data from various data sources and to move data between the cluster and the Pega 7 Platform. The Pega 7 Platform can read data from both the HBase database and the HDFS file system.
Big data components
The Pega 7 Platform provides the following big data components:
- Hadoop record – Points to a Hadoop cluster and defines the methods for accessing it. By specifying the host and port of each server, you can access the various services that run on the cluster.
- HDFS data set – Accesses an external Apache Hadoop Distributed File System (HDFS) for both read and write operations.
- HBase connector and HBase data set – Access the HBase server through the connector architecture and the data set framework, respectively.
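The idea behind a Hadoop record (one cluster, with a host and port for each service you want to reach) can be pictured with a small data model. The sketch below is hypothetical and is not the Pega API. The class name `HadoopRecord`, its methods, and the host names and ports in the usage lines are all placeholders. The only convention taken from Hadoop itself is the `hdfs://host:port/path` URI form.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class HadoopRecord:
    """Hypothetical model of a Hadoop record (not a Pega API).

    One record points at one cluster and stores a host and port for each
    Hadoop service that should be reachable through it.
    """
    name: str
    services: Dict[str, Tuple[str, int]] = field(default_factory=dict)

    def add_service(self, service: str, host: str, port: int) -> None:
        # Reject ports outside the valid TCP range (1-65535).
        if not 0 < port < 65536:
            raise ValueError(f"invalid port for {service}: {port}")
        self.services[service] = (host, port)

    def endpoint(self, service: str) -> str:
        # Return "host:port" for a configured service; KeyError if it is missing.
        host, port = self.services[service]
        return f"{host}:{port}"

    def hdfs_uri(self, path: str) -> str:
        # Build a fully qualified hdfs://host:port/path URI from the HDFS entry.
        return f"hdfs://{self.endpoint('hdfs')}/{path.lstrip('/')}"


# Usage with placeholder host names and ports.
cluster = HadoopRecord("MyHadoopCluster")
cluster.add_service("hdfs", "namenode.example.com", 8020)
cluster.add_service("hbase", "zookeeper.example.com", 2181)
print(cluster.hdfs_uri("/data/customers.csv"))
# prints hdfs://namenode.example.com:8020/data/customers.csv
```

Keeping every endpoint in one record means that the HDFS and HBase data sets in this model would only need a reference to the record, not their own copies of the connection details.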
HBase and HDFS data sets in data flows
In data flows, both HBase and HDFS data sets can be referenced in the Source and Destination shapes.
HBase data set in a data flow
When you define an HDFS or HBase data set as a source, it serves as the standard entry point of the data flow. The source defines the data that the data flow reads.
Source shape properties
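The role of a source as the single entry point can be shown with generator stages. The following is a minimal sketch, not Pega code. It assumes that the HDFS file contains CSV text with a header row, and an in-memory `io.StringIO` stands in for the file on HDFS. The function names `csv_source` and `data_flow` are hypothetical.

```python
import csv
import io
from typing import Dict, Iterable, Iterator


def csv_source(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Hypothetical source stage: yield one record per CSV row.

    `lines` stands in for a file read from HDFS; any iterable of text lines works.
    """
    # DictReader uses the first row as the field names.
    yield from csv.DictReader(lines)


def data_flow(source: Iterable[Dict[str, str]]) -> Iterator[Dict[str, object]]:
    # Every record enters the flow through the source; later shapes
    # (here, a single type conversion) only see what the source yields.
    for record in source:
        yield {**record, "amount": float(record["amount"])}


# An in-memory file stands in for an HDFS file in this sketch.
hdfs_file = io.StringIO("id,amount\n1,25.5\n2,140\n")
for rec in data_flow(csv_source(hdfs_file)):
    print(rec)
```

Because the stages are generators, records are read from the source only as downstream stages request them.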
When you define an HDFS or HBase data set as a destination, it is the point that the data flow writes to and serves as the standard output point of the data flow. Every data flow has one or more destinations, each of which outputs either all results or only the results that meet a specific condition.
Destination shape properties
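The rule that each destination receives either all results or only the results that meet a condition can be modeled as a routing loop. This sketch is hypothetical and does not reproduce how the Pega 7 Platform implements destinations. `Destination`, `run_flow`, the destination names, and the `amount > 100` condition are illustrative assumptions, and a Python list stands in for the HDFS or HBase data set that is written to.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Optional

Record = Dict[str, object]


@dataclass
class Destination:
    """Hypothetical destination: collects the records a data flow writes to it."""
    name: str
    # None means "write all results"; otherwise only records that satisfy it.
    condition: Optional[Callable[[Record], bool]] = None
    written: List[Record] = field(default_factory=list)

    def accepts(self, record: Record) -> bool:
        return self.condition is None or self.condition(record)


def run_flow(records: Iterable[Record], destinations: List[Destination]) -> None:
    # A data flow must have at least one destination.
    if not destinations:
        raise ValueError("a data flow needs at least one destination")
    # Offer every result to each destination; each one applies its own condition.
    for record in records:
        for dest in destinations:
            if dest.accepts(record):
                dest.written.append(record)


# Usage: one destination takes everything, another only high-value records.
archive = Destination("HDFS archive")
high_value = Destination("HBase high value", lambda r: r["amount"] > 100)
run_flow([{"id": 1, "amount": 25.5}, {"id": 2, "amount": 140.0}], [archive, high_value])
print(len(archive.written), len(high_value.written))  # prints 2 1
```

The destinations in this sketch are independent of each other, so a single record can be written to several of them.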
When you use an HBase or HDFS data set as the source or destination, you can select an existing data set or create a new one. The autocomplete field lists all existing data sets in the current class, and the Data set type property indicates the type of the selected data set.