This content has been archived and is no longer being updated.

Links may not function; however, this content may be relevant to outdated versions of the product.

Big data functionality enhancements

Updated on September 10, 2021

Pega Platform™ features maintenance improvements to the Hadoop host configuration, the HBase data set, and the HDFS data set. System architects can more easily make Hadoop Distributed File System (HDFS) and Apache HBase storage available to business scientists for exploratory analysis and predictive model building.

HBase data set

The HBase data set is designed to read data from and save data to an external Apache HBase storage. Enhancements to this data set allow you to:

  • Map the HBase storage to the HBase data set without any connector reference.

  • Map a column family to a Page Group or a Page List.

  • Use a validation mechanism for the mapped property and the column family.

    An instance of the HBase data set rule

  • Use more complex property types in the HBase data set to support the flexible data structure of the Apache HBase storage.

  • Use the Data Preview option to see data inside the Apache HBase storage.

    The Preview option for the HBase data set
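The column-family mapping listed above can be pictured with a small sketch. It is an illustration only, not Pega's implementation: the function names and the row layout are hypothetical, and it assumes a Page Group behaves like a collection keyed by column qualifier while a Page List behaves like an ordered collection.

```python
# Hypothetical sketch: an HBase row is modeled as a plain dict whose keys
# use the usual "family:qualifier" column naming. None of these names are
# Pega or HBase client APIs.

def split_column(column):
    """Split 'family:qualifier' into its two parts."""
    family, _, qualifier = column.partition(":")
    return family, qualifier


def family_to_page_group(row, family):
    """Return the family's cells as a dict keyed by column qualifier
    (assumed analogue of a Page Group)."""
    group = {}
    for column, value in row.items():
        col_family, qualifier = split_column(column)
        if col_family == family:
            group[qualifier] = value
    return group


def family_to_page_list(row, family):
    """Return the family's cells as a list ordered by column qualifier
    (assumed analogue of a Page List)."""
    group = family_to_page_group(row, family)
    return [{"qualifier": q, "value": group[q]} for q in sorted(group)]


row = {
    "info:name": "Ann",
    "info:age": "30",
    "orders:002": "pen",
    "orders:001": "ink",
}
info_group = family_to_page_group(row, "info")
order_list = family_to_page_list(row, "orders")
```

In this sketch, a keyed structure suits a family whose qualifiers are meaningful names, and an ordered structure suits a family whose qualifiers act as sequence identifiers.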

HDFS data set

The HDFS data set is designed to read data from and save data to an external Hadoop Distributed File System (HDFS) storage. Enhancements to this data set allow you to:

  • Use the HDFS data set to consume the output of MapReduce jobs.

  • Use the File system configuration option to look for files with a given pattern.

    An instance of the HDFS data set rule
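The two enhancements above fit together: a MapReduce job typically writes several part files into one output directory, so a file pattern is a natural way to select them. The following is a minimal sketch that assumes glob-style patterns; the pattern syntax that the File system configuration option actually accepts is not stated here, and `select_files` is a hypothetical helper rather than a Pega API.

```python
from fnmatch import fnmatchcase


def select_files(names, pattern):
    """Return the file names that match a glob-style pattern, sorted.

    fnmatchcase is used so that matching stays case-sensitive on every
    platform, as file names in HDFS are case-sensitive.
    """
    return sorted(name for name in names if fnmatchcase(name, pattern))


# Example listing of a reducer output directory (illustrative names).
listing = ["part-r-00001", "_SUCCESS", "part-r-00000", ".part-r-00000.crc"]
reducer_outputs = select_files(listing, "part-r-*")
```

With this pattern, marker files such as `_SUCCESS` and hidden checksum files are left out, and only the part files reach the data set.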

Hadoop host configuration

Hadoop data instances allow you to define connection details for the Hadoop host, including connection details for data sets and connectors. Enhancements to the Hadoop record allow you to:

  • Optionally configure a NameNode host for the HDFS connection on the Hadoop host configuration.

  • Optionally configure a ZooKeeper host for the HBase connection on the Hadoop host configuration.

    An instance of the Hadoop host with configured connection for HDFS and HBase
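One way to think about these optional settings is as per-connection overrides of the main host. The sketch below is hypothetical and does not reflect the structure of the Hadoop record: the class and field names are invented for illustration, the fallback to the main host when an override is unset is an assumption, and the ports 8020 (NameNode) and 2181 (ZooKeeper client) are commonly used values that a real cluster may change.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HadoopHost:
    """Hypothetical model of a Hadoop host with optional overrides."""
    host: str
    namenode_host: Optional[str] = None   # optional override for HDFS
    zookeeper_host: Optional[str] = None  # optional override for HBase
    namenode_port: int = 8020             # assumed NameNode port
    zookeeper_port: int = 2181            # assumed ZooKeeper client port

    def hdfs_uri(self) -> str:
        """HDFS address; falls back to the main host (assumption)."""
        return f"hdfs://{self.namenode_host or self.host}:{self.namenode_port}"

    def zookeeper_quorum(self) -> str:
        """ZooKeeper address; falls back to the main host (assumption)."""
        return f"{self.zookeeper_host or self.host}:{self.zookeeper_port}"


single_node = HadoopHost(host="hadoop.example.com")
split_roles = HadoopHost(
    host="hadoop.example.com",
    namenode_host="nn.example.com",
    zookeeper_host="zk.example.com",
)
```

Keeping the overrides optional means that a single-node setup needs only the main host, while a cluster with dedicated NameNode and ZooKeeper machines can name each one explicitly.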
