Configuring Hadoop settings for an HDFS connection

Updated on May 17, 2024

Use the HDFS settings in the Hadoop data instance to configure connection details for the HDFS data sets.

By using the Hadoop infrastructure, you can process large amounts of data directly on the Hadoop cluster and reduce the data transfer between the Hadoop cluster and the Pega Platform. Hadoop configuration instances are records in the SysAdmin category and belong to the Data-Admin-Hadoop class.

Before you begin: To connect to an Apache HBase or HDFS data store, first upload the relevant client JAR files to the application container that runs Pega Platform. For more information, see HDFS and HBase client and server versions supported by Pega Platform.
  1. In the header of Dev Studio, click Create > SysAdmin > Hadoop.
  2. On the Create Hadoop form, enter a description and a name for the Hadoop data instance.
  3. Click Create and open.
  4. In the Connection section, specify a master Hadoop host.
    This host must run both the HDFS NameNode and the HBase master node.
  5. In the HDFS section, select the Use HDFS configuration check box.
  6. In the User name field, enter the user name to authenticate in HDFS.
  7. In the Port field, enter the port of the HDFS NameNode.
    The default port is 8020.
  8. Optional: To specify a custom HDFS NameNode, select the Advanced configuration check box.
    • In the Namenode field, specify a custom HDFS NameNode that is different from the one defined in the common configuration.
    • In the Response timeout field, enter the number of milliseconds to wait for the server response. Enter zero or leave the field empty to wait indefinitely.

      The default timeout is 3000 milliseconds.

    • In the KMS URI field, specify an instance of Hadoop Key Management Server to access encrypted files from the Hadoop server.

      For example, for a KMS server running on http://localhost:16000/kms, the KMS URI is kms://http@localhost:16000/kms.

  9. Optional: To enable secure connections, select the Use authentication check box.
    Note: To authenticate with Kerberos, you must configure your environment. For more details, see the Kerberos Network Authentication Protocol documentation.
    • In the Master kerberos principal field, enter the Kerberos principal name of the HDFS NameNode as defined and authenticated in the Kerberos Key Distribution Center, typically following the nn/<hostname>@<REALM> pattern.
    • In the Client kerberos principal field, enter the Kerberos principal name of a user as defined in Kerberos, typically in the following format: <username>/<hostname>@<REALM>.
    • In the Keystore field, enter the name of a keystore that contains a keytab file with the keys for the user who is defined in the Client kerberos principal field.
      Note: The keytab file must be in a location that the Pega Platform server can read, for example: /etc/hdfs/conf/thisUser.keytab or c:\authentication\hdfs\conf\thisUser.keytab.
  10. Test the connection to the HDFS NameNode by clicking Test connectivity.
  11. Click Save.
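
The value formats in the steps above (the NameNode address with its default port, the KMS URI, and the Kerberos principal patterns) can be checked before you enter them in the form. The following is a minimal sketch that uses only the Java standard library. The class name, method names, and the host name namenode.example.com are hypothetical and are not part of Pega Platform or Hadoop. The sketch also assumes the standard hdfs:// scheme for NameNode addresses.

```java
import java.net.URI;

// Hypothetical helper for illustration only; not a Pega Platform or Hadoop API.
final class HdfsSettingsSketch {

    // Default NameNode port, as stated in the Port step.
    static final int DEFAULT_NAMENODE_PORT = 8020;

    // Builds a NameNode address from the host and an optional port.
    // A null port falls back to the default of 8020.
    static String nameNodeUri(String host, Integer port) {
        int effectivePort = (port == null) ? DEFAULT_NAMENODE_PORT : port;
        return "hdfs://" + host + ":" + effectivePort;
    }

    // Converts the HTTP(S) URL of a KMS server to the kms:// form,
    // for example http://localhost:16000/kms -> kms://http@localhost:16000/kms.
    static String toKmsUri(String kmsUrl) {
        URI url = URI.create(kmsUrl);
        String scheme = url.getScheme();
        if (scheme == null || !(scheme.equals("http") || scheme.equals("https"))) {
            throw new IllegalArgumentException("Expected an http or https URL: " + kmsUrl);
        }
        if (url.getHost() == null) {
            throw new IllegalArgumentException("URL has no host: " + kmsUrl);
        }
        String port = (url.getPort() == -1) ? "" : ":" + url.getPort();
        String path = (url.getRawPath() == null) ? "" : url.getRawPath();
        return "kms://" + scheme + "@" + url.getHost() + port + path;
    }

    // Checks only the shape <name>/<hostname>@<REALM>; it does not contact
    // the Kerberos Key Distribution Center.
    static boolean looksLikePrincipal(String principal) {
        return principal.matches("[^/@\\s]+/[^/@\\s]+@[^/@\\s]+");
    }

    public static void main(String[] args) {
        System.out.println(nameNodeUri("namenode.example.com", null));
        System.out.println(toKmsUri("http://localhost:16000/kms"));
        System.out.println(looksLikePrincipal("nn/namenode.example.com@EXAMPLE.COM"));
    }
}
```

The principal check is deliberately loose: it confirms the name/hostname@REALM shape that the steps describe and nothing more, so a value that passes can still be rejected by Kerberos.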
