Data Nodes (D-Nodes) form a repository for analytical data from a variety of sources and also provide the infrastructure for decisioning services. D-Nodes support Decision Data Store data sets, which store data in JSON format. The data is distributed and replicated across the D-Node cluster and stored in each D-Node's file system. Although the data is spread over different nodes, all data for a specific customer is stored on the same node, which speeds up operations on that customer's data. Unlike an Interaction History data set, a Decision Data Store data set does not hold a stream of data over time; it holds only the latest snapshot.
A D-Node consists of a standard Pega® Platform installation that is augmented by a managed or an external Cassandra database process. D-Nodes rely on the Pega Platform clustering implementation and can store high volumes of data at high speed. By creating a cluster of D-Nodes assigned to the applicable service, you can use Cassandra's replication to protect data against failure and its distribution model to scale horizontally. D-Nodes can make specialized decisions in real time, so that you can take data from different sources and make it available for real-time and batch processing. The embedded nature of the persistence solution means that it is also optimized for batch processing.
- Decisioning services infrastructure
- D-Nodes operational guidelines
The D-Nodes infrastructure is an underlying element of the Pega Decision Management architecture. Pega Decision Management operates based on the allocation of Pega Platform nodes to different Decision Management services (Adaptive Decision Manager, Decision Data Store, Data Flow, and Visual Business Director). The Decision Data Store service is also the storage mechanism that supports the Adaptive Decision Manager and Visual Business Director functionality.
Decision Management services architecture
Planning for the D-Nodes that support the Decision Management services infrastructure requires that you estimate the initial data volume and the application workload. The requirements listed in this section are specific to D-Nodes. For more information about Pega Platform support, see the Platform Support Guide. The recommendations in the guide reflect those in the DataStax documentation for Cassandra.
D-Nodes are designed to process high volumes of data, so use multicore processors for production deployments, preferably 8-core CPUs. At the same time, D-Nodes are not memory-intensive and have no specific requirements beyond the Pega Platform memory recommendations. Plan for a minimum of 4 GB RAM per virtual or physical machine.
Because a D-Node's primary function is to store data, the most important hardware component is the disk and the amount of allocated disk space. Two separate disks are recommended for each D-Node: one for the data and one for the commit log. SSDs are recommended for Cassandra, but SATA disks with high write performance are also suitable. In terms of capacity, disks of 1 TB to 3 TB are optimal for each node in the cluster. Because data is replicated across nodes and can be restored after a hardware failure, RAID is generally not required.
Use the XFS or ext4 file system in a production setup, because other file systems have maximum file size limits that do not allow for efficient use of disk capacity.
To use D-Nodes on Pega Platform, make sure that you meet the following prerequisites:
- The JVM version for Pega Platform installations running as D-Nodes must be 64-bit and at least Java 7.
- The initial Java heap size is set to 1024 MB and the maximum heap size to 2048 MB. These values are a general guideline only. For more information, see the DataStax documentation for Cassandra.
- JMX needs to be configured at the application server level so that you can use the Cassandra nodetool utility and Cassandra JMX endpoints.
- A minimum configuration applies to user resources. For more information about setting user resource limits, see the DataStax documentation for Cassandra.
- Cassandra 2.1 is required when you use an external Cassandra cluster for the Decision Data Store service.
To achieve high availability and data consistency across nodes, a production cluster should have a minimum of three D-Nodes. Ideally, the number of Pega Platform nodes in a production cluster equals the number of D-Nodes. Each D-Node must have a distinct IP address. Pega Platform supports running multiple D-Nodes on the same server, provided that each D-Node can acquire a different IP address; the address is determined by the clustering implementation (Hazelcast or Ignite). JVMs cannot share an IP address.
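The two layout rules above (at least three D-Nodes in production, one distinct IP address per D-Node) can be checked before deployment. The following is a minimal sketch, not a Pega utility: the function name, the node names, and the dictionary layout are hypothetical and serve only to illustrate the rules.

```python
from collections import Counter

MIN_PRODUCTION_DNODES = 3  # minimum stated in the text above


def validate_dnode_layout(node_ips):
    """Return a list of problems found in a planned D-Node layout.

    node_ips maps a (hypothetical) node name to the IP address its JVM uses.
    An empty list means that both rules are satisfied.
    """
    problems = []
    if len(node_ips) < MIN_PRODUCTION_DNODES:
        problems.append(
            f"only {len(node_ips)} D-Node(s); "
            f"production needs at least {MIN_PRODUCTION_DNODES}"
        )
    # Every D-Node must have its own IP address, even on a shared server.
    counts = Counter(node_ips.values())
    for ip, count in sorted(counts.items()):
        if count > 1:
            shared = sorted(name for name, addr in node_ips.items() if addr == ip)
            problems.append(f"IP {ip} is shared by {', '.join(shared)}")
    return problems
```

For example, `validate_dnode_layout({"dnode1": "10.0.0.1", "dnode2": "10.0.0.1"})` reports both the undersized cluster and the shared address.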
By default, Cassandra is not active in a Pega Platform node that is not allocated to one of the Decision Management services. Data travels to and from the D-Node over the same IP address that the clustering implementation uses for the Pega Platform cluster. The clustering implementation handles the node management operations (enabling, disabling, and running maintenance actions).
A minimum production configuration with managed Cassandra clusters where Cassandra is started as part of the Pega Platform node
An external Cassandra system decouples the Pega Platform node from the Cassandra storage and requires that the connection to the external Cassandra cluster is configured. D-Nodes require the following exposed ports:
- Storage port for internal Cassandra communication. It must be available on every D-Node.
- Default port for incoming stream communication. It must be available when you use stream
- CQL3 native transport port. It is required when the Decision Data Store service uses an external Cassandra based on CQL3.
- Thrift RPC transport port. It is required when the Decision Data Store service uses an external Cassandra based on Thrift.
- Communication port for the Visual Business Director service. The port number is determined by how the VBD service is configured. By default, it is configured to require port 5751, but you can change this setting. It can also dynamically look for the exposed port if you configure the cluster port to auto-increment.
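To confirm that the required ports are reachable from another machine, a plain TCP connection test is usually enough. The sketch below assumes port numbers that this section does not state: only 5751 comes from the text, and the other values are the stock Apache Cassandra defaults, so replace them with the values from your own configuration. The function and dictionary names are illustrative.

```python
import socket

# Only 5751 (the Visual Business Director default) is stated in the text above.
# The other numbers are stock Apache Cassandra defaults and are ASSUMPTIONS here.
ASSUMED_PORTS = {
    "storage": 7000,
    "cql_native": 9042,
    "thrift_rpc": 9160,
    "vbd": 5751,
}


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_ports(host, ports=ASSUMED_PORTS):
    """Map each port label to whether that port is reachable on host."""
    return {label: is_port_open(host, port) for label, port in ports.items()}
```

A successful connection shows only that something is listening on the port, not that the listener is Cassandra or the VBD service.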
The Cassandra instance starts when you allocate the node to a service and on startup of a Pega Platform instance that was previously assigned to a service. The D-Node service listener is responsible for the bootstrap procedure. This listener is enabled by default, started at server startup, and can be monitored by using the System Management Application. It performs a series of operations to bootstrap the Cassandra engine:
- Reading the Cassandra configuration as defined in the file.
- Reading the Pega Platform configuration as defined in the file.
- Merging the Cassandra and Pega Platform configuration settings into a new file.
- Bootstrapping the Cassandra engine with the merged file.
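The merge step in this sequence can be pictured as overlaying one set of key-value settings on another. The sketch below is an illustration only: it assumes that the Pega Platform settings take precedence over the Cassandra settings, which this section does not confirm, and the example values are hypothetical. The keys `cluster_name` and `storage_port` are standard Cassandra setting names.

```python
def merge_settings(cassandra_settings, pega_settings):
    """Return a new dict that combines both sets of settings.

    ASSUMPTION: when a key exists in both, the Pega Platform value wins.
    Neither input dict is modified.
    """
    merged = dict(cassandra_settings)
    merged.update(pega_settings)
    return merged


# Hypothetical values, for illustration only.
cassandra_settings = {"cluster_name": "Test Cluster", "storage_port": 7000}
pega_settings = {"cluster_name": "dnode-cluster"}
merged = merge_settings(cassandra_settings, pega_settings)
```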
Sizing a cluster
Pegasystems provides a sizing calculation tool in the form of an Excel spreadsheet that you can use to calculate the data size requirements. Use the following guidelines to estimate the number of required D-Nodes:
- Start with the recommended minimum number of D-Nodes (three).
- Compute the total required disk space by summing the estimates for each Decision Data Store data set.
Use the D-Node Sizing Model spreadsheet for this purpose. Contact Pegasystems Global Customer Support to obtain the D-Node Sizing Model spreadsheet.
- Divide the total data size by the number of available D-Nodes, and make sure that the result stays within 30% to 40% of the disk space available on a single D-Node.
- If you use the cluster for simulations and data flow runs, add more nodes to the D-Node cluster to increase processing speed.
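The guidelines above reduce to a short calculation. The following sketch is not the D-Node Sizing Model spreadsheet; it assumes that the total data size you pass in already includes any replication overhead, and the function and parameter names are hypothetical.

```python
import math


def dnodes_needed(total_data_tb, disk_per_node_tb, max_fill=0.40, minimum=3):
    """Estimate how many D-Nodes keep per-node data within max_fill of one disk.

    total_data_tb    -- total required disk space across all data sets, in TB
    disk_per_node_tb -- disk capacity of a single D-Node, in TB
    max_fill         -- upper limit of disk usage per node (0.30 to 0.40 in the text)
    minimum          -- recommended minimum cluster size (three in the text)
    """
    if total_data_tb < 0 or disk_per_node_tb <= 0 or not 0 < max_fill <= 1:
        raise ValueError("sizes must be positive and max_fill must be in (0, 1]")
    usable_per_node_tb = disk_per_node_tb * max_fill
    return max(minimum, math.ceil(total_data_tb / usable_per_node_tb))
```

For example, 6 TB of data on 2 TB disks with a 40% limit leaves 0.8 TB usable per node, so the estimate is eight D-Nodes. Add more nodes on top of this figure if the cluster also runs simulations and data flows.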
The following recommendations apply when working with D-Nodes:
- Use Gigabit Ethernet or another network with a bandwidth of 1000 Mbit/s or greater.
- Each D-Node requires a dedicated IP address. If multiple D-Nodes run on the same server, each D-Node must have a different IP address.
- Synchronization is required between the clocks of Pega Platform nodes, servers hosting the application server, and servers hosting the database. For more information, see Clusters (multiple-node systems) — Concepts and terms.
Cassandra is largely self-managing, but you can perform several actions on the Services landing page to optimize the performance of D-Nodes. Some of these operations cannot be performed automatically. Although they do not affect service availability, they reduce system performance by 20% to 30% while they run. For this reason, Pegasystems recommends that you perform these operations in low-usage hours. For more information, see Managing decision data nodes.
Cassandra exposes management operations through Java Management Extensions (JMX). The D-Node makes these operations accessible to system administrators and external tools. You can use the Cassandra nodetool utility to manage and monitor a cluster and to run commands that show detailed metrics for tables, servers, network, disk, and CPU utilization, as well as read and write latency histograms. You can configure monitoring tools such as DataStax OpsCenter, Nagios, or Monit to monitor the services infrastructure. For more information about the nodetool utility, see the DataStax documentation for Cassandra.
If all nodes in the services infrastructure are always available, you do not need to run repair operations. In practice, however, nodes go down for one reason or another (for example, maintenance, hardware failures, or network issues). The node repair operation makes data on a replica consistent with the data on other nodes. You can repair nodes assigned to the applicable part of the services infrastructure manually or by using the Cassandra nodetool utility. Follow these repair guidelines:
- Schedule repairs weekly
- Run a repair when a node was unavailable for more than one hour
- Run a repair in low-usage hours to avoid a decrease in performance
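The first two guidelines can be turned into a simple rule that a scheduling script evaluates before it starts a repair. This is a minimal sketch under stated assumptions: the function name and parameters are hypothetical, and deciding what counts as low-usage hours is left to the caller.

```python
from datetime import datetime, timedelta

REPAIR_INTERVAL = timedelta(weeks=1)             # "schedule repairs weekly"
MAX_OUTAGE_WITHOUT_REPAIR = timedelta(hours=1)   # "unavailable for more than one hour"


def repair_due(last_repair, now, last_outage=timedelta(0)):
    """Return True if a node repair should be scheduled.

    last_repair -- datetime of the most recent completed repair
    now         -- current datetime
    last_outage -- how long the node was unavailable since that repair
    """
    if last_outage > MAX_OUTAGE_WITHOUT_REPAIR:
        return True
    return now - last_repair >= REPAIR_INTERVAL
```

When the function returns `True`, the repair itself can be started manually or with `nodetool repair`, preferably in low-usage hours.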
You recover a node when it becomes unavailable and cannot be started. If the data that the failed node owned is also available on replica nodes, you drop the Cassandra commit log and data folders. Otherwise, you need to recover the data from a backup. Although not mandatory, you can remove unused key ranges and run the cleanup operation on all nodes assigned to the Decision Management services.
To facilitate node repair and recovery, you can add Cassandra loggers. For more information, see D-Node failure due to an exception.
Cassandra performs data backup by taking a snapshot of all on-disk data files (the SSTable files) stored in the data directory. While the system is online, you can take a snapshot of all keyspaces, a single keyspace, or a single table. Use a parallel ssh tool (for example, pssh) to take a snapshot of the entire cluster and provide a consistent backup. Use the Cassandra nodetool utility to create and restore backups or schedule them to run automatically.
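A cluster-wide snapshot amounts to running the same `nodetool snapshot` command against every node with a shared tag. The sketch below only builds the command lines and does not execute them. The host addresses, the tag, and the keyspace name are hypothetical, and the sketch assumes that `nodetool` is on the path and that JMX is reachable without extra credentials.

```python
def snapshot_commands(hosts, tag, keyspace=None):
    """Build one `nodetool snapshot` command (as an argument list) per host.

    hosts    -- iterable of host names or IP addresses of the D-Nodes
    tag      -- snapshot name shared by all nodes, so the backup can be matched up
    keyspace -- optional keyspace name; omit it to snapshot all keyspaces
    """
    commands = []
    for host in hosts:
        cmd = ["nodetool", "-h", host, "snapshot", "-t", tag]
        if keyspace:
            cmd.append(keyspace)
        commands.append(cmd)
    return commands


# Hypothetical hosts and tag, for illustration only.
for cmd in snapshot_commands(["10.0.0.1", "10.0.0.2", "10.0.0.3"], "nightly"):
    print(" ".join(cmd))
```

Running the commands at the same moment on all nodes, for example through a parallel ssh tool, is what keeps the backup consistent across the cluster.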
A single D-Node that runs out of disk space does not affect service availability, but performance degrades and the node can eventually fail. To avoid this situation, the following recommendations apply:
- Make sure that all D-Nodes have the same disk capacity.
- Check disk usage with a server monitoring tool.
- Allocate more disk space when the server monitoring tool alerts that 70% to 80% of disk space has been used.
Disk space can be extended by adding more nodes to the services or by adding more disk space. To add more disk space, shut down the node, replace or expand the existing disk, and start the node. These operations do not result in service outage as long as they are performed in time to avoid failures.
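If no dedicated server monitoring tool is available, the alert threshold from the recommendations above can be approximated with a small script. This is a sketch only: the function names are hypothetical, the default threshold is the lower bound of the 70% to 80% range, and the path that you check should be the mount point of the Cassandra data disk.

```python
import shutil


def disk_alert(used_bytes, total_bytes, threshold=0.70):
    """Return True when disk usage has reached the alert threshold."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return used_bytes / total_bytes >= threshold


def check_path(path, threshold=0.70):
    """Apply disk_alert to the file system that contains path."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return disk_alert(usage.used, usage.total, threshold)
```

Run the check on every D-Node, because a single full disk is enough to degrade performance.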