Configuring a full extract
With this approach, you create a BIX extract directly on the Policy Profile class and configure it to include all the properties available on that class in incremental extracts. Depending on your use of the KYC engine, the generated files can be very large, but they contain all the information that you need for a deep analysis of the data.
However, the exported data also includes associated metadata, such as visibility conditions or on-change actions, that is used for the display and processing of the KYC types and items. This metadata contains the names of decision rules and properties that, although necessary at runtime, may not be of any value in your data mining systems. This metadata can amount to up to 70% of the size of an average Policy Profile. Therefore, we recommend that you follow this approach only if some of that metadata is required for your reporting (for example, if you need to analyze the literal wording of the questions in the KYC questionnaire), or if you want to export KYC data quickly and the size of the extract is not a concern. If the size of the extract is a concern, see Configuring a selective data extract.
To configure a full extract, first include the BIX ruleset in your application stack. For details about how to carry out this step, see Enabling the BIX ruleset.
After the ruleset has been made available, you must create the assets that generate the BIX extracts for your due diligence data. These assets are the extract rule and a job scheduler that periodically runs the extract rule to generate the extracts.
Creating an extract rule
Extract rules are similar to any other rules available in Pega, and can be created by navigating to:
To extract all the data available in the Policy Profile, you must create the extract rule in the PegaKYC-Data-PolicyProfile class, which is where all the KYC data is stored. For a partial extraction of data, see Configuring a selective data extract. The extract rule has three important configuration elements that must be carefully coordinated:
- Target data
- This configuration is available in the Definition tab of the extract rule, and allows you to select the data to be extracted for each object in the extract. BIX lets you select an output format (CSV, Database, or XML) and the individual properties to be extracted. However, given the very large number of properties that the KYC Types hold and the nature of the data structure that supports them, the only output format that can be used in this specific scenario is XML (with all the current properties). For this purpose, select XML as the Output format, and select the Get all properties check box.
- Filter Criteria
- By default, when an extract rule runs, it generates an output file with all the objects of the class that it belongs to. With this default configuration, the size of the output file keeps growing over time as objects accumulate, and can become difficult to handle. Therefore, it is important to carefully pick your filtering criteria in the Filter criteria tab of the extract rule. We recommend that you extract KYC data by generating incremental extracts, where each extract holds only those objects that were updated since the last extract. To configure this, select the Use last updated time as start check box.
- File name
- This configuration is available in the File Specification tab of the extract rule. It allows you to specify where the output file is placed and what the name of the file is. If you choose to generate automated extracts on a periodic basis, it is important that each output file has a unique name. Otherwise, each new output file overwrites the previous one, which results in data loss. You can ensure that each output file has a unique name by including the %t wildcard, which appends the extraction time stamp to the file name.
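The incremental filter and the unique file name complement each other: each run exports only what changed, into a file that no later run replaces. The following Python sketch is not BIX code and does not reflect how BIX is implemented. It only illustrates the two ideas under stated assumptions: `select_updated_since` and `timestamped_name` are hypothetical helpers, and the `updated` field, the sample records, and the time stamp format are invented for illustration.

```python
from datetime import datetime, timezone

def select_updated_since(objects, last_extract_time):
    """Keep only objects updated after the previous extract (incremental extract)."""
    return [o for o in objects if o["updated"] > last_extract_time]

def timestamped_name(prefix, run_time, extension="xml"):
    """Build a file name that is unique per run by appending the run time stamp.

    The time stamp format is an assumption, not the format that %t produces.
    """
    return f"{prefix}_{run_time.strftime('%Y%m%dT%H%M%S')}.{extension}"

# Hypothetical Policy Profile records with a last-updated time stamp.
profiles = [
    {"id": "PP-1", "updated": datetime(2024, 1, 1, 8, 0, tzinfo=timezone.utc)},
    {"id": "PP-2", "updated": datetime(2024, 1, 1, 10, 30, tzinfo=timezone.utc)},
    {"id": "PP-3", "updated": datetime(2024, 1, 1, 11, 15, tzinfo=timezone.utc)},
]

last_run = datetime(2024, 1, 1, 10, 0, tzinfo=timezone.utc)
this_run = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)

# Only PP-2 and PP-3 changed after the previous run, so only they are exported.
changed_ids = [p["id"] for p in select_updated_since(profiles, last_run)]
file_name = timestamped_name("PolicyProfile", this_run)
print(changed_ids)
print(file_name)
```

Because the run time is part of the name, two runs at different times can never write to the same file, which is the property that the time stamp in the file name is meant to guarantee.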
Creating a job scheduler
To avoid the need for manual intervention when generating the extracts, you must create a job scheduler that runs the extract rule on a periodic basis and generates the due diligence extracts. You can do this by configuring the job scheduler to invoke the pxExtractDataWithArgs extraction activity, passing the name and class of your extract rule.
For more details, see the Pega Community article Using Job Scheduler rule to Extract Data with BIX.
As part of the configuration of the job scheduler, you must determine the optimal frequency of generating extracts that best suits your business needs and technical infrastructure. The frequency of extracts must be dictated by the size of the data that you wish to extract, and by the time that it takes to extract it. We recommend that you schedule frequent runs with smaller data extracts that last for a number of minutes, rather than a single run that can last for a matter of hours.
For example, a financial institution that onboards fifty thousand customers every day, and has a job scheduler that executes daily, may need around four hours to generate a single extract that contains the fifty thousand Policy Profiles. If there is any error, or if there is a system outage, during the execution of the extraction job, the job will be aborted and the output file may not be saved, or may be corrupted. After that, the job scheduler will start from the beginning, which wastes valuable time. This situation can be avoided by:
- either reducing the size of the extracts by minimizing the data that is extracted (that is, by removing any data that is not required from the Policy Profiles; see Configuring a selective data extract)
- or increasing the frequency of executions to, for example, every hour or every two hours, which leads to more lightweight extractions that are easier to maintain.
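The effect of a higher frequency on the duration of each run can be estimated with simple arithmetic. The following Python sketch uses the figures from the example above (fifty thousand Policy Profiles per day, four hours for a single daily extract). The function `estimated_run_minutes` is a hypothetical helper, and the estimate assumes that profiles are updated evenly throughout the day and that extraction time scales linearly with the number of profiles; real systems may deviate from both assumptions.

```python
def estimated_run_minutes(daily_profiles, full_run_minutes, interval_hours):
    """Estimate the duration of one run when extracts run every `interval_hours`.

    Assumes profiles arrive evenly over 24 hours and that extraction time
    scales linearly with the number of profiles (both are simplifications).
    """
    runs_per_day = 24 / interval_hours
    profiles_per_run = daily_profiles / runs_per_day
    minutes_per_profile = full_run_minutes / daily_profiles
    return profiles_per_run * minutes_per_profile

# Figures from the example: 50,000 profiles per day, 4 hours (240 minutes) per daily run.
daily = estimated_run_minutes(50_000, 240, 24)
two_hourly = estimated_run_minutes(50_000, 240, 2)
hourly = estimated_run_minutes(50_000, 240, 1)
print(f"daily: {daily:.0f} min, every 2 h: {two_hourly:.0f} min, hourly: {hourly:.0f} min")
```

Under these assumptions, an hourly schedule yields runs of roughly 10 minutes and a two-hourly schedule yields runs of roughly 20 minutes, so a failed run costs minutes of rework rather than hours.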