
The case archiving process

Updated on January 19, 2023

Understand the case archiving and data expunging processes so that you can manage your historical case data and optimize your application database performance.

Case archiving process

The case archiving process in Pega Platform consists of a set of jobs that move case data to an external repository as defined in your case archiving policy. After the system completes the case archiving process, you can still search and review cases in the external repository as illustrated in the following diagram.

Case archiving process outline
The case archiving process: you set up jobs to archive case data and, after archiving, review the case data through Elasticsearch indexing.

Case archiving process details

Step | Task or action | More information
Configure the case archiving process

Before you can begin archiving case data, plan and set up your case archiving process.

In Pega Cloud deployments, Pega Platform automatically uses Pega Cloud File Storage (PCFS) to store archived data; in virtual-machine-based deployments, you must configure the secondary storage repository before you can archive data. For details, see Secondary storage repository for archived data.

Planning your case archiving process

Configure the case archiving process

Define the case archiving policy for each case type.

Your case archiving policy sets the number of days that cases must remain resolved before they become eligible for both archiving and deletion.

Defining the archiving policy for case types.
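The eligibility rule that the policy expresses can be sketched as a simple check: a case becomes eligible once it has been resolved for at least the configured number of days. This is an illustrative sketch only; the function and field names are hypothetical stand-ins, not part of the Pega Platform API.

```python
from datetime import datetime, timedelta

def is_eligible_for_archiving(resolved_on, policy_days, now=None):
    """Return True if a case resolved on `resolved_on` may be archived.

    Hypothetical sketch: a resolved case is eligible once it has been
    resolved for at least `policy_days` days, per the archiving policy.
    """
    if resolved_on is None:  # unresolved cases are never eligible
        return False
    now = now or datetime.utcnow()
    return now - resolved_on >= timedelta(days=policy_days)

# Example: a case resolved 40 days ago, checked against a 30-day policy.
resolved = datetime(2023, 1, 1)
print(is_eligible_for_archiving(resolved, 30, now=datetime(2023, 2, 10)))  # True
```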
Define dynamic system settings to configure:
  • Case archiving settings that improve performance and control batch size.
  • Optional settings for recording insert values, running jobs in parallel, or updating cases during a query.
  • An optional post-archive purge cycle to improve performance during purge cycles and to control the intervals at which purge cycles occur.
Settings that control case archiving processes
Schedule case archiving jobs.

Configure the pyPegaArchiver, pyPegaPurger, and pyPegaIndexer job schedulers for case archiving as needed for your business purposes.

Schedule case archiving jobs.

Run case archiving jobs

Monitor the archiving process.

Archival and expunge job statistics

Review log files.

Archival jobs log file entries

Complete case archiving

The system copies eligible cases to the secondary storage repository, Elasticsearch indexes the copied cases, and the system then removes the copied cases from the Pega Platform database.

Schedule case archiving jobs
Access archived case data

Search, review, recover, or clone the case data whenever you need.

Retrieving archived case data
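The copy, index, and purge stages described in the table can be sketched end to end. In this sketch, plain dictionaries stand in for the primary database, the secondary storage repository, and the Elasticsearch index; every name here is hypothetical and not a Pega API.

```python
def run_archiving_cycle(database, secondary_storage, search_index, eligible_ids):
    """Illustrative archive cycle: copy, then index, then purge."""
    # 1. Copy: write each eligible case and its artifacts to secondary storage.
    for case_id in eligible_ids:
        secondary_storage[case_id] = database[case_id]
    # 2. Index: record the association between the case and its archived file.
    for case_id in eligible_ids:
        search_index[case_id] = f"archive/{case_id}"
    # 3. Purge: remove a copied case from the primary database only after
    #    both its archive copy and its index entry exist.
    for case_id in eligible_ids:
        if case_id in secondary_storage and case_id in search_index:
            del database[case_id]

db = {"C-1": {"status": "Resolved"}, "C-2": {"status": "Open"}}
storage, index = {}, {}
run_archiving_cycle(db, storage, index, ["C-1"])
print(sorted(db))       # ['C-2']  -- C-1 was purged from the primary store
print(sorted(storage))  # ['C-1']  -- and now lives in secondary storage
```

Purging last mirrors the ordering in the table: a case leaves the primary database only after it is both copied and indexed.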

Archived data

The archival process archives certain artifacts within a case, such as work history and attachments.

The following table shows the artifacts that Pega Platform archives.

Archived case artifacts

Archived artifacts:
  • Child cases
  • Declarative indexes
  • Work history
  • Pulse replies, including link attachments
  • Attachments

    Note: If an attachment is shared between multiple cases, the shared attachment is copied to the archive but not purged from the Pega Platform database.

Non-archived artifacts:
  • Link association documents (Link-Association-Document)
  • Social reference documents (PegaSocial-Document)
  • Attached work (WorkAttach)
  • Ad hoc subcases
  • Bookmarked messages
  • Custom associations
  • Documents
  • Followed users
  • Liked messages
  • Links to folders
  • Links to top cases
  • Tags
  • Workbaskets
  • Worklists

Case archival jobs

Case archiving pipeline
The archival pipelines for different aspects of case archiving.

Case archival pipeline explained

Job or activity | Events

pyPegaArchiver
The pyPegaArchiver Job Scheduler (default short description: Archival_Copier) copies files to secondary storage by using the following steps:

  1. The job uses a crawler to identify, in bulk, cases that are eligible for archiving.
  2. The crawler adds the cases to the archiving pipeline.
  3. The crawler validates the resolution of all subcases.
  4. The job copies the cases and their artifacts to the secondary storage repository.
pyPegaIndexer
The pyPegaIndexer Job Scheduler (default short description: Archival_Indexer) indexes the copied files into Elasticsearch. The index keeps the association between an archived case and its archived file in the secondary storage.

pyPegaPurger
The pyPegaPurger Job Scheduler (default short description: Archival_Purger) deletes cases and their associated data from the primary database. The job also runs a SQL VACUUM command so that the database can reclaim the space that the deleted rows occupied.

Optional: pyArchival_ReIndexer
The pyArchival_ReIndexer Job Scheduler (default short description: Archival_ReIndexer) repairs corrupted Elasticsearch indexes. This job runs after a case archival and purge job to fix case archives when needed.
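The crawler behavior described for pyPegaArchiver (identify eligible cases in bulk, validate that all subcases are resolved, then feed the pipeline) can be sketched as a batching generator. The data shapes and names below are illustrative assumptions, not the actual Pega implementation.

```python
def crawl_eligible_cases(cases, batch_size=50):
    """Yield batches of case IDs whose case and all subcases are resolved."""
    batch = []
    for case_id, case in cases.items():
        if case["status"] != "Resolved":
            continue
        # Validate the resolution of all subcases before archiving the parent.
        if any(sub["status"] != "Resolved" for sub in case.get("subcases", [])):
            continue
        batch.append(case_id)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # flush the final partial batch

cases = {
    "C-1": {"status": "Resolved", "subcases": [{"status": "Resolved"}]},
    "C-2": {"status": "Resolved", "subcases": [{"status": "Open"}]},
    "C-3": {"status": "Open"},
}
print(list(crawl_eligible_cases(cases)))  # [['C-1']]
```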

Case exclusions

Exclude cases from the archival process to prevent them from being moved to the secondary storage repository and subsequently deleted from the secondary repository. For example, exclude a case that should not be archived due to a legal hold.

For more information about excluding cases from the archive and purge process, see Excluding cases from archival and expunge.
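An exclusion check of this kind can be layered on top of eligibility: a case on the exclusion list (for example, under a legal hold) is never handed to the archiver. The function and IDs below are illustrative only.

```python
def filter_exclusions(eligible_ids, excluded_ids):
    """Drop excluded cases (for example, those under a legal hold)."""
    excluded = set(excluded_ids)
    return [case_id for case_id in eligible_ids if case_id not in excluded]

print(filter_exclusions(["C-1", "C-2", "C-3"], ["C-2"]))  # ['C-1', 'C-3']
```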
