How to estimate disk space requirements for a PegaRULES database (Oracle)
You can estimate the disk usage for a specific deployment using the approach outlined below.
The estimation process involves first collecting average sizes for the exposed columns and the Blob column, for each table in a test system. These are then multiplied by the expected row counts for in the production system.
Suggested approach
The size of work objects depend on the specifics of the your application and whether compression is enabled.
To estimate the total required disk usage, gather two totals for the PegaRULES database:
Storage allocated by the BLOB columns.
Storage allocated by the exposed columns.
BLOB columns
Most tables include a column named pzPVStream — a BLOB column that is not reflected in the exposed columns sizing number (calculated in the next section). To get the associated size of this column, use the following SQL command:
SELECT AVG(dbms_lob.getlength(pzpvstream)) FROM schemaname_tablename.pc_assign_workbasket;
Example:
SELECT AVG(dbms_lob.getlength(pzpvstream)) FROM rulepro_logon.pc_assign_workbasket;
The output from this statement is a single number, which is the average length of the BLOB column in the table pc_assign_workbasket
.
This command determines the average size of the BLOB columns for one table (in the above example, pc_assign_workbasket). Issue the command for each table which has pzPVStream column..
To see the individual column sizes, remove the AVG function from the command:
SELECT dbms_lob.getlength(pzpvstream) FROM schemaname_tablename.pc_assign_workbasket;
Example:
SELECT dbms_lob.getlength(pzpvstream) FROM rulepro_logon.pc_assign_workbasket;
The output from this statement is a list of numbers, which is the length of each row for the BLOB column in the table pc_assign_workbasket
.
Determining Optimal Block Size
Important: Each database block can hold only one BLOB. This means that the average size of the LOB (from above formulae) is very important in determining the optimal block size. If (for example) the average size of the LOB is 4K, and the size of the database blocks is 8K, that means that on average, there is 4K of wasted space in each block that holds a LOB.
On the other hand, making the blocks too small has another consequence. If (for example) the average size of the LOB is 8K, and the block size is 4K, then for each LOB, the system must set up a chained row. If the LOB does not fit into one block, it is be broken and stored in as many blocks as are required; each block also contains a pointer to the previous block containing the LOB (“chained”), so that when the data in the LOB is requested, it may all be returned correctly. This is a less efficient way to save and retrieve information, as the system needs to return two or more blocks for every LOB requested, and significantly slows performance if every LOB is “chained.”
Therefore, the optimal size of the database block should be large enough to hold an average LOB. Some LOBs should chain, and there is some wasted space for some of the smaller LOB rows, but overall, a balance should be struck between these two extremes.
Important: Assume that the entries grow. Do not create a block where the average LOB takes up 100% of the space, since chaining is necessary almost immediately.
There are several options to maximize efficiency with the exposed columns and the LOB space:
- Some tables may require more LOBspace than others; make the tablespace different if the LOBspace varies radically from table to table.
LOBspace may be different than the exposed column tablespace. It may be more efficient to separate them and size them differently.
Important: The Block size is defined by the DBA. When working with the exposed column data (below), the system fills the database block with as much row data as possible (from multiple rows), before going on to the next block – there is no wasted space, as there could be with LOB data. Therefore, the DBA should base the block size on the LOB size, not the exposed column data. Once the optimal DB block size has been calculated based on the LOBs, the below calculations may be done on the Exposed Columns to determine the grand total disk space required.
Exposed columns
To obtain information on exposed columns, analyze the database objects within the schema of the PegaRULES owner. This provides the number of rows, along with the average row size for all tables.
The tables most likely to grow are:
pc_history_work
pc_assign_workbasket
pc_assign_worklist
pc_work
Depending on flows and the application, other tables (such as pc_other
) may grow as well. Review tables regularly to identify other growing tables.
After you have analyzed all the tables, the number of rows in each and the average row length of the exposed columns is known. (Exposed columns are the columns against which it is possible to run queries.) Using theses results, you can then calculate the size required for the exposed columns, by:
Calculating the number of rows per Block.
Calculating the number of Blocks required, by dividing the number of rows per Block into the number of rows in the table.
Multiplying the Blocks required by the size of the block to obtain the space required for the exposed columns.
For example:
- Assume that the database was created with an 8K Block Size.
- Analysis of the
pc_work
table shows that it contains 2,000 rows - If the average row length is 1.5K bytes , then approximately 5 rows can be placed in each block. (Note that this is an estimate, due to the percent freespace specified when the table was allocated, and the 'average' row length: 8 / 1.5 = 5.33333, or 5 full rows.)
- Therefore, the
pc_work
table requires approximately 400 blocks for storage (2000/5 = 400). This then results in 400 8k pages (one page per block), or 3.2 MB of space required.
Thus, the formula in text for estimating the number of blocks needed is:
(Number of rows / (DB Block size/average row size))
Compute the average row size by analyzing the table, using the command:
Analyze Table tablename estimate statistics
This analyzes the table and populates the value of average row length. Display the resulting average row length is displayed by using the command:
Select avg_row_len from user_tables where table_name = ‘tablename’
The formula in text for estimating the space required is:
Number of DB Blocks needed for the table * the Block size
Important: After finding the size of the BLOB for each table, and the size of the exposed columns of each table, add these numbers together to find the total estimated size requirement for the Oracle database.
Oracle 9i
Beginning in Oracle 9i, Oracle supports tablespaces of differing block size. Depending upon the needs of the system, you can set up tablespaces with different block sizes for:
the LOB data
index data
regular tables (exposed columns)
When retrieving data, the operating system passes through a certain block size (for example, Windows passes 64K at one time). It is important to know what amount of data your operating system passes, and then set the block sizes to most efficiently provide that amount of data. For example, indexes should be set up with a large block size. Since indexes generally have small entries, this enables more entries to be read into memory with each pass.
You can set the Oracle parameter db_file_multiblock_read_count
to set up data retrieval most efficiently. As stated above, the operating system returns a certain amount of data. If the db_file_multiblock_read_count
parameter is not set, then only one block is returned per system read, no matter how much data the system can return.
So for example, if an operation makes a request for 256K bytes through a Windows system, the system could return this data in 64K increments. If the database block size is 8K, and db_file_multiblock_read_count
is not set, then each read only returns 8K, and the full amount of data requires 32 reads. However, if db_file_multiblock_read_count
is set to “8” (to read 8 blocks of 8K each, taking advantage of the 64K limit), then only 4 reads are necessary, greatly increasing the efficiency of the database access.
Notes
- Compression provides a significant reduction in disk storage required for Blobs, but may increase server CPU processing . Enable compression with a setting in the
pegarules.xml
file. See How to compress the BLOB values in the PegaRULES database - Memory caches are associated with tablespaces. A DBA may set the memory cache size. A discussion of memory caches is outside the scope of this document. For more information on memory caching in Oracle, refer to the Oracle Performance Tuning guide, available at http://otn.oracle.com (Registration required.)
Previous topic Euro symbol support in Oracle Next topic How to set up a JDBC connection pool for the PegaRULES database on Oracle (Tomcat)