We spend most of the time in tuning the business application, but often ignore the system application which is integrated with DB2 such as MQ-Replication. Poor performance of Q-replication does NOT only affect its own performance but also affect badly on the whole physical database server. I highlight about the basic tuning parameters operated at capture/apply program and some maintenance on Q-rep control tables.
Tune the parameters for Capture program
1. High latency costs on CPU of the database server, if high latency is not your concern adjust some of the configurations parameters :
=> sleep_interval : This sets the frequency for reading a DB2 log changes by Q-capture program. Its default value is 0.5 second (500 milliseconds) which is usually very high and it can be lowered down a little bit and this impacts on CPU very much. I would say reasonable value could be 1 or 2 seconds (1000 or 2000 milliseconds) depending on your environment.
=> commit_interval : Adjust it to change the frequency of committing the captured transaction to Q-Manager. Its default value is also a 0.5 second. Good starting point is to match the value with sleep_interval unless you have optimum memory_limit configured.
=> memory_limit : This memory is used for Q-capture program to build transactions in memory. If the transactions can't fit into this memory, it will be spilled over to disk and overall slows down the replication. Usually it happens if there is a big transaction needs to be captured , it could also happen due to either delayed value configured for sleep_interval or commit_interval. The default value is 500 MB on LUW. Before adjusting this value you can always check whether you spilled the transaction over to disk. Use the below query to find out :
select max(max_trans_size) from qasn.ibmqrep_capmon where trans_spilled > 0
2. Other parameters can also be considered for tuning but they are not as important as above :
=> max_message_size : This parameter determines the size of each message which is sent through SendQ. Having its size small, the messages are sent in chunks which impacts on capture performance, and also in apply as the changes are applied in chunks too. Following guidelines can be followed for setting this parameter :
- One typical large transaction should fit into one message, so setting the parameter max_message_size to be slightly higher than the maximum size of a typical transaction should be sufficient
- For very large transactions that exceed the value of max_message_size that you derived above, ensure that you set max_message_size such that at least one row of the transaction fits into one message.
- The value of max_message_size must be less than or equal to the WebSphere® MQ parameter MAXMSGL, which sets the maximum message size for a queue.
=> monitor_interval : This parameter sets a frequency of a Q-capture inserting a row into ibmqrep_capmon & ibmqrep_capqmon tables which is used for monitoring the success & failure of Q-capture program. Its default value is 30 seconds, which is usually too high and you can have some delayed frequency, 2 to 5 minutes depending on your environment, of inserting monitoring data into these tables.
=> prune_interval : It sets a frequency for a Q-capture program could prunes the data from tables IBMQREP_CAPMON, IBMQREP_CAPQMON, IBMQREP_SIGNAL and IBMQREP_CAPTRACE. The default value is 5 minutes, if its reduced further that means pruning queries will be executed more frequently and will utilize the system resources more frequently.
=> trans_batch_sz : It specify that Q-capture program groups transactions into batches before they are replicated. Grouping small transactions into single MQ message can reduce CPU consumption on the server. The default value is 1 which means no grouping , setting some higher value which will yield in some grouping and it could reduce the CPU overhead that is incurred each time Q-Capture puts a message on the sendQ.
Tune the parameters for Apply program
=> num_apply_agents : It determines as how many agent threads will be used by Q-Apply program to apply the transactions into target tables. This parameter can be tweaked around to increase the throughput and also to keep CPU usage low. The default value is 16 on LUW and usually good for medium volume replicated
environment.
=> memory_limit : This parameter determines the amount of memory that a Q-Apply program can use a buffer to process transactions from a receive queue. When this limit is reached, the messages are not retrieved from recieveQ until apply agents applies the transactions to replicated tables. If just a single
transaction is large enough to fill up the memory_limit, rows are processed in multiple sets of the same transaction. This parameter must be at least three times larger than max_message_size of sendQ.
=> monitor_interval : This parameter sets a frequency of a Q-apply inserting a row into ibmqrep_applymon table which is used for monitoring the success & failure of Q-apply program. Its default value is 30 seconds, which is usually too high and you can have some delayed frequency ,2 to 5 minutes depending on your environment, of inserting monitoring data into monitoring table.
=> prune_interval : It sets the frequency of a Q-apply program could prones the data from tables IBMQREP_APPLYMON,IBMQREP_APPLYTRACE. The default value is 5 minutes, if its reduced further that means pruning queries will be executed more frequently and will utilize the system resources more frequently.
Note:- Since it does not cover up pruning the table IBMQREP_DONEMSG (an internal table) which is used by Apply program to record all transaction and administrative messages that have received. Frequency of pruning this table is controlled by parameters "PRUNE_METHOD" & "PRUNE_BATCH_SIZE"
=> prune_batch_size : It signifies the number of rows that are to be deleted from ibmqrep_donemsg table in one commit scope when prune_method is 2. It does not hurt setting as 1000.
=> prune_method : This signifies the prune method for deleting the old data from recieve queue, querying the ibmqrep_donemsg table to verify if all the messages are applied, and then deletes the corresponding data from ibmqrep_donemsg table. The default value is "1" which means the pruning activity is fired after processing every single row and the other possible value is "2" which means the data would be deleted after processing "n" number of rows defined in prune_batch_size.
Maintenance on Q-rep database objects
Q-rep has own control tables which are actively used specially on target database where apply program runs. Their maintenance is important otherwise it impacts direly on performance.
=> Assign a dedicated bufferpool and monitor its hit ratio reguarly
=> Generally these control tables are defined as APPEND ON. So, any deletion on these tables leave the empty pages which are only claimed after the REORG is done. So, regular REORG is needed.
=> Apply the runstats on periodic basis as it impacts the internal queries which are executed by Q-rep for processing the rows or other administrative purposes.
Note:- This information is shared based on my knowledge and the experience.