Article Original Creation Date: 2011-02-09
Overview
Consider the following scenario: last week a policy was run against a list of 1000 devices, and that run had a high success count. This week a new list of 2000 devices replaced the previous one, the policy was restarted, and it showed a very high failure count.
In the policy logging in the UI, only 1583 devices are found to have been processed by the policy. The Execution log part shows: 169 skipped, 469 failed, 1365 success.
However, clicking that line and looking at the device list in the bottom part of the screen (details max lines is set to 2000) shows only 168 skipped, 50 failed, 1365 success.
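As a quick sanity check on the figures above, the following minimal sketch sums both sets of counts and shows the per-status difference. The variable names are illustrative only and are not product fields; the numbers are the ones quoted from the UI.

```python
# Counts quoted from the UI (names are illustrative, not product fields).
overview = {"skipped": 169, "failed": 469, "success": 1365}
details = {"skipped": 168, "failed": 50, "success": 1365}

overview_total = sum(overview.values())
details_total = sum(details.values())

# Per-status difference between the overview line and the device list.
difference = {status: overview[status] - details[status] for status in overview}

print("overview total:", overview_total)
print("details total:", details_total)
print("difference:", difference)
```

The detail counts add up to 1583, the number of devices reported as processed, while the overview counts add up to 2003, which is close to the 2000 devices in the new list.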
Question 1
Why is there a discrepancy in the failed and skipped counts between the overview and the details?
Answer 1
The agent must investigate the encore.log files for policy ‘upgrade_60A040’.
These contain 1583 lines with ‘Checking to see if this device has already been processed in this time period’ and 1583 lines with ‘Device has not yet been processed during this time period, so it is OK to process it.’
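The line counts above can be reproduced with a small script. The following is a minimal sketch, assuming the log has already been narrowed down to the policy in question and that each message appears on a single line; the function names are hypothetical, and only the two message strings come from the log itself.

```python
# Marker messages quoted from encore.log in this article.
CHECKING = "Checking to see if this device has already been processed in this time period"
OK_TO_PROCESS = "Device has not yet been processed during this time period, so it is OK to process it."


def count_markers(lines):
    """Count how many lines contain each marker message."""
    counts = {"checking": 0, "ok_to_process": 0}
    for line in lines:
        if CHECKING in line:
            counts["checking"] += 1
        if OK_TO_PROCESS in line:
            counts["ok_to_process"] += 1
    return counts


def count_markers_in_file(path):
    """Convenience wrapper; path is whichever encore.log file is being inspected."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return count_markers(handle)
```

If both counts are equal and lower than the size of the device list, as with the 1583 lines here, the policy stopped picking up devices before the list was exhausted.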
Question 2
Just below the occurrence of line 1583, why do we find that an exception has been thrown?
Answer 2
The exception appears exactly where processing stops, which is too much of a coincidence to ignore. We suspect that it is the reason the complete list was not processed by the policy.
Environment
Solaris 10, Oracle 10, WebLogic (WL) 9.2 MP1, Tomcat 5.5.25
Resolution
- The recommended rate is 6000 FWU an hour; if processing stops at around 1500, the impact is high.
- The Rate Target/Concurrency limit configured for the Timer Driven Policy was 2500/2500.
The errors shown in encore.log are:
2011-02-08 02:10:55,032 ERROR encore.inventory - Transaction BEA1-40634011A64244F5004B not active anymore. tx status = Marked rollback. [Reason=weblogic.transaction.internal.TimedOutException: Transaction timed out after 601 seconds
- This type of issue points to a temporary DB performance problem. In these scenarios, advise the customer to change the Rate Target/Concurrency limit to 1000/1000.
- The next run with these values processed OK.
- Regarding the device count discrepancy, the question is why not all devices are counted correctly in the detailed overview, even though the difference 2000-1964=36 plus 37 is exactly the 73 failure count.
- This is because the detailed log shows the real count taken from the encore.log files, whereas the overview seems to calculate with the configured count.
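To check whether a run was hit by the transaction timeout described above, the log can be scanned for that error. The following is a minimal sketch: the regular expression is modelled only on the single ERROR line quoted in this article, so other encore.log lines may need a different pattern, and the function name is hypothetical.

```python
import re

# Pattern modelled on the single ERROR line quoted in this article; other
# encore.log lines may be formatted differently.
TIMEOUT_RE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) ERROR .*?"
    r"Transaction (?P<tx_id>\S+) not active anymore\..*?"
    r"Transaction timed out after (?P<seconds>\d+) seconds"
)


def find_transaction_timeouts(lines):
    """Return (timestamp, transaction id, timeout in seconds) per matching line."""
    hits = []
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            hits.append((match["timestamp"], match["tx_id"], int(match["seconds"])))
    return hits


# The ERROR line from this article, split across string literals for readability.
sample = (
    "2011-02-08 02:10:55,032 ERROR encore.inventory - Transaction "
    "BEA1-40634011A64244F5004B not active anymore. tx status = Marked rollback. "
    "[Reason=weblogic.transaction.internal.TimedOutException: "
    "Transaction timed out after 601 seconds"
)
print(find_transaction_timeouts([sample]))
```

Any hits that fall close to the point where the policy stopped processing devices support the DB performance explanation and the lower Rate Target/Concurrency limit.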
Priyanka Bhotika