Re: Abort Session from BTEQ [From Thomas F. Stanek: Mon, 12 Mar 2001 @ 17:37 GMT]

Message Posted: Mon, 12 Mar 2001 @ 17:37:11 GMT

Subj: Re: Abort Session from BTEQ

From: Thomas F. Stanek

Some things to consider with respect to rollbacks.

There really is nothing you can do to cancel a rollback once it starts, but the Priority Scheduler gives you a few options that might lessen the impact.

First, there is a UNIX parameter that allows a rollback to execute either in the Rush performance group of the DEFAULT partition or in the performance group from which the original work was executed. The default is the Rush performance group option. This is why rollbacks are especially painful: they get the highest system priority. If you are using PSF (Priority Scheduler Facility), you can lessen the impact by changing this UNIX parameter.

Second, if you can identify the offending query before the rollback starts (not always possible), because it is slowing the system or the transient journal is filling up, you can lower the query's priority before aborting it and initiating the rollback. What we have done successfully is create a performance group with a policy of ABSolute and a weighting of 10%. We move problem queries into this performance group (via PMON or a PM/API-enabled custom app) and then abort the session. The rollback will certainly take a long time, but at least the impact on the rest of the system is minimized and controlled.

If a rollback has already started in a user performance group (not the Rush-DEFAULT scenario), another way to get it under control is to run a script that redefines the performance group in which it is running to ABSolute with a weighting of 10%. The downside is that any other user in the same performance group would incur the same limits on resources. Of course, you could temporarily change the performance group assignments for those users until the rollback is completed. Not pretty, but probably better than negatively impacting the rest of the user community, the batch cycle, etc. for hours or days.

Another suggestion is to create a monitor that runs at some regular interval, perhaps every 30 minutes, and checks the transient journal space. If usage exceeds some threshold, someone should be alerted so that a potential problem can be identified early on.
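To make the monitor idea concrete, here is a minimal sketch in Python. It is not the script we run; it only illustrates the check-and-alert loop. The SQL text is an assumption (it presumes the journal's space shows up in DBC.TableSize under DBC.TransientJournal, which you should verify on your release), and read_bytes, alert, and run_monitor are hypothetical names: you would supply your own function to run the query and your own paging or e-mail hook.

```python
import time
from typing import Callable, Optional

# ASSUMPTION: journal space is visible through DBC.TableSize; verify the view
# and table names on your own system before relying on this query.
TJ_SPACE_SQL = """
SELECT SUM(CurrentPerm)
FROM DBC.TableSize
WHERE DatabaseName = 'DBC'
  AND TableName = 'TransientJournal'
"""


def check_journal(read_bytes: Callable[[], int],
                  threshold_bytes: int,
                  alert: Callable[[str], None]) -> bool:
    """Read current journal usage once; alert if it exceeds the threshold."""
    used = read_bytes()
    if used > threshold_bytes:
        alert(f"Transient journal at {used:,} bytes "
              f"(threshold {threshold_bytes:,})")
        return True
    return False


def run_monitor(read_bytes: Callable[[], int],
                threshold_bytes: int,
                alert: Callable[[str], None],
                interval_seconds: float = 30 * 60,
                max_checks: Optional[int] = None,
                sleep: Callable[[float], None] = time.sleep) -> int:
    """Check at a fixed interval; return how many checks raised an alert.

    max_checks=None loops forever, which is what a real monitor would do.
    """
    checks = 0
    alerts = 0
    while max_checks is None or checks < max_checks:
        if check_journal(read_bytes, threshold_bytes, alert):
            alerts += 1
        checks += 1
        # Skip the final sleep when a bounded number of checks was requested.
        if max_checks is None or checks < max_checks:
            sleep(interval_seconds)
    return alerts
```

In practice read_bytes would execute TJ_SPACE_SQL through whatever client you already use (a BTEQ wrapper, for example) and return the result as an integer. Keeping the query and the alert hook as plain callables means the loop can be tried out with canned values before pointing it at a production system.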

We have used these types of techniques on two very large production systems and the rollback problem has become almost a non-issue.

This certainly doesn't solve the problem of not being able to cancel a rollback, but it is the best way I know to manage the overall system environment when one of these problems arises.

Regards,

Thomas F. Stanek
TFS Consulting
