Archives of the TeradataForum
Message Posted: Wed, 17 Dec 2003 @ 21:45:02 GMT
Subj: Re: System performance problem
From: Clark, Dave
The list of actions you provided is good. At the GSC, we have a series of operations that can be executed either remotely or locally to
augment your analysis. (It is primarily targeted at the MP-RAS platform; we have a similar list for Windows.) In the MP-RAS environment, I
also suggest you take a look at the script "/etc/gsc/bin/perflook.sh". This will provide a system-wide snapshot in a series of files. The
GSC uses this data for incident analysis.
Hope this helps.
V2 SLOWDOWN OR HANG CHECK LIST....
bteq - Can you logon to BTEQ or does it hang?
Can you 'select time' or does it hang?
Can you 'sel * from dbc.dbcinfo'?
showlocks - Run showlocks from the supervisor screen to check
for Host Utility Locks. This will not show
Transaction Locks. If you find a host utility
lock that is causing the problem you can release
the lock through bteq with the following command:
release lock <databasename>, override;
showspace /l - Run showspace from the ferret utility which
is started from the supervisor screen. The '/l'
parameter shows the detail for each logical disk.
If any disk shows 0 or 1 free cylinders in the
far left column this is your problem.
qrysessn - Run qrysessn from the supervisor screen. If you
find any blocked sessions it will show the
database and tablename on which it is blocked.
You may have a deadlock condition. (In V2R2,
Teradata Manager will show detail on transaction
locks; check this for a deadlock condition.)
Route the output to a UNIX file (qrysessn.out)
for analysis:
grep '^ State Details' qrysessn.out |wc # sessions
grep '^ State Details' qrysessn.out |grep IDLE|wc # IDLE
grep '^ State Details' qrysessn.out |grep ACTIVE|wc # ACTIVE
grep '^ 52' qrysessn.out|wc # of lan connections
grep '^ 1022' qrysessn.out|wc # of MVS connections
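The counts above can be gathered in one pass with a small wrapper. This is only a sketch: the function name session_counts is made up for illustration, and it assumes the qrysessn output has been saved to a file whose line prefixes match the grep patterns in this checklist.

```shell
# session_counts FILE
# Hypothetical helper (not a GSC tool): prints the session counts from the
# checklist for a saved qrysessn output file. The patterns are copied from
# the checklist; adjust them if your qrysessn output is laid out differently.
session_counts() {
    file=$1
    echo "sessions: $(grep -c '^ State Details' "$file")"
    echo "idle: $(grep '^ State Details' "$file" | grep -c IDLE)"
    echo "active: $(grep '^ State Details' "$file" | grep -c ACTIVE)"
    echo "lan: $(grep -c '^ 52' "$file")"
    echo "mvs: $(grep -c '^ 1022' "$file")"
}
```

Run it as `session_counts qrysessn.out`. It uses grep -c rather than wc, so each line carries a single count instead of the line, word and character totals.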
ps -ef | grep ' R ' |wc - Who is "runnable" (awaiting CPU)
Streams Logs - Check the streams logs on all nodes to see
what transpired right before the hang condition.
Things to look for:
Cylpacking could indicate full disks.
Many 'BNS PagePool' messages could indicate running
out of UNIX memory.
2631 Deadlock timeouts could indicate deadlocking.
Any error messages should be checked out.
/etc/.osm - Problems with saving dumps can cause performance
problems. See TTIP 1031.
/var/adm/usererr - Problems with saving dumps can cause
performance problems. See TTIP 1031.
vprocmanager - Execute '/tpasw/bin/vprocmanager' then type in
'status' and 'enter'. Verify that all processors
are online and Logons are enabled. Verify that there
is not a message:
'The system debugger is attached'.
Verify that the PDE State is TPA.
sar 5 5 - Run sar on each node from the UNIX prompt. You can use
'rallsh' if you have many nodes. If %wio is over 30
performance will start to slow. If %wio is over 60 you
may be running low on memory. If any nodes are showing
high %idle and others are not, you want to try to find
out what is going on on the nodes that are not idle. If
you have high %wio on any node, it could possibly be
waiting for disk, a lock, bynet or memory.
idle: not good if under heavy load
wio: okay - waiting for diskio to complete
sys: good, means PDE and kernel are busy
usr: very good, awt's getting cpu time
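The %wio rule of thumb above can be written down as a small function, which is handy when checking many nodes. This is a sketch only: the name wio_level and its output labels are invented here, and the cutoffs are simply the 30 and 60 figures from this checklist.

```shell
# wio_level PCT
# Hypothetical helper: maps a %wio value to the rule of thumb above.
#   over 60 -> "check-memory" (may be running low on memory)
#   over 30 -> "slowing"      (performance will start to slow)
#   else    -> "ok"
wio_level() {
    awk -v wio="$1" 'BEGIN {
        if (wio + 0 > 60)      print "check-memory"
        else if (wio + 0 > 30) print "slowing"
        else                   print "ok"
    }'
}
```

One possible use, assuming your sar prints an "Average" line with %wio in the fourth column (check the header on your system first): `wio_level "$(sar 5 5 | awk '$1 == "Average" {print $4}')"`.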
sar -r 5 5 - Look at the number of memory pages under the freemem
column. If you have less than 8000 pages check TA655 in
Lotus Notes to verify you have the fsgcachepercent set
properly. If you need assistance from the pde group to
verify this, get the lotsfree amount from
grep -i lotsfree /etc/conf/cf.d/*tune.
Also get the fsgcachepercent from 'xctl', 'screen dbs'.
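The freemem check above can be automated with a short filter. This is a sketch under assumptions: the name low_freemem is invented for illustration, and it assumes freemem is the second column of the sar -r output, so check the header line on your system before relying on it.

```shell
# low_freemem [THRESHOLD]
# Hypothetical helper: reads `sar -r` output on stdin and prints the samples
# whose freemem value (assumed to be column 2) is below THRESHOLD pages.
# THRESHOLD defaults to 8000, the figure used in the checklist.
# Header and blank lines are skipped because column 2 is not a number there.
low_freemem() {
    awk -v min="${1:-8000}" '$2 ~ /^[0-9]+$/ && $2 + 0 < min + 0 { print $1, $2 }'
}
```

For example, `sar -r 5 5 | low_freemem` prints nothing when every sample has at least 8000 free pages.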
sar -d 5 5 - Shows disk controller workload. Are any disk
devices 100% busy over time? The system may be I/O bound
rather than in a hang state.
puma commands - The puma commands can be used to find a hot node
or process. You want to look for the node or process that
is different than the others. You can use rallsh to run
these commands on all nodes, but be careful because some
of them have quite a bit of output. You may want to run
them on a single node until you get a feel for how the
output looks. You want to run the commands 2 or 3 times
to see if they remain the same.
puma -m | grep Work flowcontrolling if 20 or more messages
puma -m |grep -v ' 0 ' display nonzero messages
puma -M |grep -v ' 0 ' display nonzero monitors
puma -c |grep -v ' 0 ' if msgworknew+msgworkone > 61
we may have awt congestion
puma -P displays number of tasks, mailboxes & monitors in use
puma -D BSC/4 Third field should be less than 3E8 over
several iterations. If it never gets down to zero, you may
have a bynet problem.
proc -ping - see if all nodes responding
netecho - Checks to see which bynets are responding
bam -s - Verify that the bmca's and bya's for both bynets are
online. Use rallsh to check all nodes.
blmstat -z; blmstat -v |grep BrdActive - If BrdActive shows
more than 50% to 60%, then the platform can be
dfspace - Any filesystems out of space?
xctl -nw - issue "screen debug" and verify that the following
flags are "off":
(1) Break Stop: halts on restart
(2) Start with Debug: halts on startup
/etc/gsc/bin/top - determines the top 15 processes on the system
and periodically updates this information. Top is
in the GSCTOOLS package and is available on the patch
If you are running from MVS or VM check the TDP. If you need
assistance with this, contact the client group. TTIP 380 and
TTIP 697 go into more detail on some of these commands.
Logon to the TDP and run the following commands:
Enable Test - Shows more detail on the Display Session
command. Be certain to Disable Test when done.
D Cells - Shows the number of memory cells. If you see
numbers in the GMAIN or WAITs column you may be out of
memory on the host side. These numbers are cumulative
since the last TDP restart. You may need assistance from
the client group to find out if this is a problem.
D Ses - General session status info.
D Ses - Check status of a particular session.
TTIP380 shows code definitions.
D Ses Ending - Check for sessions in ENDING state.
D IFP - IFP/PE status.
D OUTQ - Check to see if any queues are backed up.
disksum - what space is available on a per database basis..
REPLACE MACRO disksum AS
(SEL date, time;
SEL databasename, SUM(currentperm), SUM(maxperm)
FROM dbc.diskspace /* assumed: the FROM clause is missing from the archived post */
GROUP BY 1
ORDER BY 1
WITH sum(currentperm), sum(maxperm););
WHOWASON - within a given period, who was logged on..
example: exec whowason (date-1, 000000, date, 123000);
REPLACE MACRO WhoWasOn
(D1 date DEFAULT DATE
,T1 float DEFAULT 120000
,D2 date DEFAULT DATE
,T2 float DEFAULT 120000)
FROM DBC.LogOnOff a, DBC.LogOnOff b
AND ((a.LogDate=:D2 and a.LogTime<=:T2)
AND ((b.LogDate=:D1 and b.LogTime>=:T1)
ORDER BY a.LogonDate,a.LogonTime,a.SessionNo;