Re: System performance problem [From Clark, Dave: Wed, 17 Dec 2003 @ 21:45 GMT]

Message Posted: Wed, 17 Dec 2003 @ 21:45:02 GMT

Subj: Re: System performance problem

From: Clark, Dave

Ivan-

The list of actions you provided is good. At the GSC, we have a series of operations that can be executed either remotely or locally to augment your analysis. (The list is primarily targeted at the MP-RAS platform; we have a similar list for Windows.) In the MP-RAS environment, I also suggest you take a look at the script "/etc/gsc/bin/perflook.sh". It provides a system-wide snapshot in a series of files. The GSC uses this data for incident analysis.

Hope this helps.

-dave.clark

V2 SLOWDOWN OR HANG CHECK LIST....

bteq - Can you log on to BTEQ or does it hang? Can you 'select time' or does it hang? Can you 'sel * from dbc.dbcinfo'?

showlocks - Run showlocks from the supervisor screen to check for Host Utility Locks. This will not show Transaction Locks. If you find a host utility lock that is causing the problem, you can release the lock through bteq with the following command:

    release lock , override;

showspace /l - Run showspace from the ferret utility, which is started from the supervisor screen. The '/l' parameter shows the detail for each logical disk. If any disk shows 0 or 1 free cylinders in the far left column, this is your problem.

qrysessn - Run qrysessn from the supervisor screen. If you find any blocked sessions, it will show the database and tablename on which each is blocked. You may have a deadlock condition.

Teradata Manager - In V2R2, Teradata Manager will show detail on transaction locks. Check this for a deadlock condition.

Route the qrysessn output to a UNIX file (qrysessn.out) for analysis:

    grep '^ State Details' /tmp/b |wc               # sessions
    grep '^ State Details' /tmp/b |grep IDLE|wc     # IDLE
    grep '^ State Details' /tmp/b |grep ACTIVE|wc   # ACTIVE
    grep '^ 52' qrysessn.out|wc                     # of lan connections
    grep '^ 1022' qrysessn.out|wc                   # of MVS connections

ps -ef | grep ' R ' |wc - Who is "runnable" (awaiting CPU).

Streams Logs - Check the streams logs on all nodes to see what transpired right before the hang condition. Things to look for:

    Cylpacking could indicate full disks.
    Many 'BNS PagePool' messages could indicate running out of UNIX memory.
    2631 Deadlock timeouts could indicate a deadlocking problem.
    Any error messages should be checked out for a possible problem.

/etc/.osm - Problems with saving dumps can cause performance problems. See TTIP 1031.

/var/adm/usererr - Problems with saving dumps can cause performance problems. See TTIP 1031.

vprocmanager - Execute '/tpasw/bin/vprocmanager', then type in 'status' and 'enter'. Verify that all processors are online and Logons are enabled. Verify that there is not a message: 'The system debugger is attached'. Verify that the PDE State is TPA.

sar 5 5 - Run sar on each node from the UNIX prompt. You can use 'rallsh' if you have many nodes. If %wio is over 30, performance will start to slow. If %wio is over 60, you may be running low on memory. If any nodes are showing high %idle and others are not, try to find out what is going on on the nodes that are not idle. If you have high %wio on any node, it could be waiting for disk, a lock, bynet or memory.

    idle: not good if under heavy load
    wio:  okay - waiting for diskio to complete
    sys:  good, means PDE and kernel are busy
    usr:  very good, awt's getting cpu time

sar -r 5 5 - Look at the number of memory pages under the freemem column. If you have less than 8000 pages, check TA655 in lotus notes to verify you have the fsgcachepercent set properly. If you need assistance from the pde group to verify this, get the lotsfree amount from the following command: grep -i lotsfree /etc/conf/cf.d/*tune. Also get the fsgcachepercent from 'xctl', 'screen dbs'.

sar -d 5 5 - Shows disk controller workload. Are any disk devices 100% busy over time? The system may be I/O bound rather than in a hang state.

puma commands - The puma commands can be used to find a hot node or process. Look for the node or process that is different from the others. You can use rallsh to run these commands on all nodes, but be careful because some of them have quite a bit of output. You may want to run them on a single node until you get a feel for how the output looks. Run the commands 2 or 3 times to see if the results remain the same.

    puma -m | grep Work       flowcontrolling if 20 or more messages per mailbox
    puma -m |grep -v ' 0 '    display nonzero messages
    puma -M |grep -v ' 0 '    display nonzero monitors
    puma -c |grep -v ' 0 '    if msgworknew+msgworkone > 61 we may have awt congestion
    puma -P                   displays number of tasks, mailboxes & monitors in use
    puma -D BSC/4             Third field should be less than 3E8 over several iterations.
                              If it never gets down to zero, there may be a bynet problem.

proc -ping - See if all nodes are responding.

netecho - Checks to see which bynets are responding.

bam -s - Verify that the bmca's and bya's for both bynets are online. Use rallsh to check all nodes.

blmstat -z; blmstat -v |grep BrdActive - If BrdActive shows more than 50% to 60%, then the platform can be considered broadcast-bound.

dfspace - Any filesystems out of space?

xctl -nw - Issue "screen debug" and verify that the following flags are "off":

    (1) Break Stop: halts on restart
    (2) Start with Debug: halts on startup

/etc/gsc/bin/top - Determines the top 15 processes on the system and periodically updates this information. Top is in the GSCTOOLS package and is available on the patch server.

If you are running from MVS or VM, check the TDP. If you need assistance with this, contact the client group. TTIP 380 and TTIP 697 go into more detail on some of these commands. Log on to the TDP and run the following commands:

Enable Test - Shows more detail on the Display Session command. Be certain to Disable Test when done.

D Cells - Shows the number of memory cells. If you see numbers in the GMAIN or WAITs column, you may be out of memory on the host side. These numbers are cumulative since the last TDP restart. You may need assistance from the client group to find out if this is a problem.

D Ses - General session status info.

D Ses - Check status of a particular session. TTIP380 shows code definitions.

D Ses Ending - Check for sessions in ENDING state.

D IFP - IFP/PE status.

D OUTQ - Check to see if any queues are backed up.
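The %wio thresholds from the 'sar 5 5' item (over 30: performance starts to slow; over 60: possibly low on memory) are easy to misread when scanning many nodes, so here is a minimal sketch of how they might be applied automatically. The function names and the output labels are made up for illustration, and the field position of %wio is an assumption about the sar output layout; check it against what your nodes actually print.

```shell
#!/bin/sh
# Hypothetical helpers; names and labels are illustrative only.

# Classify a single %wio reading using the checklist thresholds:
#   over 60 -> memory may be running low
#   over 30 -> performance will start to slow
classify_wio() {
    awk -v wio="$1" 'BEGIN {
        if (wio + 0 > 60)      print "low-memory-suspect"
        else if (wio + 0 > 30) print "slowing"
        else                   print "ok"
    }'
}

# Read sar-style output on stdin and classify the "Average" row.
# ASSUMPTION: fields are  time %usr %sys %wio %idle, so %wio is field 4.
# Adjust the field number if your sar prints a different layout.
check_sar_average() {
    wio=$(awk '$1 == "Average" { print $4 }')
    [ -n "$wio" ] || { echo "no-average-row"; return 1; }
    classify_wio "$wio"
}
```

On a node this could be used as `sar 5 5 | check_sar_average`. It only applies the two thresholds; it does not tell you whether the wait is on disk, a lock, bynet or memory.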
Scripts:

----------------------------------------------------------------
disksum - what space is available on a per-database basis.
----------------------------------------------------------------

    REPLACE MACRO disksum AS (
      SEL date, time;
      SEL databasename, SUM(currentperm), SUM(maxperm)
      FROM diskspace
      GROUP BY 1
      ORDER BY 1
      WITH sum(currentperm), sum(maxperm);
    );

----------------------------------------------------------------
WHOWASON - within a given period, who was logged on.
example: exec whowason (date-1, 000000, date, 123000);
----------------------------------------------------------------

    REPLACE MACRO WhoWasOn
      (D1 date  DEFAULT DATE
      ,T1 float DEFAULT 120000
      ,D2 date  DEFAULT DATE
      ,T2 float DEFAULT 120000)
    AS (
      SEL a.Username(title'User')(format'x(12)')
         ,a.SessionNo(title'Session')(format'zzzzzz9')
         ,a.LogicalHostID(title'Host')
         ,a.IFPNo(title'IFP')
         ,a.LogDate(title'Logon')
         ,a.LogTime(title'')
         ,b.LogDate(title'Logoff')
         ,b.LogTime(title'')
      FROM DBC.LogOnOff a, DBC.LogOnOff b
      WHERE a.Event='Logon'
        AND b.Event<>'Logon'
        AND a.SessionNo=b.SessionNo
        AND a.IFPNo=b.IFPNo
        AND a.LogonDate=b.LogonDate
        AND a.LogonTime=b.LogonTime
        AND ((a.LogDate=:D2 and a.LogTime<=:T2) OR a.LogDate<:D2)
        AND ((b.LogDate=:D1 and b.LogTime>=:T1) OR b.LogDate>:D1)
      ORDER BY a.LogonDate, a.LogonTime, a.SessionNo;
    );
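The WhoWasOn macro takes its times as HHMMSS numbers (000000, 123000 in the example call), which is easy to get wrong by typing something like 1230. As a rough sketch, a small shell helper could build the exec statement and reject times that are not six digits. The helper name whowason_stmt is hypothetical, and how you feed the resulting statement to BTEQ is left to your own setup.

```shell
#!/bin/sh
# Hypothetical helper: print an "exec whowason (...)" statement.
# Arguments: start-date start-time end-date end-time
# Dates are passed through unchanged (e.g. date-1, date);
# times must be six digits, HHMMSS, as in the macro's example call.
whowason_stmt() {
    d1=$1 t1=$2 d2=$3 t2=$4
    for t in "$t1" "$t2"; do
        case $t in
            [0-9][0-9][0-9][0-9][0-9][0-9]) ;;
            *) echo "time must be six digits (HHMMSS): $t" >&2
               return 1 ;;
        esac
    done
    printf 'exec whowason (%s, %s, %s, %s);\n' "$d1" "$t1" "$d2" "$t2"
}
```

For example, `whowason_stmt date-1 000000 date 123000` prints the same call shown in the macro's header comment.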


Archives of the TeradataForum
