
Archives of the TeradataForum

Message Posted: Wed, 17 Dec 2003 @ 21:45:02 GMT

Subj:   Re: System performance problem
 
From:   Clark, Dave

Ivan-

The list of actions you provided is good. At the GSC, we have a series of operations that can be executed either remotely or locally to augment your analysis. (The list is primarily targeted at the MP-RAS platform; we have a similar list for Windows.) In the MP-RAS environment, I also suggest you take a look at the script "/etc/gsc/bin/perflook.sh". This provides a system-wide snapshot in a series of files, which the GSC uses for incident analysis.


Hope this helps.


-dave.clark



V2 SLOWDOWN OR HANG CHECK LIST....

        bteq  - Can you logon to BTEQ or does it hang?
                Can you 'select time' or does it hang?
                Can you 'sel * from dbc.dbcinfo'?

        showlocks - Run showlocks from the supervisor screen to check
                  for Host Utility Locks.  This will not show
                  Transaction Locks.  If you find a host utility
                  lock that is causing the problem you can release
                  the lock through bteq with the following command:

                        release lock , override;

        showspace /l - Run showspace from the ferret utility which
                  is started  from the supervisor screen. The '/l'
                  parameter shows the detail for each logical disk.
                  If any disk shows 0 or 1 free cylinders in the
                  far left column this is your problem.

        qrysessn - Run qrysessn from the supervisor screen. If you
                 find any blocked sessions it will show the
                 database and tablename on which it is blocked.
                 You may have a deadlock condition. In V2R2,
                 Teradata Manager will show detail on transaction
                 locks; check this for a deadlock condition.
                 Route the output to a UNIX file (qrysessn.out)
                 for analysis:

                 grep '^ State Details' qrysessn.out |wc  # sessions
                 grep '^ State Details' qrysessn.out |grep IDLE|wc # IDLE
                 grep '^ State Details' qrysessn.out |grep ACTIVE|wc  # ACTIVE
                 grep '^   52' qrysessn.out|wc # of lan connections
                 grep '^ 1022' qrysessn.out|wc # of MVS connections
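
The greps above can be wrapped into one summary line. This is only a sketch: the function name qrysessn_summary is made up here, it assumes the qrysessn output was saved to a file, and it relies on nothing beyond the line prefixes already used in the checklist.

```shell
# Summarize a saved qrysessn capture using the same patterns as the checklist.
# Usage: qrysessn_summary FILE   (FILE defaults to qrysessn.out)
# qrysessn_summary is a hypothetical helper name, not a GSC tool.
qrysessn_summary() {
    file=${1:-qrysessn.out}
    # grep -c prints the number of matching lines; "|| true" keeps a
    # zero count from being treated as a failure.
    total=$(grep -c '^ State Details' "$file" || true)
    idle=$(grep '^ State Details' "$file" | grep -c IDLE || true)
    active=$(grep '^ State Details' "$file" | grep -c ACTIVE || true)
    lan=$(grep -c '^   52' "$file" || true)     # LAN connections
    mvs=$(grep -c '^ 1022' "$file" || true)     # MVS connections
    echo "sessions=$total idle=$idle active=$active lan=$lan mvs=$mvs"
}
```

Running it a few minutes apart and comparing the lines gives a quick view of whether the ACTIVE count is moving at all.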

        ps -ef | grep ' R ' |wc  - Who is "runnable" (awaiting CPU)

        Streams Logs - Check the streams logs on all nodes to see
                 what transpired right before the hang condition.
                 Things to look for:

                     Cylpacking could indicate full disks.

                     Many 'BNS PagePool' messages could indicate
                     running out of UNIX memory.

                     2631 Deadlock timeouts could indicate a
                     deadlocking problem.

                     Any error messages should be checked out for a
                     possible problem.

        /etc/.osm  - Problems with saving dumps can cause performance
                     problems. See TTIP 1031.

        /var/adm/usererr -  Problems with saving dumps can cause
                 performance problems. See TTIP 1031.

        vprocmanager - Execute '/tpasw/bin/vprocmanager' then type in
                 'status' and 'enter'.  Verify that all processors
                 are online and Logons are enabled.  Verify that there
                 is not a message:

                                 'The system debugger is attached'.

                 Verify that the PDE State is TPA.

        sar 5 5 - Run sar on each node from the UNIX prompt. You can use
                'rallsh' if you have many nodes. If %wio is over 30
                performance will start to slow. If %wio is over 60 you
                may be running low on memory. If any nodes are showing
                high %idle and others are not, you want to try to find
                out what is going on on the nodes that are not idle. If
                you have high %wio on any node, it could possibly be
                waiting for disk, a lock, bynet or memory.

                        idle: not good if under heavy load
                        wio: okay - waiting for diskio to complete
                        sys: good, means PDE and kernel are busy
                        usr: very good, awt's getting cpu time
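
As a rough illustration of the %wio thresholds above (over 30 and over 60), here is a small sketch. The function name classify_wio and the wording of the messages are made up for this example; you pass in the %wio figure read from the sar output yourself, so nothing is assumed about the sar column layout.

```shell
# Classify a %wio reading against the two thresholds from the checklist.
# Usage: classify_wio PERCENT      e.g. classify_wio 45
# classify_wio is a hypothetical helper name.
classify_wio() {
    awk -v wio="$1" 'BEGIN {
        if (wio + 0 > 60) {
            print "possible memory shortage"
        } else if (wio + 0 > 30) {
            print "slowdown likely"
        } else {
            print "ok"
        }
    }'
}
```

A reading of exactly 30 or 60 stays in the lower band, since the checklist says "over".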

        sar -r 5 5 - Look at the number of memory pages under the freemem
               column. If you have fewer than 8000 pages check TA655 in
               Lotus Notes to verify you have the fsgcachepercent set
               properly. If you need assistance from the PDE group to
               verify this, get the lotsfree amount from the
               following command:

                                grep -i lotsfree /etc/conf/cf.d/*tune

               Also get the fsgcachepercent from 'xctl', 'screen dbs'.

        sar -d 5 5 - Shows disk controller workload. Are any disk
               devices 100% busy over time? The system may be I/O bound
               rather than in a hang state.

        puma commands - The puma commands can be used to find a hot node
               or process. You want to look for the node or process that
               is different than the others. You can use rallsh to run
               these commands on all nodes, but be careful because some
               of them have quite a bit of output. You may want to run
               them on a single node until you get a feel for how the
               output looks. You want to run the commands 2 or 3 times
               to see if they remain the same.

                puma -m | grep Work - flow controlling if 20 or more
                     messages per mailbox

                puma -m | grep -v ' 0 ' - display nonzero messages

                puma -M | grep -v ' 0 ' - display nonzero monitors

                puma -c | grep -v ' 0 ' - if msgworknew+msgworkone > 61
                     we may have AWT congestion

                puma -P - displays number of tasks, mailboxes & monitors
                     in use

                puma -D BSC/4 - Third field should be less than 3E8 over
                     several iterations. If it never gets down to zero
                     we may have a bynet problem.
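
The 3E8 limit is hexadecimal, i.e. 1000 decimal. A minimal sketch of the comparison, assuming the third field is printed as a bare hex number without a 0x prefix (that is an assumption about the puma output, and below_3e8 is a made-up helper name):

```shell
# Succeed (exit status 0) if a hexadecimal counter is below the 3E8 limit.
# Usage: below_3e8 HEXVALUE      e.g. below_3e8 1F4
# Assumes HEXVALUE is bare hex digits with no 0x prefix.
below_3e8() {
    value=$(printf '%d' "0x$1")   # hex -> decimal; 0x3E8 is 1000
    [ "$value" -lt 1000 ]
}
```

For example, "below_3e8 1F4 && echo ok" prints ok, because 1F4 hex is 500 decimal.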

        proc -ping - see if all nodes responding

        netecho - Checks to see which bynets are responding.

        bam -s - Verify that the bmca's and bya's for both bynets are
              online.  Use rallsh to check all nodes.

        blmstat -z; blmstat -v |grep BrdActive - If BrdActive shows
              more than 50% to 60%, then the platform can be
              considered broadcast-bound.

        dfspace - Any filesystems out of space?

        xctl -nw - Issue "screen debug" and verify that the following
              flags are "off":

                        (1) Break Stop: halts on restart

                        (2) Start with Debug: halts on startup

        /etc/gsc/bin/top - Determines the top 15 processes on the system
              and periodically updates this information.  Top is
              in the GSCTOOLS package and is available on the patch
              server.


        If you are running from MVS or VM check the TDP. If you need
        assistance with this, contact the client group. TTIP 380 and
        TTIP 697 go into more detail on some of these commands.

        Logon to the TDP and run the following commands:

        Enable Test  - Shows more detail on the Display Session
          command.  Be certain to Disable Test when done.

        D Cells  - Shows the number of memory cells. If you see
          numbers in the GMAIN or WAITs column you may be out of
          memory on the host side. These numbers are cumulative
          since the last TDP restart. You may need assistance from
          the client group to find out if this is a problem.

        D Ses           - General session status info.

        D Ses - Check status of a particular session.
          TTIP 380 shows code definitions.

        D Ses Ending    - Check for sessions in ENDING state.

        D IFP           - IFP/PE status.

        D OUTQ          - Check to see if any queues are backed up.



Scripts:
----------------------------------------------------------------



disksum - what space is available on a per database basis..
-----------------------------------------------------------
REPLACE MACRO disksum AS
(SEL date, time;
 SEL databasename, SUM(currentperm), SUM(maxperm)
 FROM DBC.DiskSpace
 GROUP BY 1
 ORDER BY 1
 WITH SUM(currentperm), SUM(maxperm););


WHOWASON - within a given period, who was logged on..
example:  exec whowason (date-1, 000000, date, 123000);
-----------------------------------------------------------
REPLACE MACRO WhoWasOn
 (D1 date  DEFAULT DATE
 ,T1 float DEFAULT 120000
 ,D2 date  DEFAULT DATE
 ,T2 float DEFAULT 120000)
 AS
(
SEL a.Username(title'User')(format'x(12)')
   ,a.SessionNo(title'Session')(format'zzzzzz9')
   ,a.LogicalHostID(title'Host')
   ,a.IFPNo(title'IFP')
   ,a.LogDate(title'Logon')
   ,a.LogTime(title'')
   ,b.LogDate(title'Logoff')
   ,b.LogTime(title'')
 FROM DBC.LogOnOff a, DBC.LogOnOff b
 WHERE a.Event='Logon'
  AND  b.Event<>'Logon'
  AND  a.SessionNo=b.SessionNo
  AND  a.IFPNo=b.IFPNo
  AND  a.LogonDate=b.LogonDate
  AND  a.LogonTime=b.LogonTime
  AND ((a.LogDate=:D2 and a.LogTime<=:T2)
       OR a.LogDate<:D2)
  AND ((b.LogDate=:D1 and b.LogTime>=:T1)
       OR b.LogDate>:D1)
 ORDER BY a.LogonDate,a.LogonTime,a.SessionNo;
);
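
The date/time conditions in WhoWasOn amount to an interval-overlap test: a session is reported if it logged on no later than the end of the period (D2, T2) and logged off no earlier than the start of the period (D1, T1). A sketch of the same test outside the database, where the function name session_overlaps and the YYYYMMDDHHMMSS encoding are both made up for illustration:

```shell
# Succeed if a session [logon, logoff] overlaps the window [start, end].
# All four arguments are timestamps written as YYYYMMDDHHMMSS integers
# (a hypothetical encoding chosen for this sketch, not a Teradata format).
session_overlaps() {
    logon=$1 logoff=$2 win_start=$3 win_end=$4
    # Same two tests as the macro: logged on no later than the window end,
    # and logged off no earlier than the window start.
    [ "$logon" -le "$win_end" ] && [ "$logoff" -ge "$win_start" ]
}
```

A session that logged on before the period and logged off after it passes both tests, which is why the macro also picks up long-running sessions.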


Copyright for the TeradataForum (TDATA-L), Manta BlueSky    
Copyright 2016 - All Rights Reserved    
Last Modified: 15 Jun 2023