Solaris - Resource consumption

From UnixManiax


Disk monitoring

Monitoring disk I/O usage

# iostat -xn 2 2
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.1    3.1    9.1   23.5  0.0  0.0    0.0    3.0   0   0 c0d0
   34.1    0.0   17.1    0.0  0.0  0.6    0.0   16.9   0   3 c0d1
   34.4    0.0   17.2    0.0  0.0  0.6    0.0   16.6   0   3 c0d2
   [...]
  111.7   56.6 1599.6  990.2  0.0  4.7    0.0   27.8   0  97 vdc34
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 vdc35
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 vdc36
   [...]

The columns of interest are "%w" and "%b". "%b" is the percentage of time the disk (or the LUN, on a SAN) is busy. Here the disk vdc34 is 97% busy. On this system, of which only a few LUNs are shown here, many LUNs are at or near 100% and we are seeing severe slowdowns. The load is too heavy and the storage array is not fast enough.
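
To spot saturated devices without reading the whole table, the "%b" column can be filtered. The following is a minimal sketch, not part of the original page: the function name busy_devices and the default threshold of 90 are assumptions chosen for illustration. It locates the "%b" column from the header line instead of hard-coding its position.

```shell
#!/bin/sh
# busy_devices [THRESHOLD]   (hypothetical helper, not a Solaris command)
# Reads `iostat -xn` output on stdin and prints "device %b" for every
# device whose %b value is at or above THRESHOLD (default: 90).
# On Solaris, set AWK=nawk (or /usr/xpg4/bin/awk): the legacy
# /usr/bin/awk does not support the -v option used here.
busy_devices() {
    ${AWK:-awk} -v limit="${1:-90}" '
        # Header line: find the position of the %b column by name.
        $NF == "device" {
            bcol = 0
            for (i = 1; i <= NF; i++) if ($i == "%b") bcol = i
            next
        }
        # Data lines: need a known %b column and a numeric value in it.
        bcol && NF > bcol && $bcol ~ /^[0-9]+$/ && $bcol + 0 >= limit + 0 {
            print $NF, $bcol
        }
    '
}

# Usage (on Solaris):
#   iostat -xn 2 2 | busy_devices 90
```

Since "iostat -xn 2 2" prints two reports, a device can appear twice in the result, once per report.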

The "%w" column is the percentage of time during which transactions are queued, waiting to be serviced by the disk. In practice, if "%w" is higher than "%b", this suggests that the latency is not on the disk itself but on the system bus.
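
The rule of thumb above can be checked mechanically. This is a sketch under assumptions: the function name wait_over_busy is invented for illustration, and the code only encodes the comparison described in the text without diagnosing where the latency comes from.

```shell
#!/bin/sh
# wait_over_busy   (hypothetical helper, not a Solaris command)
# Reads `iostat -xn` output on stdin and prints "device %w %b" for
# every device whose %w value is strictly greater than its %b value.
# On Solaris, AWK=nawk is a reasonable choice over the legacy awk.
wait_over_busy() {
    ${AWK:-awk} '
        # Header line: find the %w and %b columns by name.
        $NF == "device" {
            wcol = bcol = 0
            for (i = 1; i <= NF; i++) {
                if ($i == "%w") wcol = i
                if ($i == "%b") bcol = i
            }
            next
        }
        # Data lines: both columns must be known and numeric.
        wcol && bcol && NF > bcol &&
        $wcol ~ /^[0-9]+$/ && $bcol ~ /^[0-9]+$/ &&
        $wcol + 0 > $bcol + 0 {
            print $NF, $wcol, $bcol
        }
    '
}

# Usage (on Solaris):
#   iostat -xn 2 2 | wait_over_busy
```

An empty result means no device matched, which is the case for the sample output shown earlier, where "%w" is 0 everywhere.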


Ranking processes by disk I/O consumption

I did not find a straightforward command that returns this information, but I found a Perl script, "prusage", online at http://www.brendangregg.com/Solaris/prusage. A 45-page PDF covers the subject in detail: http://www.brendangregg.com/Solaris/paper_diskubyp1.pdf.

Here is an example run, followed by the source code in case the link goes away. The processes are ranked by disk I/O, which here is dominated by writes (OUBLK). Note that the output is ordered by total blocks (INBLK + OUBLK), the script's default sort key, and not by PID: the "-s pid" option comes after the interval and count arguments, where getopts no longer parses it.

# ./prusage -i 2 2 -s pid
   PID  MINF  MAJF    INBLK    OUBLK   CHAR-kb COMM
 18085     0 1034624       11  2066703    783785 oracle
 18072     0 1004673       28  2006007   2922472 oracle
 14414     0  1517      316   926171    525708 tictimed
 11107     0  6626       13   777992   1193227 lp
 15910     0  1527      137   710801    633187 tictimed
 13808     0   964       93   707699    633186 tictimed
 14415     0  1188       68   695021    633219 tictimed
 14484     0  2700       69   694253    633180 tictimed
 13647     0  2211      143   687310    633193 tictimed
 15098     0  1186       92   674124    633187 tictimed
 13812     0  1268      172   673283    633219 tictimed
 13898     0  1042       49   673332    633200 tictimed
 12612     0   945       79   671873    633160 tictimed
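
When many processes share the same command name, as with tictimed above, it can help to total the written blocks per command. The following is a sketch under assumptions: the function name oublk_by_comm is invented for illustration, and it relies on the seven-column layout of the I/O report shown above (OUBLK in column 5, COMM in column 7).

```shell
#!/bin/sh
# oublk_by_comm   (hypothetical helper, not part of prusage)
# Reads a prusage I/O report on stdin and prints "total_OUBLK command",
# highest total first.
oublk_by_comm() {
    ${AWK:-awk} '
        $1 == "PID" { next }            # skip the header of each sample
        NF >= 7 && $5 ~ /^[0-9]+$/ {    # column 5 = OUBLK, column 7 = COMM
            total[$7] += $5
        }
        END {
            for (c in total) printf "%.0f %s\n", total[c], c
        }
    ' | sort -k1,1nr
}

# Usage (on Solaris), options placed before the interval and count:
#   ./prusage -CTn 2 2 | oublk_by_comm
```

In the usage line, -C avoids clear-screen sequences in the pipe, -T prints all lines, and -n skips the first sample; all three options are described in the script's header.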

And the source code:

#!/usr/bin/perl
#
# prusage - Process usage stats, Solaris. I/O, sys/usr times, context switches.
#           A supplement to "ps", can be run as any user.
#
# 01-Jul-2005, ver 1.00  (check for newer vers, http://www.brendangregg.com)
#
#
# USAGE: prusage [-bchinuwxCT] [-p PID] [-s sort] [-t top] [interval] [count]
#
#      prusage               # Default. (-ic 1), fit to screen, 1 secs.
#      prusage -b            # Child times report (must be root or owner)
#      prusage -i            # I/O stats (default)
#      prusage -u            # USR/SYS times
#      prusage -x            # Context Switches
#      prusage -w            # Wide output
#      prusage -c            # Clear the screen (default)
#      prusage -C            # Don't clear the screen
#      prusage -T            # Don't fit to screen (print all lines)
#      prusage -p pid        # Print this PID only
#      prusage -s sort       # Sort on pid,blks,cpu,utime,inblk,vctx,...
#      prusage -t lines      # Print top lines only
#  eg,
#      prusage 2             # 2 second samples (first is historical)
#      prusage 2 5           # 5 x 2 second samples
#      prusage -xi 2         # I/O and Context switch reports, 2 secs
#      prusage -biux 10      # multi output, all reports every 10 secs
#      prusage -C 10         # 10 second samples, no clear screen
#      prusage -CT 10        # 10 second samples, all lines
#      prusage -Ct8 10 5     # 5 x 10 second samples, top 8 lines only
#      prusage -p 11321      # PID 11321 only
#      prusage -s pid        # sort on PID
#
# FIELDS:
#              PID     Process ID
#              MINF    Minor Page Faults (satisfied from RAM)
#              MAJF    Major Page Faults (satisfied by disk I/O)
#              INBLK   In Blocks (disk I/O reads)
#              OUBLK   Out Blocks (disk I/O writes)
#              CHAR-kb Character I/O Kbytes
#              COMM    Command name
#              USR     User Time
#              SYS     System Time
#              CUSR    Child User Time
#              CSYS    Child System Time
#              WAIT    Wait for CPU Time
#              LOCK    User waiting on lock time
#              TRAP    System trap time
#              VCTX    Voluntary Context Switches (I/O bound)
#              ICTX    Involuntary Context Switches (CPU bound)
#              SYSC    System calls
#
# NOTE: Minor faults always report zero on most versions of Solaris.
#
# REFERENCE: /usr/include/sys/procfs.h
#
# SEE ALSO: psio                               # process I/O
#          prstat -m                           # USR/SYS times, ...
#           /usr/ucb/rusage                    # historical
#
# COPYRIGHT: Copyright (c) 2004, 2005 Brendan Gregg.
#
#  This program is free software; you can redistribute it and/or
#  modify it under the terms of the GNU General Public License
#  as published by the Free Software Foundation; either version 2
#  of the License, or (at your option) any later version.
#
#  This program is distributed in the hope that it will be useful,
#  but WITHOUT ANY WARRANTY; without even the implied warranty of
#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#  GNU General Public License for more details.
#
#  You should have received a copy of the GNU General Public License
#  along with this program; if not, write to the Free Software Foundation,
#  Inc., 59 Temple Place - Suite 330, Boston, MA  02111-1307, USA.
#
#  (http://www.gnu.org/copyleft/gpl.html)
#
# Author: Brendan Gregg  [Sydney, Australia]
#
# 31-Aug-2004  Brendan Gregg   Created this.
# 12-Mar-2005     "      "     Processed /proc/*/psinfo as well.
# 09-May-2005     "      "     Processed /proc/*/usage as well.

use Getopt::Std;

#
# --- Default Variables ---
#
$INTERVAL = 1;         # seconds to sample
$MAX = 2**32;          # max count of samples
$NEW = 0;              # skip summary output (new data only)
$WIDE = 0;             # print wide output (don't truncate)
$SCHED = 0;            # print PID 0
$TOP = 0;              # print top many only
$FIT = 1;              # fit to screen
$CLEAR = 1;            # clear screen before outputs
$STYLE_IO = 1;         # default output style, I/O
$STYLE_CTX = 0;        # output style, Context Switches
$STYLE_TIME = 0;       # output style, Times
$STYLE_CHILD = 0;      # output style, Child times
$MULTI = 0;            # multi reports, multiple styles
$TARGET_PID = -1;      # target PID, -1 means all
$count = 1;            # current iteration

#
# --- Command Line Arguments ---
#

### Check usage
&Usage() if $ARGV[0] eq "--help";
getopts('bchinuwxp:s:t:CT') || &Usage();
&Usage() if $opt_h;

### Process options
$NEW = 1 if $opt_n;
$WIDE = 1 if $opt_w;
$FIT = 0 if $opt_T;
$CLEAR = 0 if $opt_C;
$STYLE_IO = 0 if $opt_x || $opt_u || $opt_b;
$STYLE_CTX = 1 if $opt_x;
$STYLE_TIME = 1 if $opt_u;
$STYLE_CHILD = 1 if $opt_b;
$STYLE_IO = 1 if $opt_i;
$TOP = $opt_t if defined $opt_t;
$SORT = $opt_s if defined $opt_s;
$TARGET_PID = $opt_p if defined $opt_p;
$INTERVAL = shift(@ARGV) || $INTERVAL;
$MAX = shift(@ARGV) || $MAX;

### Determine style count
$STYLES = $STYLE_IO + $STYLE_CTX + $STYLE_TIME + $STYLE_CHILD;
$MULTI = 1 if $STYLES > 1;

### Determine clear seq
$CLEARSTR = `clear` if $CLEAR;

### Fit to screen
if ($FIT && ! $opt_t) {
       my ($row,$col) = &getwinsz();
       $TOP = int(($row - $STYLES * 2) / $STYLES);
}


#
# --- Main ---
#
for (;$count <= $MAX; $count++) {

       ### Get data
       &GetProcStat();         # fetch and save /proc stats in %PID{$pid}

       next if $NEW && $count == 1;

       ### Print data
       print $CLEARSTR if $CLEAR;
       &PrintIO($SORT) if $STYLE_IO;
       &PrintCtx($SORT) if $STYLE_CTX;
       &PrintTime($SORT) if $STYLE_TIME;
       &PrintChild($SORT) if $STYLE_CHILD;

       ### Pause
       sleep($INTERVAL) unless $count == $MAX;

       ### Cleanup memory
       undef %PID;
       undef %Comm;
}


#
# --- Subroutines ---
#

# GetProcStat - Gets /proc usage statistics and saves them in %PID.
#      This can be run multiple times, the first time %PID will be
#      populated with the summary since boot values.
#      This reads /proc/*/usage and /proc/*/psinfo.
#
sub GetProcStat {
   my $pid;
   chdir "/proc";

   foreach $pid (sort {$a<=>$b} <*>) {
       next if $pid == $$;
       next if $pid == 0 && $SCHED == 0;
       next if $TARGET_PID > -1 && $pid != $TARGET_PID;

       #
       #  struct prusage
       #

       ### Read usage stats
       open(USAGE,"/proc/$pid/usage") || next;
       read(USAGE,$usage,256);
       close USAGE;

       ### Unpack usage values
       ($pr_lwpid, $pr_count, $pr_tstamp, $pr_create, $pr_term,
        $pr_rtime, $pr_utime, $pr_stime, $pr_ttime, $pr_tftime,
        $pr_dftime, $pr_kftime, $pr_ltime, $pr_slptime, $pr_wtime,
        $pr_stoptime, $filltime, $pr_minf, $pr_majf, $pr_nswap,
        $pr_inblk, $pr_oublk, $pr_msnd, $pr_mrcv, $pr_sigs,
        $pr_vctx, $pr_ictx, $pr_sysc, $pr_ioch, $filler) =
        unpack("iia8a8a8a8a8a8a8a8a8a8a8a8a8a8a48LLLLLLLLLLLLa40",$usage);

       ### Process usage values
       $New{$pid}{utime} = timestruct2int($pr_utime);
       $New{$pid}{stime} = timestruct2int($pr_stime);
       $New{$pid}{ttime} = timestruct2int($pr_ttime);
       $New{$pid}{ltime} = timestruct2int($pr_ltime);
       $New{$pid}{wtime} = timestruct2int($pr_wtime);
       $New{$pid}{slptime} = timestruct2int($pr_slptime);
       $New{$pid}{minf}  = $pr_minf;
       $New{$pid}{majf}  = $pr_majf;
       $New{$pid}{nswap} = $pr_nswap;
       $New{$pid}{inblk} = $pr_inblk;
       $New{$pid}{oublk} = $pr_oublk;
       $New{$pid}{vctx}  = $pr_vctx;
       $New{$pid}{ictx}  = $pr_ictx;
       $New{$pid}{sysc}  = $pr_sysc;
       $New{$pid}{ioch}  = $pr_ioch;
       # and a couple of my own,
       $New{$pid}{blks}  = $pr_inblk + $pr_oublk;
       $New{$pid}{ctxs}  = $pr_vctx + $pr_ictx;
       $New{$pid}{cpu}  = $New{$pid}{utime} + $New{$pid}{stime};

       #
       #  struct psinfo
       #

       ### Read psinfo stats
       open(PSINFO,"/proc/$pid/psinfo") || next;
       read(PSINFO,$psinfo,256);
       close PSINFO;

       ### Unpack psinfo values
       ($pr_flag, $pr_nlwp, $pr_pid, $pr_ppid, $pr_pgid, $pr_sid,
        $pr_uid, $pr_euid, $pr_gid, $pr_egid, $pr_addr, $pr_size,
        $pr_rssize, $pr_pad1, $pr_ttydev, $pr_pctcpu, $pr_pctmem,
        $pr_start, $pr_time, $pr_ctime, $pr_fname, $pr_psargs,
        $pr_wstat, $pr_argc, $pr_argv, $pr_envp, $pr_dmodel,
        $pr_taskid, $pr_projid, $pr_nzomb, $filler) =
        unpack("iiiiiiiiiiIiiiiSSa8a8a8Z16Z80iiIIaa3iiia",$psinfo);

        ### Save command name
        $Comm{$pid} = $pr_fname;

       next unless $STYLE_CHILD;       # only child needs the following,

       #
       #  struct pstatus
       #

       ### Read pstatus stats
       open(PSTATUS,"/proc/$pid/status") || next;
       read(PSTATUS,$pstatus,128);
       close PSTATUS;

       ### Unpack pstatus values
       ($pr_flags, $pr_nlwp, $pr_pid, $pr_ppid, $pr_pgid, $pr_sid,
        $pr_aslwpid, $pr_agentid, $pr_sigpend, $pr_brkbase, $pr_brksize,
        $pr_stkbase, $pr_stksize, $pr_utime, $pr_stime, $pr_cutime,
        $pr_cstime, $filler) =
        unpack("iiiiiiiia16iiiia8a8a8a8a",$pstatus);

       ### Process pstatus values
       $New{$pid}{cutime} = timestruct2int($pr_cutime);
       $New{$pid}{cstime} = timestruct2int($pr_cstime);
       $New{$pid}{ccpu}  = $New{$pid}{cutime} + $New{$pid}{cstime};
   }

   ### Calculate deltas since the previous sample
   foreach $pid (keys %New) {
       # save PID values,
       foreach $key (keys %{$New{$pid}}) {
               $PID{$pid}{$key} = $New{$pid}{$key} - $Old{$pid}{$key};
       }
   }
   undef %Old;
   foreach $pid (keys %New) {
       # save old values,
       foreach $key (keys %{$New{$pid}}) {
               $Old{$pid}{$key} = $New{$pid}{$key};
       }
   }
}

# PrintIO - print a report on I/O statistics: minf, majf, inblk, oublk, ioch.
#
sub PrintIO {
       my $sort = shift || "blks";
       my $top = $TOP;
       my $pid;

       ### Print header
       printf("%6s %5s %5s %8s %8s %9s %s\n","PID",
        "MINF","MAJF","INBLK","OUBLK","CHAR-kb","COMM");

       ### Print report
       foreach $pid (&SortPID("$sort")) {
               printf("%6s %5s %5s %8s %8s %9.0f %s\n",$pid,
                $PID{$pid}{minf},$PID{$pid}{majf},$PID{$pid}{inblk},
                $PID{$pid}{oublk},$PID{$pid}{ioch}/1024,
                trunc($Comm{$pid},33));
               last if --$top == 0;
       }
       print "\n" if $MULTI;
}

# PrintTime - print a report on Times: utime, stime, wtime, ltime, ttime.
#
sub PrintTime {
       my $sort = shift || "cpu";
       my $top = $TOP;
       my $pid;

       ### Print header
       printf("%6s %8s %8s %8s %6s %6s %s\n","PID",
        "USR","SYS","WAIT","LOCK","TRAP","COMM");

       ### Print report
       foreach $pid (&SortPID("$sort")) {
               printf("%6s %8.2f %8.2f %8.2f %6.2f %6.2f %s\n",$pid,
                $PID{$pid}{utime},$PID{$pid}{stime},$PID{$pid}{wtime},
                $PID{$pid}{ltime},$PID{$pid}{ttime},trunc($Comm{$pid},32));
               last if --$top == 0;
       }
       print "\n" if $MULTI;
}

# PrintCtx - print a report on Context Switches: utime, stime, vctx, ictx, sysc.
#
sub PrintCtx {
       my $sort = shift || "ctxs";
       my $top = $TOP;
       my $pid;

       ### Print header
       printf("%6s %7s %7s %9s %8s %10s %s\n","PID",
        "USR","SYS","VCTX","ICTX","SYSC","COMM");

       ### Print report
       foreach $pid (&SortPID("$sort")) {
               printf("%6s %7.2f %7.2f %9s %8s %10s %s\n",$pid,
                $PID{$pid}{utime},$PID{$pid}{stime},$PID{$pid}{vctx},
                $PID{$pid}{ictx},$PID{$pid}{sysc},trunc($Comm{$pid},27));
               last if --$top == 0;
       }
       print "\n" if $MULTI;
}

# PrintChild - print a report on Child times: utime, stime, cutime, cstime.
#
sub PrintChild {
       my $sort = shift || "ccpu";
       my $top = $TOP;
       my $pid;

       ### Print header
       printf("%6s %8s %8s %8s %8s %s\n","PID",
        "USR","SYS","CUSR","CSYS","COMM");

       ### Print report
       foreach $pid (&SortPID("$sort")) {
               printf("%6s %8.2f %8.2f %8.2f %8.2f %s\n",$pid,
                $PID{$pid}{utime},$PID{$pid}{stime},$PID{$pid}{cutime},
                $PID{$pid}{cstime},trunc($Comm{$pid},32));
               last if --$top == 0;
       }
       print "\n" if $MULTI;
}

# SortPID - sorts the PID hash by the key given as arg1, returning a sorted
#      array of PIDs.
#
sub SortPID {
       my $sort = shift;

       ### Sort numerically
       if ($sort eq "pid") {
               return sort {$a <=> $b} (keys %PID);
       } else {
               return sort {$PID{$b}{$sort} <=> $PID{$a}{$sort}} (keys %PID);
       }
}

# getwinsz - gets the terminal window size and returns it as x, y.
#      The default size returned is 24x80 if an error is encountered.
#
sub getwinsz {
       my $row = 24;
       my $col = 80;
       my ($xpix,$ypix,$winsize);
       my $TIOCGWINSZ = 21608;         # check /usr/include/sys/termios.h

       open(TTY, "+</dev/tty") || return($row,$col);
       ioctl(TTY, $TIOCGWINSZ, $winsize='') || return($row,$col);
       ($row, $col, $xpix, $ypix) = unpack('S4', $winsize);
       return($row,$col);
}

# timestruct2int - Convert a timestruct value (64 bits) into an integer
#      of seconds.
#
sub timestruct2int {
       my $timestruct = shift;
       my ($secs,$nsecs,$time);

       $secs = $nsecs = $time = 0;
       ($secs,$nsecs) = unpack("LL",$timestruct);
       $time = $secs + $nsecs * 10**-9;
       return $time;
}

# trunc - Returns a truncated string if required.
#
sub trunc {
       my $string = shift;
       my $length = shift;

       if ($WIDE) {
               return $string;
       } else {
               return substr($string,0,$length);
       }
}

# Usage - print usage message and exit.
#
sub Usage {
       print STDERR <<END;
prusage ver 0.97
USAGE: prusage [-bchinuwxCT] [-p PID] [-s sort] [-t top] [interval] [count]

      prusage               # Default. (-ic 1), fit to screen, 1 secs.
      prusage -b            # Child times report (must be root or owner)
      prusage -i            # I/O stats (default)
      prusage -u            # USR/SYS times
      prusage -x            # Context Switches
      prusage -w            # Wide output
      prusage -c            # Clear the screen (default)
      prusage -C            # Don't clear the screen
      prusage -T            # Don't fit to screen (print all lines)
      prusage -p pid        # Print this PID only
      prusage -s sort       # Sort on pid,blks,cpu,utime,inblk,vctx,...
      prusage -t lines      # Print top lines only
   eg,
      prusage 2             # 2 second samples (first is historical)
      prusage 2 5           # 5 x 2 second samples
      prusage -xi 2         # I/O and Context switch reports, 2 secs
      prusage -biux 10      # multi output, all reports every 10 secs
      prusage -C 10         # 10 second samples, no clear screen
      prusage -CT 10        # 10 second samples, all lines
      prusage -Ct8 10 5     # 5 x 10 second samples, top 8 lines only
      prusage -p 11321      # PID 11321 only
      prusage -s pid        # sort on PID
END
       exit;
}