Solaris - Resource consumption

Disk monitoring

Monitoring disk I/O usage

# iostat -xn 2 2
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.1    3.1    9.1   23.5  0.0  0.0    0.0    3.0   0   0 c0d0
   34.1    0.0   17.1    0.0  0.0  0.6    0.0   16.9   0   3 c0d1
   34.4    0.0   17.2    0.0  0.0  0.6    0.0   16.6   0   3 c0d2
  111.7   56.6 1599.6  990.2  0.0  4.7    0.0   27.8   0  97 vdc34
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 vdc35
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 vdc36

The columns of interest are "%w" and "%b". "%b" is the percentage of time the disk (or the LUN on a SAN) was busy servicing requests. Here the vdc34 disk is busy 97% of the time. On this system, of which only a few LUNs are shown here, many LUNs sit at or near 100% and we are seeing severe slowdowns: the I/O load is too high for what the storage array can deliver.

The "%w" column is the percentage of time requests were waiting in the queue for service (reads as well as writes). In practice, if "%w" is higher than "%b", requests are piling up before they reach the device, which suggests the latency comes from upstream of the disk (for example the system bus) rather than from the disk itself.
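
To pick out the saturated devices without reading the whole table, the output can be filtered on the "%b" column. This is a minimal sketch, assuming the 11-column "iostat -xn" layout shown above ("%b" in column 10, device name in column 11); busy_devices is a hypothetical helper name, not a Solaris command.

```shell
#!/bin/sh
# busy_devices THRESHOLD
# Reads "iostat -xn" output on stdin and prints "device %b" for every
# device whose %b value is greater than or equal to THRESHOLD.
# Assumption: 11 whitespace-separated columns, %b in 10, device in 11.
busy_devices() {
    awk -v limit="$1" '
        # Header and title lines are skipped: column 10 is not a number.
        NF == 11 && $10 ~ /^[0-9]+$/ && $10 + 0 >= limit + 0 {
            print $11, $10
        }
    '
}

# Example fed with two lines from the sample output above.
printf '%s\n' \
    '    0.1    3.1    9.1   23.5  0.0  0.0    0.0    3.0   0   0 c0d0' \
    '  111.7   56.6 1599.6  990.2  0.0  4.7    0.0   27.8   0  97 vdc34' |
    busy_devices 90
```

On a live system this would be used as "iostat -xn 2 2 | busy_devices 90". The first report printed by iostat is an average since boot, so only the later lines reflect current activity. On Solaris the legacy /usr/bin/awk may not accept "-v"; nawk or /usr/xpg4/bin/awk can be substituted.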

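Ranking processes by disk I/O consumption

I did not find a built-in command that reports this directly, but I found a Perl script, "prusage", on the net. A 45-page PDF also covers the subject in detail.

Here is a usage example, followed by the source code in case the link goes dead. Processes are ranked by disk block consumption. Options must come before the interval and count, so the trailing "-s pid" is ignored and the default sort key, "blks" (INBLK + OUBLK), applies; this is why the list is not in PID order. The columns are PID, MINF, MAJF, INBLK, OUBLK, CHAR-kb and COMM. To rank by writes only, "-s oublk" placed before the interval should work, since the sort key can be any field name stored by the script.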

# ./prusage -i 2 2 -s pid
 18085     0 1034624       11  2066703    783785 oracle
 18072     0 1004673       28  2006007   2922472 oracle
 14414     0  1517      316   926171    525708 tictimed
 11107     0  6626       13   777992   1193227 lp
 15910     0  1527      137   710801    633187 tictimed
 13808     0   964       93   707699    633186 tictimed
 14415     0  1188       68   695021    633219 tictimed
 14484     0  2700       69   694253    633180 tictimed
 13647     0  2211      143   687310    633193 tictimed
 15098     0  1186       92   674124    633187 tictimed
 13812     0  1268      172   673283    633219 tictimed
 13898     0  1042       49   673332    633200 tictimed
 12612     0   945       79   671873    633160 tictimed
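
To see which command names account for the most written blocks, the per-process lines can be summed. This is a minimal sketch, assuming the seven-column I/O report layout (PID MINF MAJF INBLK OUBLK CHAR-kb COMM) listed in the script header; sum_oublk is a hypothetical helper name.

```shell
#!/bin/sh
# sum_oublk
# Reads prusage I/O report lines on stdin and prints the total OUBLK
# (column 5) per command name (column 7), largest total first.
# Lines whose first column is not a PID (such as a header) are skipped.
sum_oublk() {
    awk '
        NF >= 7 && $1 ~ /^[0-9]+$/ { total[$7] += $5 }
        END { for (c in total) printf "%d %s\n", total[c], c }
    ' | sort -rn
}

# Example fed with three lines from the sample output above.
printf '%s\n' \
    ' 18085     0 1034624       11  2066703    783785 oracle' \
    ' 18072     0 1004673       28  2006007   2922472 oracle' \
    ' 11107     0  6626       13   777992   1193227 lp' |
    sum_oublk
```

On a live system this would be used as "./prusage -i -C 2 2 | sum_oublk"; "-C" keeps the screen-clearing sequence out of the pipe.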

And the source code:

#!/usr/bin/perl
#
# prusage - Process usage stats, Solaris. I/O, sys/usr times, context switches.
#           A supplement to "ps", can be run as any user.
# 01-Jul-2005, ver 1.00  (check for newer vers,
# USAGE: prusage [-bchinuwxCT] [-p PID] [-s sort] [-t top] [interval] [count]
#      prusage               # Default. (-ic 1), fit to screen, 1 secs.
#      prusage -b            # Child times report (must be root or owner)
#      prusage -i            # I/O stats (default)
#      prusage -u            # USR/SYS times
#      prusage -x            # Context Switches
#      prusage -w            # Wide output
#      prusage -c            # Clear the screen (default)
#      prusage -C            # Don't clear the screen
#      prusage -T            # Don't fit to screen (print all lines)
#      prusage -p pid        # Print this PID only
#      prusage -s sort       # Sort on pid,blks,cpu,utime,inblk,vctx,...
#      prusage -t lines      # Print top lines only
#  eg,
#      prusage 2             # 2 second samples (first is historical)
#      prusage 2 5           # 5 x 2 second samples
#      prusage -xi 2         # I/O and Context switch reports, 2 secs
#      prusage -biux 10      # multi output, all reports every 10 secs
#      prusage -C 10         # 10 second samples, no clear screen
#      prusage -CT 10        # 10 second samples, all lines
#      prusage -Ct8 10 5     # 5 x 10 second samples, top 8 lines only
#      prusage -p 11321      # PID 11321 only
#      prusage -s pid        # sort on PID
#              PID     Process ID
#              MINF    Minor Page Faults (satisfied from RAM)
#              MAJF    Major Page Faults (satisfied by disk I/O)
#              INBLK   In Blocks (disk I/O reads)
#              OUBLK   Out Blocks (disk I/O writes)
#              CHAR-kb Character I/O Kbytes
#              COMM    Command name
#              USR     User Time
#              SYS     System Time
#              CUSR    Child User Time
#              CSYS    Child System Time
#              WAIT    Wait for CPU Time
#              LOCK    User waiting on lock time
#              TRAP    System trap time
#              VCTX    Voluntary Context Switches (I/O bound)
#              ICTX    Involuntary Context Switches (CPU bound)
#              SYSC    System calls
# NOTE: Minor faults always report zero on most versions of Solaris.
# REFERENCE: /usr/include/sys/procfs.h
# SEE ALSO: psio                               # process I/O
#          prstat -m                           # USR/SYS times, ...
#           /usr/ucb/rusage                    # historical
# COPYRIGHT: Copyright (c) 2004, 2005 Brendan Gregg.
#  This program is free software; you can redistribute it and/or
#  modify it under the terms of the GNU General Public License
#  as published by the Free Software Foundation; either version 2
#  of the License, or (at your option) any later version.
#  This program is distributed in the hope that it will be useful,
#  but WITHOUT ANY WARRANTY; without even the implied warranty of
#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#  GNU General Public License for more details.
#  You should have received a copy of the GNU General Public License
#  along with this program; if not, write to the Free Software Foundation,
#  Inc., 59 Temple Place - Suite 330, Boston, MA  02111-1307, USA.
#  (
# Author: Brendan Gregg  [Sydney, Australia]
# 31-Aug-2004  Brendan Gregg   Created this.
# 12-Mar-2005     "      "     Processed /proc/*/psinfo as well.
# 09-May-2005     "      "     Processed /proc/*/usage as well.

use Getopt::Std;

# --- Default Variables ---
$INTERVAL = 1;         # seconds to sample
$MAX = 2**32;          # max count of samples
$NEW = 0;              # skip summary output (new data only)
$WIDE = 0;             # print wide output (don't truncate)
$SCHED = 0;            # print PID 0
$TOP = 0;              # print top many only
$FIT = 1;              # fit to screen
$CLEAR = 1;            # clear screen before outputs
$STYLE_IO = 1;         # default output style, I/O
$STYLE_CTX = 0;        # output style, Context Switches
$STYLE_TIME = 0;       # output style, Times
$STYLE_CHILD = 0;      # output style, Child times
$MULTI = 0;            # multi reports, multiple styles
$TARGET_PID = -1;      # target PID, -1 means all
$count = 1;            # current iteration

# --- Command Line Arguments ---

### Check usage
&Usage() if $ARGV[0] eq "--help";
getopts('bchinuwxp:s:t:CT') || &Usage();
&Usage() if $opt_h;

### Process options
$NEW = 1 if $opt_n;
$WIDE = 1 if $opt_w;
$FIT = 0 if $opt_T;
$CLEAR = 0 if $opt_C;
$STYLE_IO = 0 if $opt_x || $opt_u || $opt_b;
$STYLE_CTX = 1 if $opt_x;
$STYLE_TIME = 1 if $opt_u;
$STYLE_CHILD = 1 if $opt_b;
$STYLE_IO = 1 if $opt_i;
$TOP = $opt_t if defined $opt_t;
$SORT = $opt_s if defined $opt_s;
$TARGET_PID = $opt_p if defined $opt_p;
$INTERVAL = shift(@ARGV) || $INTERVAL;
$MAX = shift(@ARGV) || $MAX;

### Determine style count
$STYLES = $STYLE_IO + $STYLE_CTX + $STYLE_TIME + $STYLE_CHILD;
$MULTI = 1 if $STYLES > 1;

### Determine clear seq
$CLEARSTR = `clear` if $CLEAR;

### Fit to screen
if ($FIT && ! $opt_t) {
       my ($row,$col) = &getwinsz();
       $TOP = int(($row - $STYLES * 2) / $STYLES);
}

# --- Main ---
for (;$count <= $MAX; $count++) {

       ### Get data
       &GetProcStat();         # fetch and save /proc stats in %PID{$pid}

       next if $NEW && $count == 1;

       ### Print data
       print $CLEARSTR if $CLEAR;
       &PrintIO($SORT) if $STYLE_IO;
       &PrintCtx($SORT) if $STYLE_CTX;
       &PrintTime($SORT) if $STYLE_TIME;
       &PrintChild($SORT) if $STYLE_CHILD;

       ### Pause
       sleep($INTERVAL) unless $count == $MAX;

       ### Cleanup memory
       undef %PID;
       undef %Comm;
}

# --- Subroutines ---

# GetProcStat - Gets /proc usage statistics and saves them in %PID.
#      This can be run multiple times, the first time %PID will be
#      populated with the summary since boot values.
#      This reads /proc/*/usage and /proc/*/psinfo.
sub GetProcStat {
   my $pid;
   chdir "/proc";

   foreach $pid (sort {$a<=>$b} <*>) {
       next if $pid == $$;
       next if $pid == 0 && $SCHED == 0;
       next if $TARGET_PID > -1 && $pid != $TARGET_PID;

       #  struct prusage

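       ### Read usage stats
       open(USAGE,"/proc/$pid/usage") || next;
       read(USAGE,$usage,256);
       close USAGE;

       ### Unpack usage values
       # Template rebuilt from the field list below, assuming the 32-bit
       # struct prusage layout: 2 ints, 14 timestructs of 8 bytes,
       # 48 bytes of filltime, 12 ulongs, 40 bytes of filler (256 bytes).
       ($pr_lwpid, $pr_count, $pr_tstamp, $pr_create, $pr_term,
        $pr_rtime, $pr_utime, $pr_stime, $pr_ttime, $pr_tftime,
        $pr_dftime, $pr_kftime, $pr_ltime, $pr_slptime, $pr_wtime,
        $pr_stoptime, $filltime, $pr_minf, $pr_majf, $pr_nswap,
        $pr_inblk, $pr_oublk, $pr_msnd, $pr_mrcv, $pr_sigs,
        $pr_vctx, $pr_ictx, $pr_sysc, $pr_ioch, $filler) =
        unpack("ii" . "a8" x 14 . "a48" . "L" x 12 . "a40", $usage);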

       ### Process usage values
       $New{$pid}{utime} = timestruct2int($pr_utime);
       $New{$pid}{stime} = timestruct2int($pr_stime);
       $New{$pid}{ttime} = timestruct2int($pr_ttime);
       $New{$pid}{ltime} = timestruct2int($pr_ltime);
       $New{$pid}{wtime} = timestruct2int($pr_wtime);
       $New{$pid}{slptime} = timestruct2int($pr_slptime);
       $New{$pid}{minf}  = $pr_minf;
       $New{$pid}{majf}  = $pr_majf;
       $New{$pid}{nswap} = $pr_nswap;
       $New{$pid}{inblk} = $pr_inblk;
       $New{$pid}{oublk} = $pr_oublk;
       $New{$pid}{vctx}  = $pr_vctx;
       $New{$pid}{ictx}  = $pr_ictx;
       $New{$pid}{sysc}  = $pr_sysc;
       $New{$pid}{ioch}  = $pr_ioch;
       # and a couple of my own,
       $New{$pid}{blks}  = $pr_inblk + $pr_oublk;
       $New{$pid}{ctxs}  = $pr_vctx + $pr_ictx;
       $New{$pid}{cpu}  = $New{$pid}{utime} + $New{$pid}{stime};

       #  struct psinfo

       ### Read psinfo stats
       open(PSINFO,"/proc/$pid/psinfo") || next;
       read(PSINFO,$psinfo,256);
       close PSINFO;

       ### Unpack psinfo values
       # Template rebuilt from the field list below, assuming the 32-bit
       # struct psinfo layout; "x3" skips the padding after pr_dmodel.
       ($pr_flag, $pr_nlwp, $pr_pid, $pr_ppid, $pr_pgid, $pr_sid,
        $pr_uid, $pr_euid, $pr_gid, $pr_egid, $pr_addr, $pr_size,
        $pr_rssize, $pr_pad1, $pr_ttydev, $pr_pctcpu, $pr_pctmem,
        $pr_start, $pr_time, $pr_ctime, $pr_fname, $pr_psargs,
        $pr_wstat, $pr_argc, $pr_argv, $pr_envp, $pr_dmodel,
        $pr_taskid, $pr_projid, $pr_nzomb, $filler) =
        unpack("i10 I i4 S2 a8 a8 a8 Z16 Z80 i2 I2 a x3 i3 a*", $psinfo);

       ### Save command name
       $Comm{$pid} = $pr_fname;

       next unless $STYLE_CHILD;       # only child needs the following,

       #  struct pstatus

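       ### Read pstatus stats
       open(PSTATUS,"/proc/$pid/status") || next;
       read(PSTATUS,$pstatus,256);
       close PSTATUS;

       ### Unpack pstatus values
       # Template rebuilt from the field list below, assuming the 32-bit
       # struct pstatus layout (16-byte sigset, 8-byte timestructs).
       ($pr_flags, $pr_nlwp, $pr_pid, $pr_ppid, $pr_pgid, $pr_sid,
        $pr_aslwpid, $pr_agentid, $pr_sigpend, $pr_brkbase, $pr_brksize,
        $pr_stkbase, $pr_stksize, $pr_utime, $pr_stime, $pr_cutime,
        $pr_cstime, $filler) =
        unpack("i8 a16 I i I i a8 a8 a8 a8 a*", $pstatus);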

       ### Process pstatus values
       $New{$pid}{cutime} = timestruct2int($pr_cutime);
       $New{$pid}{cstime} = timestruct2int($pr_cstime);
       $New{$pid}{ccpu}  = $New{$pid}{cutime} + $New{$pid}{cstime};
   }

   ### Compute deltas since the previous sample, then cleanup memory
   foreach $pid (keys %New) {
       # save PID values,
       foreach $key (keys %{$New{$pid}}) {
               $PID{$pid}{$key} = $New{$pid}{$key} - $Old{$pid}{$key};
       }
   }
   undef %Old;
   foreach $pid (keys %New) {
       # save old values,
       foreach $key (keys %{$New{$pid}}) {
               $Old{$pid}{$key} = $New{$pid}{$key};
       }
   }
   undef %New;         # so that exited processes drop out of the next sample
}

# PrintIO - print a report on I/O statistics: minf, majf, inblk, oublk, ioch.
sub PrintIO {
       my $sort = shift || "blks";
       my $top = $TOP;
       my $pid;

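       ### Print header
       printf("%6s %5s %5s %8s %8s %9s %s\n","PID",
        "MINF","MAJF","INBLK","OUBLK","CHAR-kb","COMM");

       ### Print report
       foreach $pid (&SortPID("$sort")) {
               # COMM width of 33 is an assumption (fits an 80 column line)
               printf("%6s %5s %5s %8s %8s %9.0f %s\n",$pid,
                $PID{$pid}{minf},$PID{$pid}{majf},$PID{$pid}{inblk},
                $PID{$pid}{oublk},$PID{$pid}{ioch}/1024,
                trunc($Comm{$pid},33));
               last if --$top == 0;
       }
       print "\n" if $MULTI;
}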

# PrintTime - print a report on Times: utime, stime, wtime, ltime, ttime.
sub PrintTime {
       my $sort = shift || "cpu";
       my $top = $TOP;
       my $pid;

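       ### Print header
       printf("%6s %8s %8s %8s %6s %6s %s\n","PID",
        "USR","SYS","WAIT","LOCK","TRAP","COMM");

       ### Print report
       foreach $pid (&SortPID("$sort")) {
               # COMM width of 32 is an assumption (fits an 80 column line)
               printf("%6s %8.2f %8.2f %8.2f %6.2f %6.2f %s\n",$pid,
                $PID{$pid}{utime},$PID{$pid}{stime},$PID{$pid}{wtime},
                $PID{$pid}{ltime},$PID{$pid}{ttime},
                trunc($Comm{$pid},32));
               last if --$top == 0;
       }
       print "\n" if $MULTI;
}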

# PrintCtx - print a report on Context Switches: utime, stime, vctx, ictx, sysc.
sub PrintCtx {
       my $sort = shift || "ctxs";
       my $top = $TOP;
       my $pid;

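       ### Print header
       printf("%6s %7s %7s %9s %8s %10s %s\n","PID",
        "USR","SYS","VCTX","ICTX","SYSC","COMM");

       ### Print report
       foreach $pid (&SortPID("$sort")) {
               # COMM width of 27 is an assumption (fits an 80 column line)
               printf("%6s %7.2f %7.2f %9s %8s %10s %s\n",$pid,
                $PID{$pid}{utime},$PID{$pid}{stime},$PID{$pid}{vctx},
                $PID{$pid}{ictx},$PID{$pid}{sysc},
                trunc($Comm{$pid},27));
               last if --$top == 0;
       }
       print "\n" if $MULTI;
}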

# PrintChild - print a report on Child times: utime, stime, cutime, cstime.
sub PrintChild {
       my $sort = shift || "ccpu";
       my $top = $TOP;
       my $pid;

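       ### Print header
       printf("%6s %8s %8s %8s %8s %s\n","PID",
        "USR","SYS","CUSR","CSYS","COMM");

       ### Print report
       foreach $pid (&SortPID("$sort")) {
               # COMM width of 37 is an assumption (fits an 80 column line)
               printf("%6s %8.2f %8.2f %8.2f %8.2f %s\n",$pid,
                $PID{$pid}{utime},$PID{$pid}{stime},
                $PID{$pid}{cutime},$PID{$pid}{cstime},
                trunc($Comm{$pid},37));
               last if --$top == 0;
       }
       print "\n" if $MULTI;
}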

# SortPID - sorts the PID hash by the key given as arg1, returning a sorted
#      array of PIDs.
sub SortPID {
       my $sort = shift;

       ### Sort numerically
       if ($sort eq "pid") {
               return sort {$a <=> $b} (keys %PID);
       } else {
               return sort {$PID{$b}{$sort} <=> $PID{$a}{$sort}} (keys %PID);
       }
}

# getwinsz - gets the terminal window size and returns it as x, y.
#      The default size returned is 24x80 if an error is encountered.
sub getwinsz {
       my $row = 24;
       my $col = 80;
       my ($xpix,$ypix,$winsize);
       my $TIOCGWINSZ = 21608;         # check /usr/include/sys/termios.h

       open(TTY, "+</dev/tty") || return($row,$col);
       ioctl(TTY, $TIOCGWINSZ, $winsize="") || return($row,$col);
       close(TTY);
       ($row, $col, $xpix, $ypix) = unpack('S4', $winsize);
       return($row,$col);
}

# timestruct2int - Convert a timestruct value (64 bits) into an integer
#      of seconds.
sub timestruct2int {
       my $timestruct = shift;
       my ($secs,$nsecs,$time);

       $secs = $nsecs = $time = 0;
       ($secs,$nsecs) = unpack("LL",$timestruct);
       $time = $secs + $nsecs * 10**-9;
       return $time;
}

# trunc - Returns a truncated string if required.
sub trunc {
       my $string = shift;
       my $length = shift;

       if ($WIDE) {
               return $string;
       } else {
               return substr($string,0,$length);
       }
}

# Usage - print usage message and exit.
sub Usage {
       print STDERR <<END;
prusage ver 1.00
USAGE: prusage [-bchinuwxCT] [-p PID] [-s sort] [-t top] [interval] [count]

      prusage               # Default. (-ic 1), fit to screen, 1 secs.
      prusage -b            # Child times report (must be root or owner)
      prusage -i            # I/O stats (default)
      prusage -u            # USR/SYS times
      prusage -x            # Context Switches
      prusage -w            # Wide output
      prusage -c            # Clear the screen (default)
      prusage -C            # Don't clear the screen
      prusage -T            # Don't fit to screen (print all lines)
      prusage -p pid        # Print this PID only
      prusage -s sort       # Sort on pid,blks,cpu,utime,inblk,vctx,...
      prusage -t lines      # Print top lines only
      prusage 2             # 2 second samples (first is historical)
      prusage 2 5           # 5 x 2 second samples
      prusage -xi 2         # I/O and Context switch reports, 2 secs
      prusage -biux 10      # multi output, all reports every 10 secs
      prusage -C 10         # 10 second samples, no clear screen
      prusage -CT 10        # 10 second samples, all lines
      prusage -Ct8 10 5     # 5 x 10 second samples, top 8 lines only
      prusage -p 11321      # PID 11321 only
      prusage -s pid        # sort on PID
END
       exit 1;
}