Group By in Shell

The storage manager dropped by my cube to say that redo log archives for one of our DBs grew from 40G to 200G in the last day. Running "du -Sh" a few times showed that the files are no longer being written as fast, so the issue is probably over. But what was it? And first of all, when was it? Once I figure out when the issue occurred, I may be able to find the offending DML in ASH.

So, I need to do something like:

select day,hour,sum(bytes)
from (ls -l)
group by day,hour
order by day,hour

I guess I could have used the output of "ls -l" as an external table and actually run this query. But I'm trying to improve my shell scripting skills, so there must be some way to do it in the shell!
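
For the record, the external table route would look roughly like the sketch below (untested; the directory, table and file names are made up, and you would first dump the listing with ls -l > ls.txt):

create directory ls_dir as '/tmp';
create table ls_output (
  perms varchar2(12), links number, owner varchar2(30), grp varchar2(30),
  bytes number, mon varchar2(3), dy number, tm varchar2(5), fname varchar2(255)
)
organization external (
  type oracle_loader
  default directory ls_dir
  access parameters (records delimited by newline fields terminated by whitespace)
  location ('ls.txt')
);

Then the GROUP BY above could run against ls_output directly.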

I asked on Twitter, and Kris Rice (of SQLDeveloper fame) pointed me in the right direction – use AWK.

Here’s the result:

ls -l | sed 's/:/ /g' | awk '{sizes[$7$8] += int($5)} END {for (hour in sizes)
{ printf("At %s %d bytes were inserted\n",hour,sizes[hour])}}' | sort

I run ls, then use sed to split the time into separate fields (Laurent Schneider will kill me for mixing sed and awk, but I'm not at his level yet…). After the split, the fifth field contains the file size, the seventh the day of the month, and the eighth the hour. So I have an associative array called "sizes", indexed on day and hour, where each bucket accumulates the total bytes written in that hour.
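
To make the field numbers concrete, here is a hypothetical archived log as it comes out of the sed pass (the exact columns can vary by platform and locale):

$ ls -l 1_4242.arc | sed 's/:/ /g'
-rw-r----- 1 oracle dba 52428800 Nov 19 14 37 1_4242.arc

Here $5 = 52428800 is the size, $7 = 19 is the day, and $8 = 14 is the hour.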

Then I loop over the array, print the results, and sort them to make the output more readable.
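
The result looks something like this (sizes invented for illustration; each key is the day concatenated with the hour, so 1913 means the 19th at 13:00):

At 1913 1264824320 bytes were inserted
At 1914 52428800 bytes were inserted
At 2009 104857600 bytes were inserted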

A long post for a short script, but I want to make sure I don’t forget it 😉

Anyway, the best part by far was everyone who answered my desperate call for help on Twitter/Facebook. It was unexpected and very, very nice. About 5 people sent tips, ideas and even solutions! I can only imagine that doing my work is more fun than doing their own, but I appreciate the help a lot.


19 Comments on “Group By in Shell”

  1. Noons says:

    Possible alternative would be to look into v$archived_log. It has the size and time of each archived log for the last month or so.
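
    Something along these lines (an untested sketch; v$archived_log has first_time, blocks and block_size):

    select trunc(first_time, 'HH24') hr, sum(blocks * block_size) bytes
    from v$archived_log
    where first_time > sysdate - 2
    group by trunc(first_time, 'HH24')
    order by 1;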

  2. prodlife says:

DAMN! I completely forgot about that!

    At least I got to learn some awk that way, but I feel really dumb now…

  3. John Piwowar says:

    In addition to new awk knowledge, now you have a general solution. Next time, maybe it’ll be trace files causing the problem. 🙂

    If you’d like to avoid sed, awk has split:
ls -l | awk '{split($8,t,":");sizes[t[1]] += int($5)} END {for (hour in sizes) { printf("At %s %d bytes were inserted\n",hour,sizes[hour])}}' | sort

    Regards,

    John P.

  4. John Piwowar says:

    Also, because I posted my earlier reply too quickly: Thanks for sharing this. I learned more about awk today from your post, and that’s never a bad thing. 🙂

    Regards,

    John P.

  5. Moshe Zadka says:

    Don’t use sed before awk, just use the -F: flag to awk 🙂

  6. karlarao says:

    Hope this helps 🙂

    SET LINESIZE 145
    SET PAGESIZE 9999
    SET VERIFY off
    COLUMN H00 FORMAT 999 HEADING '00'
    COLUMN H01 FORMAT 999 HEADING '01'
    COLUMN H02 FORMAT 999 HEADING '02'
    COLUMN H03 FORMAT 999 HEADING '03'
    COLUMN H04 FORMAT 999 HEADING '04'
    COLUMN H05 FORMAT 999 HEADING '05'
    COLUMN H06 FORMAT 999 HEADING '06'
    COLUMN H07 FORMAT 999 HEADING '07'
    COLUMN H08 FORMAT 999 HEADING '08'
    COLUMN H09 FORMAT 999 HEADING '09'
    COLUMN H10 FORMAT 999 HEADING '10'
    COLUMN H11 FORMAT 999 HEADING '11'
    COLUMN H12 FORMAT 999 HEADING '12'
    COLUMN H13 FORMAT 999 HEADING '13'
    COLUMN H14 FORMAT 999 HEADING '14'
    COLUMN H15 FORMAT 999 HEADING '15'
    COLUMN H16 FORMAT 999 HEADING '16'
    COLUMN H17 FORMAT 999 HEADING '17'
    COLUMN H18 FORMAT 999 HEADING '18'
    COLUMN H19 FORMAT 999 HEADING '19'
    COLUMN H20 FORMAT 999 HEADING '20'
    COLUMN H21 FORMAT 999 HEADING '21'
    COLUMN H22 FORMAT 999 HEADING '22'
    COLUMN H23 FORMAT 999 HEADING '23'
    COLUMN TOTAL FORMAT 999,999 HEADING 'Total'
    SELECT
    SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH:MI:SS'),1,5) DAY
    , SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'00',1,0)) H00
    , SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'01',1,0)) H01
    , SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'02',1,0)) H02
    , SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'03',1,0)) H03
    , SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'04',1,0)) H04
    , SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'05',1,0)) H05
    , SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'06',1,0)) H06
    , SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'07',1,0)) H07
    , SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'08',1,0)) H08
    , SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'09',1,0)) H09
    , SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'10',1,0)) H10
    , SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'11',1,0)) H11
    , SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'12',1,0)) H12
    , SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'13',1,0)) H13
    , SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'14',1,0)) H14
    , SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'15',1,0)) H15
    , SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'16',1,0)) H16
    , SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'17',1,0)) H17
    , SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'18',1,0)) H18
    , SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'19',1,0)) H19
    , SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'20',1,0)) H20
    , SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'21',1,0)) H21
    , SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'22',1,0)) H22
    , SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'23',1,0)) H23
    , COUNT(*) TOTAL
    FROM
    v$log_history a
    GROUP BY SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH:MI:SS'),1,5)
    ORDER BY SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH:MI:SS'),1,5)
    /

  7. -F: will not work because the colon is not the only separator. Perhaps split…

    ls -l|awk '{split($8,x,":");print "h="x[1]" m="x[2]}'

    Anyway I dislike using ls as input, because sometimes the 8th field is a time, sometimes it is a year… and also you can group only by hour or minute…

    What about perl (come on, I know you love opensource, so get a perl book right now 🙂 )

    perl -e 'use POSIX;while(){$t{3600*int((stat($_))[9]/3600)}+=(stat($_))[7]}foreach $a(keys %t){print(ctime($a)." ".$t{$a}."\n")}#ddl'
    Wed Nov 19 16:00:00 2008
    1627
    Tue Nov 18 11:00:00 2008
    631
    Tue Nov 18 16:00:00 2008
    2164
    Wed Sep 3 15:00:00 2008
    9260
    Wed Sep 3 08:00:00 2008
    169
    Tue Oct 7 12:00:00 2008
    320
    Fri Nov 14 16:00:00 2008
    153

    PS: I added a comment in the source to make the code self-explanatory!

  8. also ls is very painful when you have file names with carriage returns :-))
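
    With GNU find you can avoid parsing ls at all; for example (GNU-only -printf, a sketch of the same group-by):

    find . -maxdepth 1 -type f -printf '%TY%Tm%Td%TH %s\n' |
    awk '{sizes[$1] += $2} END {for (h in sizes) print h, sizes[h]}' | sort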

  9. repost due to < >
    perl -e 'use POSIX;while(<*.*>){$t{3600*int((stat($_))[9]/3600)}+=(stat($_))[7]}foreach $a(keys %t){print(ctime($a)." ".$t{$a}."\n")}#ddl'

  10. also stat lets you choose between creation time, modification time and access time (man stat)
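
    For reference, the stat() indices used in the one-liner above are: [7] = size, [8] = atime, [9] = mtime, [10] = ctime. For example, to print a file's modification time:

    perl -e 'print scalar localtime((stat("."))[9]), "\n"'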

  11. BTW, group by is not a shell command, but a SQL command… so simply use SQLPLUS :mrgreen:


    find . -type f -ls|
    awk 'BEGIN {
    print "set feed off"
    print "create global temporary table lsc_tmp(d date,s number);"
    print "alter session set nls_date_format='\''MON DD HH24:MI YYYY'\'';"
    }
    {
    printf "insert into lsc_tmp values(to_date('\''"$8" "$9" "
    if ($10~/:/)printf $10" 2009"
    else printf "00:00 "$10
    print "'\''),"$7");"
    }
    END {print "SELECT TRUNC(D,'\'HH\''), sum(s) from lsc_tmp group by TRUNC(D,'\'HH\'') order by 1;"}' |
    sqlplus -s scott/tiger

  12. We are lacking a Ruby solution:

    14:19:32 ~$ ruby -e 'c=Hash.new(0);Dir["*"].each {|f|s=File.stat(f);c[s.mtime.strftime("%Y-%m")]+=s.size};puts c.sort'
    2007-02
    1739
    2007-06
    4597
    2008-03
    0
    2008-06
    173
    2009-01
    0
    2009-03
    813
    2009-04
    12765
    14:19:58 ~$

    In a more readable version

    c = Hash.new(0)

    Dir["*"].each do |f|
      s = File.stat(f)
      c[s.mtime.strftime("%Y-%m")] += s.size
    end

    puts c.sort

  13. Chen Shapira says:

    @Robert
    I wish I had Ruby on my servers to check this!

  14. Chen, that’s too bad. If you have a C compiler on the box you can grab the newest version and compile it yourself. 🙂

  15. hey chen, I thought you were a cygwin fanatic… it is a very nice syntax indeed!

    • prodlife says:

      cygwin fanatic? But I don't use cygwin at all… Maybe you're confusing it with me being a VMware fanatic?

  16. well, ok, I do not remember what kind of pseudo-shell you use instead of DOS on your PC; apologies for the confusion

  17. Right, MKS32… I should have remembered your fork attempt 😉

