tiny.cc/Awk awk.ref a few things to remind myself various format/syntax of awk. --- sed -i 's/ftp/FTP-disabled/' /etc/passwd # inline edit of file, no need to create tmp file and copy back. eg uss sed substitude command. # field delimiter, awk use -F, cut use -d # column, both awk and cut start with 1, awk $1, cut -f1 awk -F: '{print $1 ":" $3}' < /etc/passwd cut -d: -f1,3 < /etc/passwd # cut -f1-3 is 1 to 3. 3- is 3 onward. sed doing several sequential edits, older version need -e "s/.../.../" -e "..." eg from cat somefile.csv | sed -e "s/Total Hospital Beds/TotalHospitalBeds/" -e "s/Hospital Beds per 1,000 Population/HospitalBedsPer1kPop/" -e "s/Total CHCs/TotalCHCs/" -e "s/CHC Service Delivery Sites/CHCServiceDeliverySites/" --- awk 'NF > 1 {print $0}' < list awk -F: 'NF != 7 {print $0 }' < /etc/passwd # entry that does not have 7 columns will be printed for review -F: use colon as field/column delimiter NF = number of fields (ie, number of columns) $1 = first column $2 = second column $0 = whole line # perl column numbering starts with 0 # python column numbering starts with 0 # awk column numbering starts with 1 # cut column numbering starts with 1 --- Example with BEGIN and END blocks: ls -l /var/log | awk ' BEGIN {sum=0} {sum+=$5} END {print sum}' this will find out how much space are used by the files on the specified dir (but not sub dir) Note that there is no need to initiate/zero variable. so the above can be abreviated to (and display in GB): ls -l /var/log | awk '{sum+=$5} END {print sum/1024^2}' Another example on summing all quotas on a netapp vfiler: ssh ntap2 vfiler run nas5 quota report | awk '/vf5_vol1/ {sum+=$6} END {print sum/1024/1024}' define in .bashrc as shell function Size() { ls -l $* | awk '{sum+=$5} END {print sum}' ; } # Size *.txt # not /usr/bin/size! qhostTot() { qhost | awk '{h+=1; c+=$3} END {print "host="h " core="c}' ; } --- awk ' /pattern1/ { print "got pattern 1" } /pattern2/ { print "got pattern 2" } ' --- qstat -f | awk '$6 ~ /[a-zA-Z]/ {print $0}' # some host have error state in column 6, display only such host qstat -f | awk '$6 ~ /[a-zA-Z]/ && $1 ~ /default.q@compute/ {print $0}' # add an additional test for nodes in a specific queue cat passwd | awk -F: '$4 != $3 {print $0}' # print user whose GID (col 4) is not the same as the UID (col 3) cat passwd | awk -F: '$7 !~ /false/ {print $0}' # print user who does not have (/bin/)false as shell (column 7) cat passwd | awk -F: '$7 !~ /false|nologin/ {print $0}' # print user who does not have false or nologin as shell cat passwd | awk -F: '$7 !~ /false|nologin/ && $2 ~ /x/ {print $0}' # add test cond that col 2 has x as password --- man page info: relational operators: < > <= >- != == not equal to, equal to ~ reg exp match !~ negated match AWK patterns may be one of the following: BEGIN END /regular expression/ relational expression pattern && pattern pattern || pattern pattern ? pattern : pattern (pattern) ! pattern pattern1, pattern2 Control Statements The control statements are as follows: if (condition) statement [ else statement ] while (condition) statement do statement while (condition) for (expr1; expr2; expr3) statement for (var in array) statement break continue delete array[index] delete array exit [ expression ] { statements } eg: awk /pattern/ {statement} awk '{ if ( str1 ~ str2 ) print $0 }' awk '{ if ( str1 ~ str2 ) {print $0} }' awk '{ if ( str1 ~ str2 ) {print $0} else {print $2} }' what I haven't been able to find out is how str1 can be variable. arg... use perl! awk '/pattern1/ {print $0} ' # = "grep pattern1" awk '/pattern1/ {print "got pattern 1"} ' **************************************** it is pretty hard to get awk to see exactly one tab as delimiter. So, may want to use tr to convert tab to, say, the hat (^) symbol first. tr '\t' '^' < infile > outfile should do the trick. TAB is ASCII 9, \011 for really old tr not parsing \t. **************************************** example awk program using various construct, counting number of users using various shells: $7~/tcsh/{TSHELL++} $7~/bin.csh/{CSHELL++} $7~/a/{BSHELL++; print $7} END {print TSHELL " " CSHELL " " BSHELL} eg from slurm-allIdle-brc.sh : NODELIST=$( sinfo --Node --long --format '%N %20P %.10t' | awk "\$2 ~ /^$PARTITION$/ && \$3 ~ /idle/ {print \$1}" ) $2 ~ /^exactWord$/ # use this for exact matching. == is for number, can't compare string. use ^ and $ to indicate limit of word. column value would have space stripped already.