Shell scripting: awk tutorial


This is one of the best Linux line-processing utilities I have come across. It can do all sorts of stuff: add, replace, find and index. And basically much more. I’ll just get to the examples.

The Basics:

Suppose we have a variable like:

echo $somefile
this_is_something_interesting

echo $somefile | awk -F '_' '{print toupper($3)}'
SOMETHING

Here, -F sets the field delimiter. awk splits the contents of the variable (or each line of a file) on that delimiter, which in our case is ‘_’. print is the standard statement for printing stuff out. $3 contains the third split value, which in this case is ‘something’, and toupper() converts it to uppercase. Duh!
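
To see the same splitting with a different delimiter, here is a small sketch. The variable names (csv, second, both) and the sample value are made up for illustration and are not from the examples above.

```shell
# Hypothetical sample value, split on commas instead of underscores
csv="red,green,blue"

# $2 is the second field once the line is split on ","
second=$(echo "$csv" | awk -F ',' '{print $2}')

# Several fields can be printed at once; the comma in print puts a space between them
both=$(echo "$csv" | awk -F ',' '{print toupper($1), $3}')

echo "$second"
echo "$both"
```

Only the value given to -F changes; the $1, $2, $3 numbering works the same way.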

This example shows how awk converts a string into an array.

echo $time
10:20:30

hms=`echo $time | awk '{split($0,a,":"); print a[1], a[2], a[3]}'`
echo $hms
10 20 30

Now the delimiter is ‘:’. split() stores the pieces in the variable a, which acts like an array: a[1] contains the first split 10, a[2] contains the second split 20, and so on.
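
As a rough sketch of what you can do once the pieces are in an array, the snippet below counts them and turns the time into seconds. The names stamp, pieces, seconds and parts are assumptions picked for this illustration.

```shell
# Hypothetical time value in the same hh:mm:ss shape as above
stamp="10:20:30"

# split() returns how many pieces it produced
pieces=$(echo "$stamp" | awk '{n = split($0, parts, ":"); print n}')

# The array elements can be used in arithmetic directly
seconds=$(echo "$stamp" | awk '{split($0, parts, ":"); print parts[1]*3600 + parts[2]*60 + parts[3]}')

echo "$pieces"
echo "$seconds"
```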

Okay, so how do we get the second last or last split of a string?

c=`echo $i | awk 'BEGIN{FS="_"}{for (i=1; i<=NF; i++) if (i==NF-1) print $i}'` # NF contains total number of splits and variable c contains the second last word-split.

This is the basic syntax. You start with BEGIN, where you set the field separator FS. NF is a special variable that contains the number of splits. We loop over the fields until we reach NF-1, the second last split, which we check using the ‘if’ condition. Then just print it out of course!
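
A shorter route to the same fields, shown here as a sketch: awk accepts an expression after the $ sign, so $NF is the last field and $(NF-1) is the one before it, with no loop needed. The sample value and the variable names are hypothetical.

```shell
# Hypothetical underscore-separated value
word="alpha_beta_gamma_delta"

# $NF is the last field
last=$(echo "$word" | awk -F '_' '{print $NF}')

# $(NF-1) is the second last field; the parentheses are required
second_last=$(echo "$word" | awk -F '_' '{print $(NF-1)}')

echo "$last"
echo "$second_last"
```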

Substitution using awk:
This is done using ‘sub’. The first and only the FIRST occurrence of ‘shower’ is replaced by ‘stream’. ‘$0’ means the entire input line.

text=`echo $text | awk '{sub("shower","stream"); print $0}'` # substitution only first
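
Here is a small sketch with a concrete input, so the "first occurrence only" behaviour is visible. The sample sentence and the variable names are made up. The second command uses the optional third argument of sub to limit the replacement to one field.

```shell
# Hypothetical input containing the word twice
line="shower after shower"

# Without a third argument, sub works on $0 and changes only the first match
once=$(echo "$line" | awk '{sub("shower","stream"); print $0}')

# With a third argument, sub works on that field only (here the third word)
third=$(echo "$line" | awk '{sub("shower","stream",$3); print $0}')

echo "$once"
echo "$third"
```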

Global substitution using awk:
Using ‘gsub’, every match in the line is replaced, not just the first one. Here a[a-z] matches an ‘a’ followed by one lowercase letter, and each such pair of characters is replaced by x.

text1=`echo $text | awk '{gsub("a[a-z]","x"); print $0}'` # global, a followed by an alpha replaced by x
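
A quick sketch with a made-up input shows which characters actually get replaced. gsub also returns the number of replacements it made, which is printed in front of the result here. The variable names are assumptions for this example.

```shell
# Hypothetical input: a lone "a", then two words containing "at"
line="a cat sat"

# The lone "a" is followed by a space, so it does not match a[a-z];
# each "at" pair is replaced by a single x
result=$(echo "$line" | awk '{n = gsub("a[a-z]","x"); print n ": " $0}')

echo "$result"
```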

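Similarly, the condition here is to replace everything from an ‘a’ up to a ‘d’ with ‘tt’. Note the dot: a.*d means ‘a’, then any characters, then ‘d’, whereas a*d would only mean zero or more ‘a’s directly followed by a ‘d’.

text2=`echo $text | awk '{sub("a.*d","tt"); print $0}'` # from the first a through the last d, replace with tt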
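
The difference between a*d and a.*d is easy to get wrong, so here is a sketch that runs both patterns on the same made-up input. The variable names are hypothetical.

```shell
# Hypothetical input with one "a" at the start and one "d" at the end
line="a bird"

# a*d: zero or more "a" immediately followed by "d"; only the final "d" matches
star=$(echo "$line" | awk '{sub("a*d","tt"); print $0}')

# a.*d: an "a", then anything, then "d"; the whole string matches
dotstar=$(echo "$line" | awk '{sub("a.*d","tt"); print $0}')

echo "$star"
echo "$dotstar"
```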

Similarly for numbers:

cat=`echo $name | awk '{sub("[0-9]+",""); print $0}'` # removes the first run of digits only

Another example:

short=`echo $name | awk '{gsub("[b-z]",""); print $0}'` # global, removes every letter from b to z
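
To make these two concrete, here is a sketch on one made-up value. The sample string and the variable names are assumptions, not taken from the commands above.

```shell
# Hypothetical value mixing lowercase letters and digits
name="cat123dog45"

# sub removes only the first run of digits
first_run=$(echo "$name" | awk '{sub("[0-9]+",""); print $0}')

# gsub removes every run of digits
all_runs=$(echo "$name" | awk '{gsub("[0-9]+",""); print $0}')

# gsub with [b-z] removes every letter except "a"
letters=$(echo "$name" | awk '{gsub("[b-z]",""); print $0}')

echo "$first_run"
echo "$all_runs"
echo "$letters"
```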

Substring using awk:
Suppose we want just a part of the string. (12,8) means go to the 12th character (counting from 1) and get up to 8 characters starting there. Only 7 characters are left here, so that is what we get.

echo $caption
thisislinuxjunkies

object=`echo $caption | awk '{print substr($0,12,8)}'` # substring
echo $object
junkies
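
A few more substr calls on the same string, as a sketch. Positions start at 1, and leaving out the length returns everything to the end of the string. The variable names middle, tail and total are made up for this example.

```shell
caption="thisislinuxjunkies"

# 5 characters starting at position 7
middle=$(echo "$caption" | awk '{print substr($0,7,5)}')

# No length given: everything from position 12 to the end
tail=$(echo "$caption" | awk '{print substr($0,12)}')

# length() gives the total number of characters
total=$(echo "$caption" | awk '{print length($0)}')

echo "$middle"
echo "$tail"
echo "$total"
```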

Reading a particular set of lines:
‘NR’ is a special awk variable that holds the number of lines read so far, so it works as the current line number.

cat $myfile
a
b
c
d
awk 'NR < 3' $myfile # print only the lines whose line number is less than 3
a
b
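
NR can be used for other line selections too. The sketch below feeds the same four lines through printf instead of a file, so it runs on its own. The variable names are hypothetical.

```shell
# Pick a range of lines: the 2nd and the 3rd
middle=$(printf 'a\nb\nc\nd\n' | awk 'NR >= 2 && NR <= 3')

# Prefix every line with its line number
numbered=$(printf 'a\nb\nc\nd\n' | awk '{print NR ": " $0}')

# In the END block, NR holds the total number of lines read
total=$(printf 'a\nb\nc\nd\n' | awk 'END {print NR}')

echo "$middle"
echo "$numbered"
echo "$total"
```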

This is not a complete set of things you get to do with awk. I’ll update this post if I find more neat tricks! Questions appreciated! 🙂
