Shell scripting | awk & sed
Index
- What are awk and sed
- Working with awk
- Making an AWK command file
- Working with sed
- Combining Awk and Sed
- Summing up
Shell scripting covers almost every essential need to create automated command-line programs. But what about going beyond the standards and extending our arsenal with some external tools? Let's dive a bit inside awk and sed.
Write programs to handle text streams, because that is a universal interface. — Doug McIlroy.
What are awk and sed
Awk is a programming language that lets us manipulate structured data.
Sed is a stream editor to manipulate, filter, and transform text.
Both of them are stream-oriented; they read input from text files one line at a time and write the result to standard output, which means the input file itself is not changed unless we explicitly ask for it.
Although their syntax may look cryptic, awk(1) and sed(1) can solve a lot of complex tasks in a single line of code. Combined with regular expressions, they give us a Swiss army knife for working with text files. Since we're working inside a *nix system this is perfect for us.
One of the most useful cases with awk(1) and sed(1) is parsing files and generating reports. It's a bit complicated to explain both tools without seeing them in action. To work through this post without searching too much for a file to parse, create a file named pieces-list and populate it with the following text:
Name= "Capacitor" ID= 3456 quant.= 204 Man.= "Bosch"
Name= "Battery" ID= 2760 quant.= 0 Man.= "Phillips"
Name= "Fan-Frame" ID= 7864 quant.= 131 Man.= "Mitsubishi"
Name= "Bluetooth-Emmiter" ID= 19085, quant.= 184 Man.= "Intel"
Name= "WiFi-Card", ID= 2941, quant.= 115, Man.= "Intel"
Name= "Fan" ID= 4512 quant.= 98 Man.= "OEM"
Working with awk
awk(1) is a full-fledged programming language and a powerful file parser. It offers a more general computational model for processing a file, allowing us to replace an entire shell script with an awk one-liner.
awk(1) programs consist of a series of rules. Rules generally consist of a pattern and a set of actions.
When a file is processed, awk(1) reads it line by line, checks whether each line matches one or more of the patterns in the program, and executes the actions associated with each matching pattern, taking the selected line as its input.
If you've been reading the previous workshops, you'll notice that we've used awk(1) previously to configure our panel bar.
— The basic command-line syntax to invoke awk(1)
is:
$ awk [options] 'pattern {actions}' inputFile
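As a rough illustration of how rules behave, the sketch below feeds two made-up lines to awk. A rule with only a pattern prints every matching line, and a rule with only an action runs on every line. The sample input and the variable names are assumptions made for this example.

```shell
# Two made-up input lines, used only for this illustration.
input=$(printf 'apple 3\npear 12\n')

# Pattern only: the default action prints each matching line.
matched=$(printf '%s\n' "$input" | awk '$2 > 10')

# Action only: with no pattern, the action runs on every line.
names=$(printf '%s\n' "$input" | awk '{ print $1 }')

echo "$matched"
echo "$names"
```

Only the second line has a second field greater than 10, so the first command prints just that line, while the second command prints the first field of both lines.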
We've seen how to get an output of a file before, using the cat command.
$ cat pieces-list
We've also seen how to split data to print only the parts we want using grep.
$ cat pieces-list | grep Intel
To start working with awk(1) let's use it to print our pieces-list file:
$ awk '{ print }' pieces-list
Both cat and the new awk command should produce the same output.
With awk(1) we can use patterns too:
$ awk '/Intel/ { print }' pieces-list
Regular-expression patterns are written between forward slashes.
This is useful, but we still get the complete line containing the pattern we were looking for. One powerful feature of awk(1) is that we can select pieces of the line, called fields.
Fields are referenced with a dollar sign and the position number ($N).
$ awk '/Intel/ { print $2 }' pieces-list
Sometimes a line has to meet some condition to be useful for us. We can use boolean expressions as patterns too.
In this example, the condition is that the sixth field has to be greater than one hundred:
$ awk '$6 > 100 { print $2 }' pieces-list
By default, fields are separated by runs of spaces or tabs. If we want to use another character as the field separator, we pass it with the -F option, which sets the FS variable. For example, to split on commas:
-F,
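A minimal sketch of the separator option in use, assuming a made-up comma-separated record (it is not part of pieces-list):

```shell
# Hypothetical comma-separated record, used only for this illustration.
record='WiFi-Card,2941,115,Intel'

# -F, tells awk to split fields on commas instead of whitespace.
name=$(printf '%s\n' "$record" | awk -F, '{ print $1 }')
maker=$(printf '%s\n' "$record" | awk -F, '{ print $4 }')

echo "$name is made by $maker"
```

With the comma as separator, the first field is the piece name and the fourth is the manufacturer.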
awk(1) provides some built-in functions to perform several actions.
length() returns the number of characters in the specified field.
$ awk '{ print length($2) }' pieces-list
printf formats the output of the specified fields. With a width in the format, text is aligned to the right by default (%19s); adding a minus sign aligns it to the left (%-19s).
$ awk '$6 > 1 { printf "%-19s\n", $2 }' pieces-list
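To make the padding visible, this small sketch wraps a sample word in brackets and formats it with a width of 8, first left-aligned and then right-aligned. The word and the width are arbitrary choices for the example.

```shell
# BEGIN runs without reading any input, so no file is needed here.
left=$(awk 'BEGIN { printf "[%-8s]", "Fan" }')
right=$(awk 'BEGIN { printf "[%8s]", "Fan" }')

echo "$left"
echo "$right"
```

The first line pads with spaces after the word, the second pads before it.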
Making an AWK command file
We can go further with awk(1) and store all our commands inside a file, so it's easier to apply the same commands to multiple files.
awk(1) command files can contain two special patterns:
BEGIN{} is a pattern whose actions are executed only once, before any input is read.
END{} is a pattern whose actions are executed only once, after all the input has been processed.
Let's create a script to store our awk(1) commands:
$ touch steps.awk
so now we can try some awk(1) examples on our pieces-list file.
— Format output
Our example pieces-list text file is a bit messy. Wouldn't it be great to have each field arranged in nice columns?
First we need to define how wide each column needs to be. This value is given by the longest value in each field.
Using the built-in function length($N) we can get those values.
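One way to find those widths, sketched here under the assumption that we only care about the piece name ($2) and the manufacturer ($8), is to keep the largest length seen for each field and print both in an END rule. The temporary file and the variable names w2 and w8 are made up for this example.

```shell
# Recreate the sample data in a temporary file so the sketch is self-contained.
list=$(mktemp)
cat > "$list" <<'EOF'
Name= "Capacitor" ID= 3456 quant.= 204 Man.= "Bosch"
Name= "Battery" ID= 2760 quant.= 0 Man.= "Phillips"
Name= "Fan-Frame" ID= 7864 quant.= 131 Man.= "Mitsubishi"
Name= "Bluetooth-Emmiter" ID= 19085, quant.= 184 Man.= "Intel"
Name= "WiFi-Card", ID= 2941, quant.= 115, Man.= "Intel"
Name= "Fan" ID= 4512 quant.= 98 Man.= "OEM"
EOF

# Track the longest value seen in fields 2 and 8 (quotes included).
widths=$(awk '
    length($2) > w2 { w2 = length($2) }
    length($8) > w8 { w8 = length($8) }
    END { print w2, w8 }
' "$list")

echo "$widths"
rm -f "$list"
```

The two numbers printed are the minimum column widths for the piece name and the manufacturer; the formats used in this post leave a little extra room on top of them.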
Let's define our column headers with those widths in our BEGIN pattern:
BEGIN{ printf "\n%-15s %-22s %-5s %9s\n", "MANUFACTURER ", "| PIECE NAME ","| ID ","|QUANTITY"}
In the main body we need a similar line for each one of the products in the list. This time we change the format to %d for the fields that hold a number:
{printf "%-16s %-22s %6d %9d\n", $8, $2, $4, $6}
In order to execute our stored awk(1) commands, we simply need to tell awk(1) to read the file with the -f option:
$ awk -f steps.awk pieces-list
Our result should look similar to this:
MANUFACTURER | PIECE NAME | ID | QUANTITY
-------------------------------------------------------
"Bosch" "Capacitor" 3456 204
"Phillips" "Battery" 2760 0
"Mitsubishi" "Fan-Frame" 7864 131
"Intel" "Bluetooth-Emmiter" 19085 184
"Intel" "WiFi-Card" 2941 115
"OEM" "Fan" 4512 98
Just as we did with our messy example file, we could format web server traffic logs, user databases and so on; the possibilities are endless.
— Process command-line arguments
We can take input from the user and pass it as a variable to perform actions with our data.
Let's say we want to ask the user for the product's ID and report back the manufacturer's name, the product's name and its available quantity.
First, we can create a search.awk script with the following instructions:
BEGIN{ print "Search results:\n" }
{if ( id == $4 ) print "Item ID " $4 "\n\t— Manufacturer: " $8 "\n\t— Piece Name: " $2 "\n\t— Stock Amount: " $6}
END{ print "\n---------------------------------\n"}
In this case we use a variable named id to compare against our ID field. To make it work we run the script passing a value for the variable with the -v option:
$ awk -v id=3456 -f search.awk pieces-list
Search results:
Item ID 3456
— Manufacturer: "Bosch"
— Piece Name: "Capacitor"
— Stock Amount: 204
---------------------------------
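The value given to -v does not have to be typed by hand. As a sketch, assuming the ID arrives in a shell variable (here a hypothetical one named wanted, which in a real script could come from "$1" or from read), we can hand it to awk like this:

```shell
# One line in the same layout as pieces-list.
line='Name= "Battery" ID= 2760 quant.= 0 Man.= "Phillips"'

# Hypothetical user input; in a real script this could be "$1".
wanted=2760

# -v copies the shell value into the awk variable id before the program runs.
piece=$(printf '%s\n' "$line" | awk -v id="$wanted" '$4 == id { print $2 }')

echo "$piece"
```

Quoting "$wanted" on the shell side keeps the value intact even if it contains spaces.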
— Arithmetic and string operators
As in almost every programming language, we can perform arithmetic operations inside awk(1), passing fields as values to operate with:
$ awk '{result += $6} END{printf "total amount of items: %d\n", result}' pieces-list
total amount of items: 732
Working with sed
sed(1) automates actions that seem a natural extension of interactive text editing. Most of these actions, like replacing text, deleting lines, inserting new text or removing words, could be done manually from a text editor.
Gathering all the editing instructions in one place and executing them in one pass can turn hours of manual work into minutes of automated computing.
The command-line syntax to invoke sed(1) is:
$ sed [options] instructions inputFile
If we run sed(1) with an empty instruction and no options, our file is printed unchanged to the command line:
$ sed '' pieces-list
As you can see, the structure of calling sed is similar to calling awk.
Using sed instructions
These are a few instructions we can combine:
/ acts as a delimiter around the patterns of an instruction.
/patternA/patternB/
s substitutes the first occurrence of a pattern on each line with new text.
s/orig/new/
We can indicate where to replace the matching pattern by adding a line number before the s character.
2s/orig/new/
This will replace orig with new in the second line of the file.
We can also delete a pattern by leaving the replacement blank:
$ sed 's/orig//g'
g is a flag that applies the substitution to every occurrence on a line, not just the first one.
s/orig/new/g
This will replace every occurrence of the orig pattern found in the document with the new pattern.
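The difference between these forms is easier to see on a tiny input. The strings below are made up for this sketch:

```shell
# Without g only the first match on the line is replaced.
first=$(echo 'a-b-c' | sed 's/-/+/')

# With g every match on the line is replaced.
every=$(echo 'a-b-c' | sed 's/-/+/g')

# A leading line number restricts the substitution to that line.
second=$(printf 'a-b\na-b\n' | sed '2s/-/+/')

echo "$first"
echo "$every"
echo "$second"
```

The first command leaves the second dash alone, the second command changes both, and the third only touches line two.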
w
writes the contents of the pattern space into a file.
w /path/to/output_file
d deletes a line. Nd deletes the line whose number is N. Adding an exclamation point inverts the selection, so N!d deletes every line except line N.
$ sed '1d' inputfile
$ sed '1!d' inputfile
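A short sketch of both forms on three made-up lines:

```shell
# Three made-up lines, used only for this illustration.
sample=$(printf 'first\nsecond\nthird\n')

# 1d drops line one and keeps the rest.
without_first=$(printf '%s\n' "$sample" | sed '1d')

# 1!d drops everything except line one.
only_first=$(printf '%s\n' "$sample" | sed '1!d')

echo "$without_first"
echo "$only_first"
```

The two commands are complementary: together their outputs make up the whole input.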
Combining SED commands
Running multiple commands with sed(1) can be achieved by separating them with semicolons (;) inside the single quotes on the command line, or by writing all the commands into a file (by convention with the extension .sed) and passing it with the -f option.
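As a small sketch on one made-up line, semicolons and repeated -e options are two equivalent ways of chaining commands on the command line:

```shell
# A made-up fragment in the same style as pieces-list.
line='quant.= 98 Man.= "OEM"'

# Semicolons inside a single quoted script.
with_semicolons=$(printf '%s\n' "$line" | sed 's/quant\./Quantity/; s/"//g')

# One -e option per command gives the same result.
with_e=$(printf '%s\n' "$line" | sed -e 's/quant\./Quantity/' -e 's/"//g')

echo "$with_semicolons"
echo "$with_e"
```

Which form to use is mostly a matter of taste; a script file becomes more comfortable once the list of commands grows.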
Let's use some sed power to work on our pieces-list file:
— Find and replace
$ sed 's/quant\./Quantity/' pieces-list
$ sed 's/Man\./Manufacturer/' pieces-list
These commands replace the first match of each original pattern (quant. and Man.) on every line with the new text (Quantity and Manufacturer). The dot is escaped because an unescaped dot matches any character.
We can also delete every double quote (") by leaving the replacement blank:
$ sed 's/"//g' pieces-list
Now let's combine all three instructions at once:
$ sed 's/quant\./Quantity/; s/Man\./Manufacturer/; s/"//g' pieces-list
so the output looks like this:
Name= Capacitor ID= 3456 Quantity= 204 Manufacturer= Bosch
Name= Battery ID= 2760 Quantity= 0 Manufacturer= Phillips
Name= Fan-Frame ID= 7864 Quantity= 131 Manufacturer= Mitsubishi
Name= Bluetooth-Emmiter ID= 19085, Quantity= 184 Manufacturer= Intel
Name= WiFi-Card, ID= 2941, Quantity= 115, Manufacturer= Intel
Name= Fan ID= 4512 Quantity= 98 Manufacturer= OEM
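Keep in mind that sed only prints the edited text; the file on disk stays as it was. A minimal sketch, assuming we want to keep the cleaned version in a second file (both temporary files here are made up for the example):

```shell
# Temporary files so the sketch does not touch any real data.
src=$(mktemp)
clean=$(mktemp)
printf '%s\n' 'Name= "Fan" ID= 4512 quant.= 98 Man.= "OEM"' > "$src"

# sed writes the edited text to standard output; redirect it to keep a copy.
sed 's/"//g' "$src" > "$clean"

original=$(cat "$src")
edited=$(cat "$clean")
echo "original: $original"
echo "edited:   $edited"
rm -f "$src" "$clean"
```

The source file still contains its double quotes, while the redirected copy holds the cleaned line.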
— Extract and edit
Another powerful thing we can do with sed(1) is extract information from a file, edit that information in memory and write the edited data to another file, without using pipelines.
Let's use a file for storing the instructions for sed(1).
$ vim extract.sed
We want to inspect the whole file, and we don't know in advance how many lines it has. Instead of line numbers we use a range that goes from a line matching pattern one through a line matching pattern two:
/Name=/,/Man.=/
so we work with the text contained between the start and the end pattern.
For that range we can open a block with curly brackets, much like a function body, to hold the commands we want to execute.
/Name=/,/Man.=/ {
s/"//g
s/.*Man\.= *//
w manufacturer_list
}
Now we can run sed(1) with this script file to create our output file.
$ sed -f extract.sed pieces-list
Bosch
Phillips
Mitsubishi
Intel
Intel
OEM
Of course all of this can be scripted through pipelines, but using just sed(1) we've achieved the same in fewer lines and less time.
Combining Awk and Sed
We've seen that we can clean our text data with sed(1) and format it with awk(1). Let's go a step further and combine both to get a better report.
An example of this combination can be:
$ sed 's/"//g' pieces-list | awk -f steps.awk
This way we remove the double quotes from all names and get a clean result.
We can sort the results using any field as the key. In this case we are going to use the manufacturer's name to sort the list of items:
$ sed 's/"//g' pieces-list | awk '{ print $8 " " $0 }' | sort | awk -f steps.awk
We know what the sed(1) line does. Let's analyze the awk(1) one:
- After the first pipe, we call awk to print the eighth field of each line with print $8.
- Next, we add a blank space with " ". This acts as our separator. Since the file is using spaces, we keep the method.
- Lastly we print the whole corresponding line with $0 so the next program in the pipe can read the information correctly.
Our result is going to look a bit weird. The formatted list may look like this:
MANUFACTURER | PIECE NAME | ID | QUANTITY
-------------------------------------------------------
Man.= Name= 0 0
Man.= Name= 0 0
Man.= Name= 0 0
Man.= Name= 0 0
Man.= Name= 0 0
Man.= Name= 0 0
---- End of report. Time: 06:44 | Date: 2020-04-01 ----
Since we are adding the eighth field at the start of each line as a sort key, every original field has shifted one position to the right, so we need to increase by one each field number printed inside our steps.awk file.
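The shift can be checked on a single line. In this sketch (the sample line copies the last entry of pieces-list), field 8 holds the manufacturer before the key is added, and afterwards the same value has moved to field 9:

```shell
# One line in the same layout as pieces-list.
line='Name= "Fan" ID= 4512 quant.= 98 Man.= "OEM"'

# Before adding the key, the manufacturer is the eighth field.
before=$(printf '%s\n' "$line" | awk '{ print $8 }')

# After prefixing the key, field 8 is the label and field 9 is the manufacturer.
after=$(printf '%s\n' "$line" | awk '{ print $8 " " $0 }' | awk '{ print $8, $9 }')

echo "$before"
echo "$after"
```

This is why the report printed labels such as Man.= and Name= where we expected the values.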
Having to track all these steps individually and in different files is not practical; that's why writing shell scripts for multiple tasks is so handy (yes, we can call sed(1) and awk(1) from within a shell script!).
— Create a script named format-report.sh
and open it.
- Remember this is a Shell script so indicate it at the beginning of the file.
#!/bin/sh
- First we need to order our list based on the manufacturer's name.
awk '{print $8 " " $0 }' "$@" | sort |
- We have to add a header for our report using the BEGIN pattern from awk(1).
awk 'BEGIN{ printf "\n%-15s %-22s %-5s %9s\n", "MANUFACTURER ", "| PIECE NAME ","| ID ","| QUANTITY"
print "-------------------------------------------------------\n"}
- Next we execute the main loop of awk(1) to print the formatted list.
{printf "%-18s %-23s %6d %9d\n", $9, $3, $5, $7}
- And we can add a condition to check if an item is out of stock.
{ if ($7 < 1) printf "\nWarning! Item %d is out of stock.(%s from %s)\n", $5, $3, $9}
- Once the main loop is done we can print a footer for our report using the END pattern, indicating time and date.
END {"date +%Y-%m-%d" | getline d; "date +%H:%M" | getline t; print "\n---- End of report. Time: " t " | Date: " d " ----"}' |
- Lastly we call sed(1) to get rid of the double quotes that the names inside the list have.
sed 's/"//g'
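The steps above can be assembled roughly as follows. This is a sketch rather than the definitive script: the function name format_report and the temporary sample file are assumptions added so the example runs on its own, and the date formats are written without inner quotes so they sit safely inside the single-quoted awk program.

```shell
#!/bin/sh
# format_report: hypothetical wrapper around the pipeline described above.
format_report() {
    # Prefix each line with the manufacturer so sort can order by it.
    awk '{ print $8 " " $0 }' "$@" | sort |
    awk 'BEGIN {
            printf "\n%-15s %-22s %-5s %9s\n", "MANUFACTURER ", "| PIECE NAME ", "| ID ", "| QUANTITY"
            print "-------------------------------------------------------\n"
        }
        # Field numbers are shifted by one because of the sort key.
        { printf "%-18s %-23s %6d %9d\n", $9, $3, $5, $7 }
        $7 < 1 { printf "\nWarning! Item %d is out of stock.(%s from %s)\n", $5, $3, $9 }
        END {
            "date +%Y-%m-%d" | getline d
            "date +%H:%M" | getline t
            print "\n---- End of report. Time: " t " | Date: " d " ----"
        }' |
    # Strip the double quotes from the finished report.
    sed 's/"//g'
}

# Demo on a temporary copy of the sample data.
list=$(mktemp)
cat > "$list" <<'EOF'
Name= "Capacitor" ID= 3456 quant.= 204 Man.= "Bosch"
Name= "Battery" ID= 2760 quant.= 0 Man.= "Phillips"
Name= "Fan-Frame" ID= 7864 quant.= 131 Man.= "Mitsubishi"
Name= "Bluetooth-Emmiter" ID= 19085, quant.= 184 Man.= "Intel"
Name= "WiFi-Card", ID= 2941, quant.= 115, Man.= "Intel"
Name= "Fan" ID= 4512 quant.= 98 Man.= "OEM"
EOF

report=$(format_report "$list")
echo "$report"
rm -f "$list"
```

In the script file itself the body of the function is all that is needed, since the file names arrive through "$@".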
In order to run the script, save it, make it executable (for example with chmod +x format-report.sh), and pass pieces-list as the first argument:
$ ./format-report.sh pieces-list
We should see something similar to this:
MANUFACTURER | PIECE NAME | ID | QUANTITY
-------------------------------------------------------
Bosch Capacitor 3456 204
Intel Bluetooth-Emmiter 19085 184
Intel WiFi-Card 2941 115
Mitsubishi Fan-Frame 7864 131
OEM Fan 4512 98
Phillips Battery 2760 0
Warning! Item 2760 is out of stock.(Battery from Phillips)
---- End of report. Time: 07:40 | Date: 2020-04-01 ----
Summing up
A fundamental part of the power of *nix systems is pipes and the ability to use them to combine programs as building blocks in many ways to create automated workflows.
We've seen several ways to manage text data without opening a text editor, so now we can bring awk(1) and sed(1) into our pipe workflows with a new level of flexibility.