awk Text Analysis Tool
Description:
awk is a programming language for processing text and data on Linux/Unix. Its input can come from standard input (stdin), one or more files, or the output of other commands. It supports advanced features such as user-defined functions and dynamic regular expressions, which makes it a powerful tool on Linux/Unix. It can be used directly on the command line, but longer programs are usually written as scripts. awk has many built-in features, such as arrays and functions, and its syntax is similar to C. Flexibility is its greatest advantage.
Syntax:
awk [options] 'PATTERN{ACTION STATEMENTS}' [FILE...]
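awk Execution Process
awk first runs the BEGIN block (if any). It then reads the input one record at a time (by default, one line), splits each record into fields, and runs every PATTERN{ACTION} rule whose pattern matches that record. After the last record, it runs the END block (if any).
awk Program Structure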
Structure | Explanation |
---|---|
BEGIN{ awk-commands } | Optional; executed once before awk reads any input |
/pattern/{ awk-commands } | Optional; executed once for each input line that matches /pattern/ (for every line if the pattern is omitted) |
END{ awk-commands } | Optional; executed once after awk has read all input |
Example
# Prepare the following text content (four fields per line: ID, Name, Salary, Department)
[root@localhost ~]# cat emp.txt
001 ZhangSan 1000 10
002 LiSi 2000 10
003 WangWu 3000 10
004 ZhaoLiu 2000 20
005 XiaoHong 1800 30
006 XiaoLi 800 20
# Print the text content with awk
[root@localhost ~]# awk '{print}' emp.txt
# Print the text content with a header line before it and a closing line (-------------------) after it
[root@localhost ~]# awk 'BEGIN{print "ID Name Salary Department"}{print}END{print "-------------------"}' emp.txt
ID Name Salary Department
001 ZhangSan 1000 10
002 LiSi 2000 10
003 WangWu 3000 10
004 ZhaoLiu 2000 20
005 XiaoHong 1800 30
006 XiaoLi 800 20
-------------------
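BEGIN and END are also useful for more than headers and footers. The sketch below shows them used for a running total. It recreates the sample file with each name written as one word (for example, ZhangSan instead of Zhang San). This is an assumption so that every line has exactly four whitespace-separated fields and $3 is always the salary.

```shell
#!/bin/sh
# Recreate the sample data; names are single words (assumption) so that
# each line has four fields: ID, Name, Salary, Department.
cat > emp.txt <<'EOF'
001 ZhangSan 1000 10
002 LiSi 2000 10
003 WangWu 3000 10
004 ZhaoLiu 2000 20
005 XiaoHong 1800 30
006 XiaoLi 800 20
EOF

# BEGIN runs once before any input is read, the middle block runs once per
# line, and END runs once after the last line (NR then holds the line count).
awk 'BEGIN { total = 0 }
     { total += $3 }
     END { print "Employees:", NR, "Total salary:", total }' emp.txt
# prints: Employees: 6 Total salary: 10600
```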
awk Command Options
Option | Meaning |
---|---|
-F fs | Specifies the input field separator (default: whitespace); fs can be a single character, a string, or a regular expression |
-v var=value | Assigns a value to a variable before the program starts (this also works for built-in variables such as OFS) |
-f scriptfile | Reads the awk program from a script file |
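Here is a small sketch of the -F and -v options on inline input. The separators, the variable name greeting, and the sample strings are illustrative choices, not anything required by awk.

```shell
#!/bin/sh
# -F with a regular expression: a comma OR a semicolon separates fields.
echo 'a,b;c' | awk -F'[,;]' '{ print $2, NF }'
# prints: b 3

# -F with a single character, as used for files like /etc/passwd.
echo 'root:x:0:0' | awk -F':' '{ print $1 }'
# prints: root

# -v sets a variable before the program runs.
echo 'x' | awk -v greeting='hello' '{ print greeting, $1 }'
# prints: hello x
```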
Usage of awk:
- Command line usage: run the awk command directly on the command line
awk '{print}' emp.txt
- awk script usage: write the awk program in a file (usually with the extension .awk) and run it with the -f option
awk -f file.awk emp.txt
- Usage in a shell script: put the awk command in a shell script and run the shell script
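The second and third usages can be sketched as follows. The file names report.awk and report.sh are hypothetical, and the input is two inline sample lines rather than a real file.

```shell
#!/bin/sh
# awk script usage: store the program in a file (report.awk is a made-up name).
cat > report.awk <<'EOF'
BEGIN { print "ID Name" }
{ print $1, $2 }
EOF

# Run the stored program with -f.
printf '001 ZhangSan 1000 10\n002 LiSi 2000 10\n' | awk -f report.awk
# prints:
# ID Name
# 001 ZhangSan
# 002 LiSi

# Shell script usage: embed the awk command in a script (report.sh is a made-up
# name). "$@" forwards any file arguments; with none, awk reads stdin.
cat > report.sh <<'EOF'
#!/bin/sh
awk '{ print $1, $2 }' "$@"
EOF
chmod +x report.sh
printf '001 ZhangSan 1000 10\n' | ./report.sh
# prints: 001 ZhangSan
```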
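awk Patterns and Actions
Pattern | Meaning |
---|---|
/pattern/ | Regular expression; matches lines that contain the pattern |
Relational expression | Uses relational operators (<, <=, ==, !=, >=, >), which can be combined with &&, || and !; e.g. $3 >= 2000 |
Pattern matching expression | ~ (matches) or !~ (does not match) a regular expression; e.g. $2 ~ /Xiao/ |
Actions consist of one or more statements (commands, function calls, or expressions), separated by newlines or semicolons.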
awk range patterns:
- /start/,/end/ : from a line matching /start/ through the next line matching /end/
- NR==1,NR==5 : from line 1 to line 5
Example
# Output lines in emp.txt that contain the string 'Xiao'
[root@localhost ~]# awk '/Xiao/' emp.txt
# Output from the first line containing 'Zhang' through the next line containing 'Zhao'
[root@localhost ~]# awk '/Zhang/,/Zhao/' emp.txt
# Output the first three lines (three equivalent ways)
[root@localhost ~]# awk 'NR<=3' emp.txt
[root@localhost ~]# awk 'NR==1,NR==3' emp.txt
[root@localhost ~]# awk 'NR>=1 && NR<=3' emp.txt
# Pattern matching expression: 666 ~ /[0-9]+/ is always true, so every line is printed
[root@localhost ~]# awk '666 ~ /[0-9]+/' emp.txt
When a pattern is given without an action, awk prints each matching line (the default action is {print $0}), so the action part can be omitted.
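The following sketch applies relational, matching, and range patterns to individual fields. As before, it recreates the sample file with one-word names (an assumption) so that $2 is the name, $3 the salary, and $4 the department.

```shell
#!/bin/sh
# Sample data with single-word names (assumption): ID Name Salary Department
cat > emp.txt <<'EOF'
001 ZhangSan 1000 10
002 LiSi 2000 10
003 WangWu 3000 10
004 ZhaoLiu 2000 20
005 XiaoHong 1800 30
006 XiaoLi 800 20
EOF

# Relational pattern, no action: print lines whose salary is at least 2000.
awk '$3 >= 2000' emp.txt
# prints the 002, 003 and 004 lines

# Matching pattern on one field: names starting with "Xiao".
awk '$2 ~ /^Xiao/ { print $2 }' emp.txt
# prints: XiaoHong, XiaoLi

# Combined: department 10 AND the name does not contain "Li".
awk '$4 == 10 && $2 !~ /Li/ { print $1 }' emp.txt
# prints: 001, 003

# Range pattern with an action: line numbers from /Zhang/ through /Zhao/.
awk '/Zhang/,/Zhao/ { print NR }' emp.txt
# prints: 1 2 3 4 (one per line)
```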
awk Variables
Built-in Variables (Predefined Variables)
Variable | Meaning |
---|---|
$0 | The entire current record (by default, the current line) |
$n | The nth field of the current record (the nth column after splitting) |
NF | The number of fields in the current record; $NF is the last field |
NR | The number of the current record, counted across all input files |
FS | The input field separator (default: whitespace; set with -F or -v FS=...) |
OFS | The output field separator (default: a single space; set with -v OFS=...) |
RS | The input record separator (default: newline; set with -v RS=...) |
ORS | The output record separator (default: newline; set with -v ORS=...) |
FILENAME | The name of the current input file |
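A few of these variables in action, on small inline inputs. The file name sample.txt and the separators chosen here are only for illustration.

```shell
#!/bin/sh
# NR (record number), NF (field count) and $NF (last field) per line.
printf 'a b c\nd e\n' | awk '{ print NR, NF, $NF }'
# prints: "1 3 c" then "2 2 e"

# FILENAME holds the name of the file currently being read.
printf 'x y\n' > sample.txt
awk '{ print FILENAME, NR }' sample.txt
# prints: sample.txt 1

# RS: split records on commas instead of newlines.
printf 'a,b,c' | awk -v RS=',' '{ print NR ": " $0 }'
# prints: "1: a", "2: b", "3: c"

# FS and OFS set in BEGIN; assigning $1 = $1 rebuilds $0 using OFS.
echo 'root:x:0' | awk 'BEGIN { FS = ":"; OFS = "\t" } { $1 = $1; print }'
# prints: root<TAB>x<TAB>0

# ORS: end each output record with ";" instead of a newline.
printf 'a\nb\n' | awk -v ORS=';' '{ print }'
# prints: a;b;
```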
User-defined Variables
- Defined with the -v option: -v var=value
- Defined inside the awk program, e.g. awk 'BEGIN{var=value}'
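Passing External Variables
- The -v option passes an external (shell) variable to the awk command; the assignment happens before the BEGIN block runs.
- A variable can also be passed as var=value after the program text, at the end of the command. Such an assignment is processed only when awk reaches it in the argument list, so its value is not yet visible in the BEGIN block.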
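The timing difference between the two methods can be seen with an empty-bracket check. The variable names v and n are arbitrary, and the value 'Zhang San' is just an example containing a space.

```shell
#!/bin/sh
# -v is applied before BEGIN, so BEGIN sees the value.
awk -v v=1 'BEGIN { print "[" v "]" }'
# prints: [1]

# A trailing var=value is processed after BEGIN has already run.
awk 'BEGIN { print "[" v "]" }' v=1
# prints: []

# The main block runs after the assignment, so it does see the value.
echo x | awk '{ print "[" v "]" }' v=1
# prints: [1]

# Quote shell variables so a value with spaces stays a single argument.
name='Zhang San'
echo | awk -v n="$name" '{ print n }'
# prints: Zhang San
```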
Example
# Print the content, line number, and number of fields of each line in emp.txt
[root@localhost ~]# awk '{print $0,NR,NF}' emp.txt
# Print the content of the first column, the last column, and the second-to-last column
[root@localhost ~]# awk '{print $1,$NF,$(NF-1)}' emp.txt
# Output the content in the order of "Department ID Name Salary"
[root@localhost ~]# awk '{print $4,$1,$2,$3}' emp.txt
# Output the second field of the third line
[root@localhost ~]# awk 'NR==3{print $2}' emp.txt
# Count the number of people in each department (uniq -c only counts adjacent duplicates, so sort first)
[root@localhost ~]# awk '{print $NF}' emp.txt | sort | uniq -c
# Change the output field separator to ':' and write the output to emp.bak.txt
[root@localhost ~]# awk -v OFS=":" '{print $1,$2,$3,$4}' emp.txt > emp.bak.txt
# Alternative: assigning NF forces awk to rebuild $0 with OFS, and the nonzero result makes the line print (empty lines are skipped)
[root@localhost ~]# awk -v OFS=":" 'NF=NF' emp.txt > emp.bak.txt
# For each line of emp.bak.txt, output "ID:xxx, Name:xxx, Salary:xxx"
[root@localhost ~]# awk -F: '{print "ID:"$1", Name:"$2", Salary:"$3}' emp.bak.txt
# User-defined variables
# Method 1
[root@localhost ~]# echo | awk -v a=100 '{print a}'
# Method 2
[root@localhost ~]# echo | awk 'BEGIN{b=100;print b}'
# Passing external variables
[root@localhost ~]# a='aaa'
[root@localhost ~]# b='bbb'
# Method 1
[root@localhost ~]# echo | awk -v v1="$a" -v v2="$b" '{print v1,v2}'
# Method 2
[root@localhost ~]# echo | awk '{print v1,v2}' v1="$a" v2="$b"
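The per-department count can also be done entirely in awk with an associative array. The sketch below uses that approach instead of sort | uniq -c, and also sums salaries per department. It recreates the sample file with one-word names (an assumption) so $3 is the salary and $4 the department. Because `for (k in array)` visits keys in an unspecified order, the output is piped through sort.

```shell
#!/bin/sh
# Sample data with single-word names (assumption): ID Name Salary Department
cat > emp.txt <<'EOF'
001 ZhangSan 1000 10
002 LiSi 2000 10
003 WangWu 3000 10
004 ZhaoLiu 2000 20
005 XiaoHong 1800 30
006 XiaoLi 800 20
EOF

# count[dept] is incremented once per line; END prints each key and its total.
awk '{ count[$NF]++ } END { for (d in count) print d, count[d] }' emp.txt | sort -n
# prints: "10 3", "20 2", "30 1"

# Same idea, accumulating salaries instead of counting lines.
awk '{ sum[$4] += $3 } END { for (d in sum) print d, sum[d] }' emp.txt | sort -n
# prints: "10 6000", "20 2800", "30 1800"
```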
Formatted output with printf:
- printf "FORMAT", item1, item2, ... controls the output format of each item. Unlike print, printf does not append a newline, so \n must be added manually.
Common format specifiers:
Symbol | Explanation |
---|---|
%s | String |
%f | Floating point |
%d | Decimal integer |
%c | ASCII character |
%% | % itself |
Common escape characters:
Symbol | Explanation |
---|---|
\n | Newline |
\t | Horizontal tab |
\v | Vertical tab |
\r | Carriage return |
\b | Backspace |
\f | Form feed |
\a | Alert character |
\\ | \ itself |
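Here is a short sketch that combines the format specifiers and escapes above. The column widths (4, 10, 6) are arbitrary choices for this example, and the input lines follow the sample file with one-word names.

```shell
#!/bin/sh
# %-Ns left-aligns a string in N columns, %Nd right-aligns an integer;
# each format ends with \n because printf adds no newline itself.
printf '001 ZhangSan 1000 10\n006 XiaoLi 800 20\n' |
  awk 'BEGIN { printf "%-4s%-10s%6s\n", "ID", "Name", "Salary" }
       { printf "%-4s%-10s%6d\n", $1, $2, $3 }'
# prints:
# ID  Name      Salary
# 001 ZhangSan    1000
# 006 XiaoLi       800

# \t inserts a tab and %% prints a literal percent sign.
echo '25' | awk '{ printf "rate:\t%d%%\n", $1 }'

# %.2f rounds to two decimals; %c with a number prints that character code.
awk 'BEGIN { printf "%.2f %c\n", 3.14159, 65 }'
# prints: 3.14 A
```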