awk Programming#

awk Operators#

Arithmetic Operators#

Operator	Description
+ - * / %	Addition Subtraction Multiplication Division Modulus
^ **	Exponentiation
++ --	Increment Decrement (can be used as prefix or suffix)
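
A minimal sketch of the arithmetic operators in action. The variable names (a, b, i, x, y) and the shell variable arith are illustrative only; `^` is used rather than `**` on the assumption that not every awk implementation accepts `**`.

```shell
arith=$(awk 'BEGIN {
    a = 7; b = 2
    print a + b, a - b, a * b, a / b, a % b   # 9 5 14 3.5 1
    print a ^ b                               # exponentiation: 49
    i = 5
    x = i++        # postfix: x gets the old value 5, then i becomes 6
    y = ++i        # prefix: i becomes 7 first, then y gets 7
    print x, y, i  # 5 7 7
}')
printf '%s\n' "$arith"
```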

Assignment Operators#

Operator	Description
= += -= *= /= %= ^= **=	Assignment (a += b is equivalent to a = a + b; the others work the same way)

Relational Operators#

Operator	Description
> < >= <= != ==	Comparison statement (returns true if valid, false if not)

Logical Operators#

Operator	Description
||	Logical OR (true if any is true)
&&	Logical AND (false if any is false)
!	Logical NOT (true becomes false, false becomes true)
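
As a small sketch of how the relational and logical operators combine: comparisons evaluate to 1 or 0, and those results can be stored and negated. The input numbers and the names in_range, outside, and logic are made up for illustration.

```shell
logic=$(printf '%s\n' 5 15 25 | awk '{
    in_range = ($1 > 10 && $1 < 20)      # both comparisons must hold
    outside  = ($1 <= 10 || $1 >= 20)    # either comparison may hold
    print $1, in_range, outside, !in_range
}')
printf '%s\n' "$logic"
```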

Regular Expression Operators#

Operator	Description
~	Matches regular expression
!~	Does not match regular expression
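
The two matching operators can be sketched as patterns that select input lines. The sample lines and the shell variable matches are assumptions for the example; note that matching is case-sensitive.

```shell
matches=$(printf '%s\n' 'error: disk full' 'ok' 'ERROR: timeout' |
awk '
    $0 ~ /^error/  { print "match: " $0 }      # line starts with "error"
    $0 !~ /error/  { print "no match: " $0 }   # line does not contain "error"
')
printf '%s\n' "$matches"
```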

Other Operators#

Operator	Description
$	Field reference
Space	String concatenation operator
in	Array membership operator (`key in array` tests whether a key exists; `for (key in array)` iterates over the keys)
? :	Conditional (ternary) operator, as in C: `expression ? value1 : value2` evaluates to value1 if expression is true, otherwise to value2
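
The remaining operators might be exercised together as follows. This is only a sketch: the input line, the array name seen, and the other variable names are hypothetical.

```shell
others=$(echo 'alice 42' | awk '{
    n = 2
    print $n                          # $ accepts any expression: field 2
    greeting = "hi, " $1 "!"          # adjacent values are concatenated
    print greeting
    seen["alice"] = 1
    has_alice = ("alice" in seen)     # membership test, does not create the key
    has_bob   = ("bob" in seen)
    print has_alice, has_bob
    label = ($2 >= 18) ? "adult" : "minor"   # the ternary yields a value
    print label
}')
printf '%s\n' "$others"
```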

awk Control Flow Statements#

Conditional Statements#

# if
if(expression)
	statement
	
# if-else
if(expression)
	statement1
else
	statement2 
	
# if-else-if
if(expression)
	statement1 
else if (expression2)
	statement2 
else
	statement3

awk conditionals can be nested. To run several statements in one branch, or simply to make the structure easier to read, enclose them in {}.
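
A short sketch of an if / else if / else chain with braces around each branch. The thresholds and labels are invented for the example.

```shell
grades=$(printf '%s\n' 95 72 40 | awk '{
    if ($1 >= 90) {
        grade = "A"
    } else if ($1 >= 60) {
        grade = "pass"
    } else {
        grade = "fail"
    }
    print $1, grade
}')
printf '%s\n' "$grades"
```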

Loop Statements#

Three main loop statements

# while loop
while(expression){
	statement
}

# for loop
# Format 1
for(initial variable; loop condition; loop increment/decrement statement){
	statement
}

# Format 2 
for(variable in array){
	statement
}

# do-while loop
do{
	statement
} while(condition)

Loop control statements:

break; Exits the innermost enclosing loop.

continue; Skips the rest of the current iteration and starts the next one.

exit status_code; Stops the script (if there is an END block, control transfers to it first). It accepts an integer that becomes the exit status of the awk process, which can be checked with $? in the shell; if no argument is given, the status defaults to 0.
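
The loop forms and control statements can be sketched in one program. All names here (sum, p, runs, loops, status) are illustrative; the shell wrapper only exists to capture the output and the exit status.

```shell
status=0
loops=$(awk 'BEGIN {
    # for loop with continue: sum only the odd numbers from 1 to 9
    for (i = 1; i <= 9; i++) {
        if (i % 2 == 0) continue
        sum += i
    }
    # while loop with break: stop at the first power of 2 above 100
    p = 1
    while (1) {
        if (p > 100) break
        p *= 2
    }
    # do-while runs its body once even though the condition is false
    do { runs++ } while (0)
    print sum, p, runs
    exit 3          # control passes to END before awk terminates
}
END { print "END still runs" }') || status=$?
printf '%s\n' "$loops" "exit status: $status"
```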

awk Arrays#

Arrays are the soul of awk, and array processing is often used in text processing.

awk array characteristics:

The index of awk arrays can be numbers or strings, thus awk arrays are associative arrays.
Internally, the indexes of awk arrays are all strings; even numeric indexes are converted to strings when used.
The order of elements in awk arrays may not be the same as the order in which elements are inserted.
Arrays in awk do not need to be declared in advance, nor do they need to declare size.
Array elements are initialized to 0 or an empty string based on context.
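
A few of these characteristics can be checked directly. This is a sketch with made-up array names (count, note, a):

```shell
arrays=$(awk 'BEGIN {
    count["apple"]++            # no declaration needed; an unset element acts as 0
    count["apple"]++
    label = "[" note["x"] "]"   # an unset element acts as "" in string context
    a[1] = "one"
    s = a["1"]                  # index 1 and index "1" refer to the same element
    print count["apple"], label, s
}')
printf '%s\n' "$arrays"
```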

Creating (Adding, Modifying) Arrays#

Syntax: array_name[index] = value

The syntax for adding/modifying elements in an array is the same as creating an array.

Accessing Array Elements#

Syntax: array_name[index]

Deleting Array Elements#

Syntax: delete array_name[index]

Deleting a non-existent element will not cause an error.
delete array_name can directly delete all elements of the array.
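
A minimal sketch of deleting elements, using a hypothetical array arr; the element count is taken with a for-in loop so that the example does not depend on any extension.

```shell
deleted=$(awk 'BEGIN {
    arr["a"] = 1; arr["b"] = 2
    delete arr["a"]
    delete arr["zzz"]                 # deleting a missing key is a no-op
    has_a = ("a" in arr)
    has_b = ("b" in arr)
    n = 0
    for (k in arr) n++                # count the remaining elements
    print has_a, has_b, n
}')
printf '%s\n' "$deleted"
```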

Array-related Functions#

length(arr) Returns the number of elements in the array.
asort(arr) Sorts the array by value and returns the number of elements (gawk extension).
split(str,arr,sep) Splits a string into an array and returns the number of elements.

Arrays generated by split and asort are indexed from 1, unlike C arrays, which start at 0.

[root@localhost ~]# awk 'BEGIN{str="a,b,c,d";len=split(str,arr,",");print len,length(arr),asort(arr),arr[1]}'
4 4 4 a

Traversing Arrays#

# Method 1 (for-in; awk does not guarantee the traversal order, so the lines may appear in a different order)
[root@localhost ~]# awk 'BEGIN{
str="a,b,c,d";
len=split(str,arr,",");
for(i in arr){
print i,arr[i];
}
}'
1 a
2 b
3 c
4 d

# Method 2 (because awk arrays are associative, only a counting loop over 1..len guarantees ordered traversal)
[root@localhost ~]# awk 'BEGIN{
str="a,b,c,d";
len=split(str,arr,",");
for(i=1;i<=len;i++){
print i,arr[i];
}
}'
1 a
2 b
3 c
4 d

# Check if the array contains a certain key
if(key in arr)

# Check if arr is an array (gawk extensions)
isarray(arr)  Returns 1 if arr is an array, otherwise 0.
typeof(arr)   Returns the data type as a string; for an array it returns "array".

Multi-dimensional Arrays#

awk only supports one-dimensional arrays; we can use one-dimensional arrays to simulate multi-dimensional arrays.

# For a 3*3 two-dimensional array arr:
1 2 3 
4 5 6
7 8 9
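In C you would write arr[0][0] = 100; in awk you can write arr[0,0] = 100, and likewise arr[0,1], arr[0,2] ... arr[2,2].
In fact, subscripts such as 0,1 and 0,2 and 2,2 are each a single string index: awk joins the comma-separated values with the value of the built-in variable SUBSEP (by default the character \034).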
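
The simulation can be sketched as follows, filling the 3*3 grid above with 0-based subscripts. The names arr, key, has, and idx are illustrative; SUBSEP is the built-in separator that awk places between the parts of a compound subscript.

```shell
grid=$(awk 'BEGIN {
    for (r = 0; r < 3; r++)
        for (c = 0; c < 3; c++)
            arr[r, c] = r * 3 + c + 1      # fills 1..9 row by row
    key = 1 SUBSEP 2                       # the same key that arr[1, 2] uses
    has = ((2, 2) in arr)                  # membership test for a compound index
    split(key, idx, SUBSEP)                # recover the two subscripts
    print arr[1, 2], arr[key], has, idx[1], idx[2]
}')
printf '%s\n' "$grid"
```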

awk Built-in Functions#

Mathematical Functions#

Function	Description
sin(expr)	Returns the sine of expr (expr in radians)
cos(expr)	Returns the cosine of expr (expr in radians)
atan2(y,x)	Returns the arctangent of y/x in radians
log(expr)	Returns the natural logarithm of expr
exp(expr)	Returns e raised to the power expr
sqrt(expr)	Returns the square root of expr
int(expr)	Returns expr truncated toward zero to an integer
rand()	Returns a random number n, where 0<=n<1
srand([expr])	Sets the seed of rand() to expr; if expr is omitted, the current time is used

Example#

# Get a random integer between 0-99
[root@localhost ~]# awk 'BEGIN{srand();randint=int(100*rand());print randint}'
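
A few more of the mathematical functions, as a sketch. The seed 42 and the variable names are arbitrary; the point of the last lines is only that the same seed reproduces the same sequence, not that any particular number comes out.

```shell
math=$(awk 'BEGIN {
    printf "%.4f\n", atan2(0, -1)     # atan2(0, -1) is pi: 3.1416
    print int(3.9), int(-3.9)         # truncation toward zero: 3 -3
    print sqrt(16), exp(0), log(1)    # 4 1 0
    srand(42); first = rand()         # seeding with the same value ...
    srand(42); second = rand()        # ... restarts the same sequence
    same = (first == second)
    in_range = (first >= 0 && first < 1)
    print same, in_range              # 1 1
}')
printf '%s\n' "$math"
```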

String Functions#

Function	Description
asort(arr [, d ])	Sorts the values of array `arr` and reindexes them from 1; returns the number of elements (gawk extension)
asorti(arr [, d])	Sorts the keys of array `arr`; the sorted keys become the values, indexed from 1 (gawk extension)
gsub(regexp, sub, str)	Replaces every match of `regexp` in `str` with `sub` and returns the number of replacements
sub(regexp, sub, str)	Replaces the first match of `regexp` in `str` with `sub`. Only replaces once
index(str, sub)	Finds the position of string `sub` within `str`. Returns the position if found, otherwise returns 0
length(str)	Returns the length of a string
match(str, regexp)	Finds the leftmost, longest substring of `str` that matches `regexp`. Returns its starting position, or 0 if there is no match, and sets `RSTART` and `RLENGTH`
split(str, arr, regexp)	Splits `str` into array `arr` using `regexp` as the separator and returns the number of pieces. If no separator is given, the value of variable `FS` is used
printf(format, expr-list)	Formats the expressions according to `format` and writes the result to standard output
strtonum(str)	Converts a string to a number, treating a leading `0` as octal and a leading `0x` as hexadecimal (gawk extension)
substr(str, start, len)	Returns the substring of `str` that starts at position `start` (counting from 1) and has length `len`
tolower(str)	Converts uppercase letters in the specified string to lowercase
toupper(str)	Converts lowercase letters in the specified string to uppercase

Example#

# asort(arr[,d])  arr-->array d-->array; if this parameter is passed, it will not modify arr, but will copy all elements from arr to d, and then sort d
[root@localhost ~]# awk 'BEGIN{
    arr[11]=800;
    arr[22]=200;
    arr[33]=300;
    arr[44]=100;
    for(i in arr){
        print i,arr[i];
    }
    asort(arr);
    print "";
    for(j in arr){
        print j,arr[j];
    }
}'
11 800
22 200
33 300
44 100

1 100
2 200
3 300
4 800

# asorti
[root@localhost ~]# awk 'BEGIN{
    arr[11]=800;
    arr[22]=200;
    arr[33]=300;
    arr[44]=100;
    for(i in arr){
        print i,arr[i];
    }
    asorti(arr);
    print "";
    for(j in arr){
        print j,arr[j];
    }
}'
11 800
22 200
33 300
44 100

1 11
2 22
3 33
4 44

# gsub(regexp, sub, str)
[root@localhost ~]# awk 'BEGIN{
    str="hello world";
    gsub(/[ol]/,"*",str);   # inside [], | is a literal character, so [o|l] would also match "|"
    print str;
}'
he*** w*r*d

# sub(search, sub, str)
[root@localhost ~]# awk 'BEGIN{
    str="hello world";
    sub(/[ol]/,"*",str);    # inside [], | is a literal character, so [o|l] would also match "|"
    print str;
}'
he*lo world
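
Two details that the examples above do not show, sketched with made-up strings: in the replacement text `&` stands for whatever the pattern matched, gsub returns the number of replacements, and split accepts a regular expression as its separator.

```shell
subbed=$(awk 'BEGIN {
    str = "cat hat"
    n = gsub(/[ch]at/, "<&>", str)      # & stands for the matched text
    print n, str
    line = "a1b22c333d"
    cnt = split(line, parts, /[0-9]+/)  # any run of digits separates the pieces
    print cnt, parts[1] parts[2] parts[3] parts[4]
}')
printf '%s\n' "$subbed"
```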

# index(str, sub)
[root@localhost ~]# awk 'BEGIN{
    str="hello world";
    idx=index(str,"l");
    print idx;
}'
3

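# match(str, regexp)  h* matches zero or more h characters, so it succeeds at the very first position; here it matches the leading h of hello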
[root@localhost ~]# awk 'BEGIN{
    str="hello world hi haaaaaa";
    idx=match(str,"h*");
    print idx;
}'
1
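
match also sets the built-in variables RSTART and RLENGTH, which pair naturally with substr to extract the matched text. A small sketch with an invented input string:

```shell
found=$(awk 'BEGIN {
    str = "order id: 20231 shipped"
    if (match(str, /[0-9]+/)) {
        # RSTART is where the match begins, RLENGTH is how long it is
        print RSTART, RLENGTH, substr(str, RSTART, RLENGTH)
    }
}')
printf '%s\n' "$found"
```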

# strtonum(str)  A leading 0 marks the string as octal, so "01010" is octal 1010, which is 520 in decimal
[root@localhost ~]# awk 'BEGIN{
    str="01010";
    res=strtonum(str);
    print res;
}'
520

# substr(str, start, len)
[root@localhost ~]# awk 'BEGIN{
    str="hello world";
    res=substr(str,7,5);
    print res;
}'
world

# tolower(str)
[root@localhost ~]#  awk 'BEGIN{
    str="HAHAHA";
    res=tolower(str);
    print res;
}'
hahaha

# toupper(str)
[root@localhost ~]# awk 'BEGIN{
    str="hello world";
    res=toupper(str);
    print res;
}'
HELLO WORLD

Date and Time Functions#

Function	Description
systime()	Returns the current time as a Unix timestamp (seconds since 1970-01-01 00:00:00 UTC)
mktime(datespec)	Converts a time string of the form `YYYY MM DD HH MM SS` to a timestamp, interpreting it in the local time zone
strftime([format [, timestamp]])	Formats a timestamp according to the specified format string and returns the result as a string; if timestamp is omitted, the current time is used

Example#

# systime()
[root@localhost ~]# awk 'BEGIN{print systime()}'
1672448348

# mktime(datespec)
[root@localhost ~]# awk 'BEGIN{print mktime("2022 12 31 09 00 00")}'
1672448400

# strftime([format [, timestamp]])
[root@localhost ~]# awk 'BEGIN{print strftime("%c",systime())}'
Sat Dec 31 09:03:53 2022
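
Because mktime and strftime both work in the local time zone, results such as the timestamp above depend on where the command runs. The sketch below pins the zone with `TZ=UTC` (an assumption made only to get a reproducible result) and converts a date to a timestamp and back; it requires an awk that provides these time functions, such as gawk.

```shell
stamp=$(TZ=UTC awk 'BEGIN {
    ts = mktime("2022 12 31 09 00 00")         # read as 09:00:00 UTC here
    print ts
    print strftime("%Y-%m-%d %H:%M:%S", ts)    # back to a readable string
}')
printf '%s\n' "$stamp"
```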

Formatting#

Format Specifier	Description
%a	Localized abbreviated weekday name, e.g., `Thu`
%A	Localized full weekday name, e.g., `Thursday`
%b	Localized abbreviated month name, e.g., `May`
%B	Localized full month name, e.g., `May`
%c	Localized date and time representation; equivalent to `%a %b %e %H:%M:%S %Y` in the C locale, e.g., `Thu May 30 21:08:37 2019`
%C	The century part of the current year, i.e., the first two digits of the four-digit year, e.g., `20` in 2019
%d	Day of the month, range `01-31`, e.g., `30`
%D	Abbreviated format `%m/%d/%y`, e.g., `05/30/19`
%e	Day of the month, range `1-31`, padded with a leading space if less than `10`, e.g., `1` becomes ` 1`
%F	Alias for the `ISO 8601` date format `%Y-%m-%d`
%g	Two-digit year of the ISO 8601 week-based year (year modulo 100), range `00-99`, e.g., January 1, 1993 belongs to week 53 of 1992, so `%g` gives `92`
%G	The full year number in the ISO week date system, e.g., `2019`
%h	Alias for format `%b`
%H	Hour in 24-hour format, range `00–23`
%I	Hour in 12-hour format, range `01–12`
%j	Day of the year, range `001–366`
%m	Month of the current time, range `01–12`
%M	Minute of the current time, range `00–59`
%n	Newline character `\n`
%p	Localized `AM` or `PM` in 12-hour format, i.e., localized morning or afternoon representation
%r	Localized 12-hour time; equivalent to `%I:%M:%S %p` in the C locale
%R	Abbreviated format `%H:%M`
%S	Second, range `00-60`; `60` allows for leap seconds
%t	Tab character `\t`
%T	Abbreviated format `%H:%M:%S`
%u	Day of the week as a number, range `1–7`, where Monday is `1`
%U	Week number of the year, range `00-53`, with the first Sunday as the first day of week `01`
%V	ISO 8601 week number of the year, range `01-53`, with weeks starting on Monday
%w	Day of the week as a number, range `0–6`, where Sunday is `0`
%W	Week number of the year, range `00-53`, with the first Monday as the first day of week `01`
%x	Localized date representation; `%m/%d/%y` in the C locale, e.g., `05/30/19`
%X	Localized time representation; `%T` in the C locale, e.g., `07:06:05`
%y	Two-digit year, i.e., the last two digits of the year, range `00-99`, e.g., `19` for `2019`
%Y	Full four-digit year, e.g., `2019`
%z	Time zone offset in `+HHMM` format. Part of `RFC 822` or `RFC 1036` date format.
%Z	Time zone name or abbreviation. Returns empty string `''` if no time zone.
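
A few of the locale-independent specifiers can be tried on a fixed timestamp. This sketch uses timestamp 0 (1970-01-01 00:00:00 UTC, a Thursday) and pins the zone with `TZ=UTC` so that the result is reproducible; both choices are assumptions made for the example, and strftime is assumed to be available (as in gawk).

```shell
fmt=$(TZ=UTC awk 'BEGIN {
    ts = 0                                  # the start of the Unix epoch
    print strftime("%F %T", ts)             # ISO date and 24-hour time
    print strftime("%j %u %y %C", ts)       # day of year, weekday, 2-digit year, century
}')
printf '%s\n' "$fmt"
```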

Other Functions#

Function	Description
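close(expr)	Closes an open file or pipe; `expr` must be the same file name or command string that opened it
system(command)	Runs a shell command and returns its exit status
getline	Statement that reads the next input record (it can also read from a file or a command)
next	Statement that stops processing the current record and moves on to the next one
nextfile	Statement that stops processing the current file and moves on to the next one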

Example#

# close(expr)  The argument must be the same command string that opened the pipe
[root@localhost ~]# awk 'BEGIN{cmd="cat emp.txt";while((cmd | getline) > 0){print $0}close(cmd)}'

# system(command)
[root@localhost ~]# awk 'BEGIN{system("ls -l")}'
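
The return values of these calls are easy to overlook. The sketch below reads from a command with getline, closes the pipe with the same command string, and captures the status returned by system; the commands and variable names are made up for the example.

```shell
cmdout=$(awk 'BEGIN {
    cmd = "echo one; echo two"
    # compare against 0 so the loop also stops on a read error (-1)
    while ((cmd | getline line) > 0)
        n++
    close(cmd)                 # close takes the string that opened the pipe
    rc = system("exit 3")      # exit status of the shell command
    print n, line, rc
}')
printf '%s\n' "$cmdout"
```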

# getline
[root@localhost ~]# awk '{getline;print}' emp.txt
002 Li Si 2000 10
004 Zhao Liu 2000 20
006 Xiao Li 800 20

[root@localhost ~]# awk 'BEGIN{print "Input:";getline name;print name}'
Input:
123
123

# next (in the records shown, the salary is the fourth whitespace-separated field)
[root@localhost ~]# awk '{if($4<2000)next;print}' emp.txt
002 Li Si 2000 10
003 Wang Wu 3000 10
004 Zhao Liu 2000 20

# nextfile (in the records shown, the salary is the fourth whitespace-separated field)
[root@localhost ~]# awk '{if($4==2000) nextfile;print}' emp.txt file
001 Zhang San 1000 10
xxx is a hanhan. ^_^

are you kidding?
I think ...
My phone number is 1872272****.

Custom Functions#

Function Definition

function function_name(parameter1, parameter2, ...) { 
   function body
}

Function names must start with a letter or an underscore and may contain letters, digits, and underscores; reserved words cannot be used.

Statements in the function body are separated by semicolons or newlines.

A function may or may not return a value; to return one, use the return statement inside the function body.

Function Call

# Call a function without parameters (the parentheses are still required)
fun_name()
# Call a function with parameters
fun_name(arg1[,arg2...])
# Call a function with return value
var = fun_name([arg1...])

Functions can be called in BEGIN, main, and END sections.

Example#

# Function without parameters
[root@localhost ~]# awk 'BEGIN{
	fun1()
}
function fun1(){
	print "this is a function!"
}'
this is a function!

# Function with parameters
[root@localhost ~]# awk 'BEGIN{
	fun2(10,20)
}
function fun2(num1,num2){
	print "The result of the function is:"num1+num2
}'
The result of the function is:30

# Function with return value
[root@localhost ~]# awk 'BEGIN{
	res = fun3(10,20);
	print "The sum of num1 and num2 is:"res
}
function fun3(num1,num2){
	return num1+num2
}'
The sum of num1 and num2 is:30
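
Two further points about custom functions, shown as a sketch with hypothetical function names (sum_to, factorial): awk has no local declarations, so extra parameters are commonly used as local variables, and functions may be recursive.

```shell
fact=$(awk '
# Extra parameters (here: i, total) act as local variables; by convention
# they are separated from the real parameters by extra spaces.
function sum_to(n,    i, total) {
    for (i = 1; i <= n; i++) total += i
    return total
}
# A function may call itself.
function factorial(n) {
    return (n <= 1) ? 1 : n * factorial(n - 1)
}
BEGIN {
    i = 99                      # the global i is untouched by sum_to
    print sum_to(10), factorial(5), i
}')
printf '%s\n' "$fact"
```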
