awk Programming#
awk Operators#
Arithmetic Operators#
Operator | Description |
---|---|
+ - * / % | Addition, subtraction, multiplication, division, modulus |
^ ** | Exponentiation (^ is the POSIX form; ** is accepted by gawk and some other implementations) |
++ -- | Increment, decrement (can be used as prefix or suffix) |
Assignment Operators#
Operator | Description |
---|---|
= += -= *= /= %= ^= **= | Assignment statement (a+=b is equivalent to a=a+b, others similar) |
Relational Operators#
Operator | Description |
---|---|
> < >= <= != == | Comparison (evaluates to 1 if the relation holds, 0 otherwise) |
Logical Operators#
Operator | Description |
---|---|
|| | Logical OR (true if any is true) |
&& | Logical AND (false if any is false) |
! | Logical NOT (true becomes false, false becomes true) |
Regular Expression Operators#
Operator | Description |
---|---|
~ | Matches regular expression |
!~ | Does not match regular expression |
Other Operators#
Operator | Description |
---|---|
$ | Field reference |
Space | String concatenation operator |
in | Array membership operator: key in arr tests whether a key exists; also used as for(key in arr) to traverse arrays |
? : | Ternary operator (as in C: expression ? value1 : value2 evaluates to value1 if expression is true, otherwise to value2) |
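The operator tables above can be exercised in one short script. This is a minimal sketch, assuming a POSIX awk is on the PATH; the variable names (a, b, s, r, n, op_out) are illustrative only.

```shell
op_out=$(awk 'BEGIN {
    a = 7; b = 2
    print a % b, a ^ b                            # modulus and exponentiation
    s = "foo" "bar"                               # concatenation by juxtaposition
    print s
    r = (s ~ /^foo/) ? "match" : "no match"       # regex match feeding the ternary
    print r
    a += 3                                        # compound assignment: a is now 10
    n = a++                                       # postfix: n gets 10, a becomes 11
    print n, a
}')
printf '%s\n' "$op_out"
```

The script prints four lines: `1 49`, `foobar`, `match`, and `10 11`.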
awk Control Flow Statements#
Conditional Statements#
# if
if(expression)
statement
# if-else
if(expression)
statement1
else
statement2
# if-else-if
if(expression)
statement1
else if (expression2)
statement2
else
statement3
awk branches can be nested. When a branch contains more than one statement, enclose the statements in {}; braces also make the structure easier to read.
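As a concrete instance of the if-else-if form, the sketch below classifies three numbers read from standard input. The thresholds and the names level and grade_out are made up for illustration.

```shell
grade_out=$(printf '95\n72\n40\n' | awk '{
    if ($1 >= 90) {
        level = "high"
    } else if ($1 >= 60) {
        level = "medium"
    } else {
        level = "low"
    }
    print $1, level
}')
printf '%s\n' "$grade_out"
```

Each input line is tested top to bottom and only the first matching branch runs, so 95 is reported as high, 72 as medium, and 40 as low.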
Loop Statements#
- Three main loop statements
# while loop
while(expression){
statement
}
# for loop
# Format 1
for(initial variable; loop condition; loop increment/decrement statement){
statement
}
# Format 2
for(variable in array){
statement
}
# do-while loop
do{
statement
} while(condition)
Loop control statements:
- break; Exit the loop.
- continue; Skip the rest of the current iteration and start the next one.
- exit status_code; The exit statement stops the execution of the script (if there is an END block, control transfers to END). It accepts an integer argument as the exit status code of the awk process; if no argument is provided, it defaults to 0 (check it with $?).
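The three control statements can be seen together in a small sketch. It assumes a POSIX awk; the exit code 3 and the shell variable names are arbitrary choices for the example.

```shell
status=0
loop_out=$(awk 'BEGIN {
    for (i = 1; i <= 10; i++) {
        if (i % 2 == 0) continue   # skip even numbers
        if (i > 7) break           # leave the loop once i exceeds 7
        print i
    }
    exit 3                         # the END block still runs, then awk exits with 3
}
END { print "done" }') || status=$?
printf '%s\n' "$loop_out"
echo "exit status: $status"
```

The loop prints 1, 3, 5, and 7; the END block then prints done, and the shell sees exit status 3.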
awk Arrays#
Arrays are the soul of awk, and array processing is often used in text processing.
awk array characteristics:
- The index of awk arrays can be numbers or strings, thus awk arrays are associative arrays.
- Internally, the indexes of awk arrays are all strings; even numeric indexes are converted to strings when used.
- The order of elements in awk arrays may not be the same as the order in which elements are inserted.
- Arrays in awk do not need to be declared in advance, nor do they need to declare size.
- Array elements are initialized to 0 or an empty string based on context.
Creating (Adding, Modifying) Arrays#
Syntax: array_name[index] = value
- The syntax for adding/modifying elements in an array is the same as creating an array.
Accessing Array Elements#
Syntax: array_name[index]
Deleting Array Elements#
Syntax: delete array_name[index]
- Deleting a non-existent element will not cause an error.
- delete array_name deletes all elements of the array at once.
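Creating, modifying, reading, and deleting elements can be combined in one short run. This is a sketch under the assumption of a POSIX awk; the array name count and its keys are hypothetical.

```shell
arr_out=$(awk 'BEGIN {
    count["apple"] = 1          # create
    count["apple"] += 2         # modify: the same syntax as creation
    count["pear"] = 5
    print count["apple"], count["pear"]
    delete count["pear"]        # delete one element
    delete count["missing"]     # deleting a non-existent element is not an error
    n = 0
    for (k in count) n++        # count the elements that remain
    print n
}')
printf '%s\n' "$arr_out"
```

The first line of output is `3 5`; after the two delete statements only one element is left, so the second line is `1`.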
Array-related Functions#
- length(arr): Get the length of the array.
- asort(arr): Sort the array and return the length of the array.
- split(str,arr,sep): Split a string into an array and return the length of the array. The generated awk array index starts from 1, which is different from C language arrays.
[root@localhost ~]# awk 'BEGIN{str="a,b,c,d";len=split(str,arr,",");print len,length(arr),asort(arr),arr[1]}'
4 4 4 a
Traversing Arrays#
# Method 1 (for-in; the traversal order is not guaranteed, although it often matches for small arrays)
[root@localhost ~]# awk 'BEGIN{
str="a,b,c,d";
len=split(str,arr,",");
for(i in arr){
print i,arr[i];
}
}'
1 a
2 b
3 c
4 d
# Method 2 (index loop; because awk arrays are associative, this is the way to guarantee ordered traversal)
[root@localhost ~]# awk 'BEGIN{
str="a,b,c,d";
len=split(str,arr,",");
for(i=1;i<=len;i++){
print i,arr[i];
}
}'
1 a
2 b
3 c
4 d
# Check if the array contains a certain key
if(key in arr)
# Check if arr is an array (isarray and typeof are gawk extensions)
isarray(arr) Returns 1 if arr is an array, otherwise 0.
typeof(arr) Returns the data type as a string; returns "array" if arr is an array.
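One detail behind the key in arr test is that it does not create the element, whereas merely reading arr[key] does. The sketch below shows the difference, assuming a POSIX awk; the array name seen is illustrative.

```shell
in_out=$(awk 'BEGIN {
    seen["a"] = 1
    if ("b" in seen) {
        print "b present"
    } else {
        print "b absent"                      # the test itself creates nothing
    }
    if (seen["c"] == "") {
        print "c reads as empty"              # this reference creates seen["c"]
    }
    if ("c" in seen) {
        print "c now exists"
    }
}')
printf '%s\n' "$in_out"
```

For this reason the in operator is usually the safer way to test for a key when the array is later traversed or counted.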
Multi-dimensional Arrays#
- awk only supports one-dimensional arrays; we can use one-dimensional arrays to simulate multi-dimensional arrays.
# For a 3*3 two-dimensional array arr:
1 2 3
4 5 6
7 8 9
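In C language, arr[0][0] = 100; in awk, we can set arr[0,0] = 100, and so on: arr[0,1], arr[0,2]...arr[2,2];
In fact, 0,1 0,2 2,2 are just string indexes: awk joins the subscripts into a single string, separated by the value of the built-in variable SUBSEP.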
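The simulated 3*3 array can be filled and queried as follows. This is a sketch assuming a POSIX awk; grid, r, c, and key are illustrative names.

```shell
grid_out=$(awk 'BEGIN {
    n = 0
    for (r = 0; r < 3; r++)
        for (c = 0; c < 3; c++)
            grid[r, c] = ++n        # fills 1..9 row by row
    print grid[1, 1]                # centre element
    if ((2, 2) in grid) print "has 2,2"
    key = 0 SUBSEP 2                # the same kind of key, built by hand
    print grid[key]
}')
printf '%s\n' "$grid_out"
```

The output is 5, then has 2,2, then 3, which shows that grid[0, 2] and grid[0 SUBSEP 2] name the same element.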
awk Built-in Functions#
Mathematical Functions#
Function | Description |
---|---|
sin(expr) | Returns the sine of expr |
cos(expr) | Returns the cosine of expr |
atan2(y,x) | Returns the arctangent of y/x |
log(expr) | Returns the natural logarithm of expr |
exp(expr) | Returns e raised to the power expr |
sqrt(expr) | Returns the square root of expr |
int(expr) | Returns the integer value of expr truncated |
rand() | Returns a random number n, where 0<=n<1 |
srand([expr]) | Sets the seed value of the rand function to the value of expr parameter; if omitted, uses the current time |
Example#
# Get a random integer between 0-99
[root@localhost ~]# awk 'BEGIN{srand();randint=int(100*rand());print randint}'
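A few of the other mathematical functions, shown with fixed inputs so the results are reproducible. This is a sketch assuming a POSIX awk; the seed 42 and the variable names are arbitrary.

```shell
math_out=$(awk 'BEGIN {
    pi = atan2(0, -1)                 # atan2(0, -1) is pi
    printf "%.4f\n", pi
    printf "%d\n", int(7.9)           # truncation, not rounding
    printf "%.1f\n", sqrt(16)
    printf "%.3f\n", exp(1)           # e to three decimals
    srand(42)                         # fixed seed instead of the current time
    r = int(100 * rand())
    ok = (r >= 0 && r < 100)          # 1 when r is in 0..99
    print ok
}')
printf '%s\n' "$math_out"
```

The actual value of r differs between awk implementations even with the same seed, so the script only checks that it falls in the expected range.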
String Functions#
Function | Description |
---|---|
asort(arr [, d ]) | Sorts the values of array arr and reindexes them from 1; returns the number of elements (gawk extension) |
asorti(arr [, d]) | Sorts the indexes of array arr; the sorted indexes become the values, reindexed from 1 (gawk extension) |
gsub(regexp, sub, str) | Replaces every match of regexp in str with sub and returns the number of replacements; operates on $0 if str is omitted |
sub(search, sub, str) | Like gsub, but replaces only the first match of the regular expression search |
index(str, sub) | Finds the position of a string within another string. Returns the position if found, otherwise returns 0 |
length(str) | Returns the length of a string |
match(str, regexp) | Finds the leftmost longest substring of str that matches regexp. Returns its starting position, or 0 if there is no match; also sets the variables RSTART and RLENGTH |
split(str, arr, regexp) | Splits str into array arr based on a given pattern and returns the number of pieces. If no pattern is provided, uses the value of variable FS |
printf(format, expr-list) | Constructs a string based on the given format and passed variables and outputs it to standard output |
strtonum(str) | Converts a string to a number; a leading 0x is read as hexadecimal and a leading 0 as octal (gawk extension) |
substr(str, start, len) | Returns a substring of str starting from start with length len |
tolower(str) | Converts uppercase letters in the specified string to lowercase |
toupper(str) | Converts lowercase letters in the specified string to uppercase |
Example#
# asort(arr[,d]) arr-->array d-->array; if this parameter is passed, it will not modify arr, but will copy all elements from arr to d, and then sort d
[root@localhost ~]# awk 'BEGIN{
arr[11]=800;
arr[22]=200;
arr[33]=300;
arr[44]=100;
for(i in arr){
print i,arr[i];
}
asort(arr);
print;
for(j in arr){
print j,arr[j];
}
}'
11 800
22 200
33 300
44 100
1 100
2 200
3 300
4 800
# asorti
[root@localhost ~]# awk 'BEGIN{
arr[11]=800;
arr[22]=200;
arr[33]=300;
arr[44]=100;
for(i in arr){
print i,arr[i];
}
asorti(arr);
print;
for(j in arr){
print j,arr[j];
}
}'
11 800
22 200
33 300
44 100
1 11
2 22
3 33
4 44
# gsub(regexp, sub, str)
[root@localhost ~]# awk 'BEGIN{
str="hello world";
gsub("[ol]","*",str);
print str;
}'
he*** w*r*d
# sub(search, sub, str)
[root@localhost ~]# awk 'BEGIN{
str="hello world";
sub("[ol]","*",str);
print str;
}'
he*lo world
# index(str, sub)
[root@localhost ~]# awk 'BEGIN{
str="hello world";
idx=index(str,"l");
print idx;
}'
3
# match(str, regexp)
[root@localhost ~]# awk 'BEGIN{
str="hello world hi haaaaaa";
idx=match(str,"h*");
print idx;
}'
1
# strtonum(str): the leading 0 makes "01010" an octal number, which is 520 in decimal
[root@localhost ~]# awk 'BEGIN{
str="01010";
res=strtonum(str);
print res;
}'
520
# substr(str, start, len)
[root@localhost ~]# awk 'BEGIN{
str="hello world";
res=substr(str,7,5);
print res;
}'
world
# tolower(str)
[root@localhost ~]# awk 'BEGIN{
str="HAHAHA";
res=tolower(str);
print res;
}'
hahaha
# toupper(str)
[root@localhost ~]# awk 'BEGIN{
str="hello world";
res=toupper(str);
print res;
}'
HELLO WORLD
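The match example above prints only the return value. The sketch below also uses RSTART and RLENGTH to extract the matched text, and reads the replacement count that gsub returns. It assumes a POSIX awk; the sample string is made up.

```shell
str_out=$(awk 'BEGIN {
    str = "room 101 on floor 3"
    if (match(str, /[0-9]+/)) {
        # RSTART is where the match begins, RLENGTH is how long it is
        print RSTART, RLENGTH, substr(str, RSTART, RLENGTH)
    }
    n = gsub(/o/, "0", str)          # gsub returns the number of replacements
    print n, str
}')
printf '%s\n' "$str_out"
```

The first line is `6 3 101` (the first run of digits starts at position 6 and is 3 characters long), and the second is `5 r00m 101 0n fl00r 3`.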
Date and Time Functions#
Function | Description |
---|---|
systime() | Returns the current timestamp |
mktime(datespec) | Converts a specified formatted time string (YYYY MM DD HH MM SS ) to a timestamp |
strftime([format [, timestamp]]) | Formats a timestamp according to the specified format string and returns it as a string representation |
Example#
# systime()
[root@localhost ~]# awk 'BEGIN{print systime()}'
1672448348
# mktime(datespec)
[root@localhost ~]# awk 'BEGIN{print mktime("2022 12 31 09 00 00")}'
1672448400
# strftime([format [, timestamp]])
[root@localhost ~]# awk 'BEGIN{print strftime("%c",systime())}'
Sat Dec 31 09:03:53 2022
Formatting#
Format Specifier | Description |
---|---|
%a | Abbreviated localized weekday name, e.g., Thu |
%A | Full localized weekday name, e.g., Thursday |
%b | Abbreviated localized month name, e.g., May |
%B | Full localized month name, e.g., May |
%c | Localized date and time representation; %a %b %e %H:%M:%S %Y in the POSIX locale, e.g., Thu May 30 21:08:37 2019 |
%C | The century part of the current year, i.e., the first two digits of the four-digit year, e.g., 20 in 2019 |
%d | Day of the month, range 01-31 , e.g., 30 |
%D | Abbreviated format %m/%d/%y , e.g., 05/30/19 |
%e | Day of the month, range 1-31 , padded with a leading space if less than 10 , e.g., 1 becomes " 1" |
%F | Alias for the ISO 8601 date format %Y-%m-%d |
%g | Last two digits of the ISO 8601 week-based year, range 00-99 , e.g., January 1, 1993 falls in week 53 of 1992, so %g gives 92 |
%G | Full ISO 8601 week-based year, e.g., 2019 |
%h | Alias for format %b |
%H | Hour in 24-hour format, range 00–23 |
%I | Hour in 12-hour format, range 01–12 |
%j | Day of the year, range 001–366 |
%m | Month of the current time, range 01–12 |
%M | Minute of the current time, range 00–59 |
%n | Newline character \n |
%p | Localized AM or PM in 12-hour format, i.e., localized morning or afternoon representation |
%r | Localized 12-hour format time, similar to C language %I:%M:%S %p |
%R | Abbreviated format %H:%M |
%S | Current second, range 00-60 . 60 mainly considers leap seconds |
%t | Tab character \t |
%T | Abbreviated format %H:%M:%S |
%u | Day of the week, range 1–7 , starting from Monday |
%U | Week number of the year, range 00-53 , starting from the first Sunday |
%V | ISO 8601 week number of the year, range 01-53 ; weeks start on Monday and week 1 is the week containing the first Thursday |
%w | Day of the week, range 0–6 , starting from Sunday |
%W | Week number of the year, range 00-53 , starting from the first Monday |
%x | Localized date representation; %m/%d/%y in the POSIX locale, e.g., 05/30/19 |
%X | Localized time representation; %H:%M:%S in the POSIX locale, e.g., 07:06:05 |
%y | Two-digit year, i.e., the last two digits of the year, range 00-99 , e.g., 19 for 2019 |
%Y | Full four-digit year, e.g., 2019 |
%z | Time zone offset in +HHMM format. Part of RFC 822 or RFC 1036 date format. |
%Z | Time zone name or abbreviation. Returns empty string '' if no time zone. |
Other Functions#
Function | Description |
---|---|
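close(expr) | Closes an already opened file or pipe; expr must be the same string that was used to open it |
system(command) | Executes a shell command and returns its exit status |
getline | Reads the next record, from the current input, a file, or a command |
next | Stops processing the current record and moves on to the next one |
nextfile | Stops processing the current file and moves on to the next one |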
Example#
# close(expr)
[root@localhost ~]# awk 'BEGIN{cmd="cat emp.txt";while((cmd | getline)>0){print $0}close(cmd)}'
# system(command)
[root@localhost ~]# awk 'BEGIN{system("ls -l")}'
# getline
[root@localhost ~]# awk '{getline;print}' emp.txt
002 Li Si 2000 10
004 Zhao Liu 2000 20
006 Xiao Li 800 20
[root@localhost ~]# awk 'BEGIN{print "Input:";getline name;print name}'
Input:
123
123
# next
[root@localhost ~]# awk '{if($3<2000)next;print}' emp.txt
002 Li Si 2000 10
003 Wang Wu 3000 10
004 Zhao Liu 2000 20
# nextfile
[root@localhost ~]# awk '{if($3==2000) nextfile;print}' emp.txt file
001 Zhang San 1000 10
xxx is a hanhan. ^_^
are you kidding?
I think ...
My phone number is 1872272****.
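Reading from a command with getline and then closing it can be sketched as follows. Storing the command in a variable means the string passed to close is exactly the one used to open the pipe. The command itself and the names cmd, line, n, and last are assumptions for the example.

```shell
pipe_out=$(awk 'BEGIN {
    cmd = "(echo c; echo a; echo b) | sort"
    # getline returns 1 per record, 0 at end of input, -1 on error,
    # so compare with > 0 to avoid looping forever on an error
    while ((cmd | getline line) > 0) {
        n++
        last = line
    }
    close(cmd)                      # close with the same string that opened the pipe
    print n, last
}')
printf '%s\n' "$pipe_out"
```

Three sorted lines are read, so the script prints `3 c`.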
Custom Functions#
- Function Definition
function function_name(parameter1, parameter2, ...) {
function body
}
- Function names must start with a letter or underscore and can consist of letters, numbers, and underscores; reserved words cannot be used.
- Statements in the function body must be separated by semicolons or newlines.
- Functions can have return values or not; if a return value is needed, use the return keyword within the braces.
- Function Call
# Call a function without parameters
fun_name()
# Call a function with parameters
fun_name(arg1[,arg2...])
# Call a function with return value
var = fun_name([arg1...])
Functions can be called in the BEGIN, main, and END sections.
Example#
# Function without parameters
[root@localhost ~]# awk 'BEGIN{
fun1()
}
function fun1(){
print "this is a function!"
}'
this is a function!
# Function with parameters
[root@localhost ~]# awk 'BEGIN{
fun2(10,20)
}
function fun2(num1,num2){
print "The result of the function is:"num1+num2
}'
The result of the function is:30
# Function with return value
[root@localhost ~]# awk 'BEGIN{
res = fun3(10,20);
print "The sum of num1 and num2 is:"res
}
function fun3(num1,num2){
return num1+num2
}'
The sum of num1 and num2 is:30
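Two further points are worth a sketch: functions may call themselves, and awk has no local-variable declaration, so the usual idiom is to list extra parameters that the caller never passes. The function names fact and sum_to are hypothetical, and the example assumes a POSIX awk.

```shell
fn_out=$(awk '
# Recursive function: the factorial of n
function fact(n) {
    return (n <= 1) ? 1 : n * fact(n - 1)
}
# i and total are extra parameters that act as local variables
function sum_to(n,    i, total) {
    for (i = 1; i <= n; i++) total += i
    return total
}
BEGIN {
    i = 99                           # global i, untouched by sum_to
    print fact(5), sum_to(4), i
}')
printf '%s\n' "$fn_out"
```

The output is `120 10 99`: the global i keeps its value because the i inside sum_to is a separate, function-local variable.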