siwen

awk Programming

awk Operators#

Arithmetic Operators#

Operator     Description
+ - * / %    Addition, subtraction, multiplication, division, modulus
^ **         Exponentiation
++ --        Increment, decrement (usable as prefix or suffix)
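
As a small sketch of the arithmetic operators (the variable names are illustrative only), the following shows the remainder and exponentiation operators and the difference between suffix and prefix increment:

```shell
result=$(awk 'BEGIN {
    a = 7; b = 2
    # % is the remainder, ^ is exponentiation
    printf "%d %d %d %.1f %d %d\n", a + b, a - b, a * b, a / b, a % b, a ^ b
    i = 5
    x = i++    # suffix: x gets the old value 5, then i becomes 6
    y = ++i    # prefix: i becomes 7 first, then y gets 7
    print x, y, i
}')
echo "$result"
```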

Assignment Operators#

Operator                    Description
= += -= *= /= %= ^= **=     Assignment (a+=b is equivalent to a=a+b; the others work the same way)

Relational Operators#

Operator             Description
> < >= <= != ==      Comparison (evaluates to 1 if the relation holds, otherwise 0)

Logical Operators#

Operator    Description
||          Logical OR (true if either operand is true)
&&          Logical AND (false if either operand is false)
!           Logical NOT (true becomes false, false becomes true)
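
A brief sketch of how relational and logical expressions evaluate. The values are arbitrary; the point is that the result is always 1 or 0, and that two string constants are compared as strings rather than as numbers:

```shell
result=$(awk 'BEGIN {
    printf "%d %d\n", (3 > 2), (3 < 2)            # comparisons yield 1 or 0
    printf "%d %d\n", (10 < 9), ("10" < "9")      # numeric comparison vs. string comparison
    printf "%d %d %d %d\n", !0, !5, (1 && 0), (0 || 2)
}')
echo "$result"
```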

Regular Expression Operators#

Operator    Description
~           Matches the regular expression
!~          Does not match the regular expression
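
The two operators are most often used as patterns in front of an action. A minimal sketch, with made-up input data:

```shell
result=$(printf 'apple 3\nbanana 12\ncherry 7\n' | awk '
$1 ~ /an/        { print "match", $1 }      # first field contains "an"
$2 !~ /^[0-9]$/  { print "nomatch", $2 }    # second field is not a single digit
')
echo "$result"
```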

Other Operators#

Operator    Description
$           Field reference
(space)     String concatenation: writing two expressions side by side joins them
in          Array membership test (key in array); also used as for (key in array) to traverse an array
? :         Ternary operator, as in C: condition ? expr1 : expr2 evaluates to expr1 if condition is true, otherwise to expr2
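
The following sketch combines these operators on a single input line. The array name seen and the labels are hypothetical and only serve the illustration:

```shell
result=$(echo "alice 42" | awk '{
    seen["alice"] = 1
    label = ($2 >= 18) ? "adult" : "minor"        # ternary picks one of two values
    known = ($1 in seen) ? "known" : "unknown"    # in tests whether a key exists
    print $1 ":" label "," known                  # adjacent strings are concatenated
    print $NF                                     # $ accepts any expression, here the last field
}')
echo "$result"
```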

awk Control Flow Statements#

Conditional Statements#

# if
if(expression)
	statement
	
# if-else
if(expression)
	statement1
else
	statement2 
	
# if-else-if
if(expression)
	statement1 
else if (expression2)
	statement2 
else
	statement3 

awk branches can be nested. A branch without braces covers only a single statement, so enclose multiple statements in {}; using braces everywhere also improves readability.
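
A minimal sketch of an if-else-if chain, assuming one numeric score per input line (the thresholds and names are invented for the example):

```shell
result=$(printf '95\n72\n40\n' | awk '{
    if ($1 >= 90) {
        grade = "A"
        n_top++              # a second statement in the same branch needs the braces
    } else if ($1 >= 60) {
        grade = "B"
    } else {
        grade = "C"
    }
    print $1, grade
}
END { print "top:", n_top }')
echo "$result"
```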

Loop Statements#

  • Three main loop statements
# while loop
while(expression){
	statement
}

# for loop
# Format 1
for(initial variable; loop condition; loop increment/decrement statement){
	statement
}

# Format 2 
for(variable in array){
	statement
}

# do-while loop
do{
	statement
} while(condition)

Loop control statements:

  • break; exits the enclosing loop.
  • continue; skips the rest of the current iteration and starts the next one.
  • exit status_code; stops the script (if there is an END block, control transfers to it first). It accepts an integer as the exit status of the awk process; without an argument the status defaults to 0 (check it with $? in the shell).
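
The three statements can be seen together in this sketch. The input lines and the "stop" marker are assumptions made for the example:

```shell
status=0
result=$(printf '1 -2 3\n4 500 6\nstop\n7 8\n' | awk '
$1 == "stop" { exit 3 }            # stop reading input; END still runs
{
    for (i = 1; i <= NF; i++) {
        if ($i < 0) continue       # ignore negative fields
        if ($i > 100) break        # abandon the rest of this line
        sum += $i
    }
}
END { print "sum:", sum }') || status=$?
echo "$result (exit status $status)"
```

Only 1, 3, and 4 are added: -2 is skipped by continue, 500 triggers break, and the line after "stop" is never read.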

awk Arrays#

Arrays are the soul of awk, and array processing is often used in text processing.

awk array characteristics:

  • The index of awk arrays can be numbers or strings, thus awk arrays are associative arrays.
  • Internally, the indexes of awk arrays are all strings; even numeric indexes are converted to strings when used.
  • The order of elements in awk arrays may not be the same as the order in which elements are inserted.
  • Arrays in awk do not need to be declared in advance, nor do they need to declare size.
  • An element that has never been assigned evaluates to 0 or to an empty string, depending on the context in which it is used.
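
These properties are what make the classic counting idiom work. A small sketch with made-up input (the array name count is arbitrary):

```shell
result=$(printf 'b a b\nc b a\n' | awk '{
    for (i = 1; i <= NF; i++) count[$i]++     # unset elements start out as 0
}
END {
    count[1] = "one"
    print count["1"]      # the number 1 and the string "1" name the same element
    print "a=" count["a"], "b=" count["b"], "c=" count["c"]
    state = ("zzz" in count) ? "present" : "absent"
    print state
}')
echo "$result"
```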

Creating (Adding, Modifying) Arrays#

Syntax: array_name[index] = value

  • The syntax for adding/modifying elements in an array is the same as creating an array.

Accessing Array Elements#

Syntax: array_name[index]

Deleting Array Elements#

Syntax: delete array_name[index]

  • Deleting a non-existent element does not cause an error.
  • delete array_name deletes all elements of the array.

Related array functions:

  • length(arr) returns the number of elements in the array.
  • asort(arr) sorts the array by value and returns the number of elements (gawk extension).
  • split(str,arr,sep) splits a string into an array and returns the number of elements. The resulting indexes start from 1, unlike C arrays, which start from 0.

[root@localhost ~]# awk 'BEGIN{str="a,b,c,d";len=split(str,arr,",");print len,length(arr),asort(arr),arr[1]}'
4 4 4 a
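
Deleting elements can be sketched the same way. After the delete, the in operator is used to test which indexes remain, because merely referencing arr[i] would create the element again:

```shell
result=$(awk 'BEGIN {
    n = split("a,b,c,d", arr, ",")
    delete arr[2]                 # remove one element
    delete arr[99]                # deleting a missing element is harmless
    for (i = 1; i <= n; i++) {
        if (i in arr) kept = kept arr[i]   # in does not create the element
    }
    print kept
}')
echo "$result"
```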

Traversing Arrays#

# Method 1: for-in (the traversal order is unspecified; the output below is one possible order)
[root@localhost ~]# awk 'BEGIN{
str="a,b,c,d";
len=split(str,arr,",");
for(i in arr){
print i,arr[i];
}
}'
1 a
2 b
3 c
4 d

# Method 2: counting loop (awk arrays are associative, so only an explicit index loop guarantees ordered traversal)
[root@localhost ~]# awk 'BEGIN{
str="a,b,c,d";
len=split(str,arr,",");
for(i=1;i<=len;i++){
print i,arr[i];
}
}'
1 a
2 b
3 c
4 d

# Check if the array contains a certain key
if(key in arr)

# Check if arr is an array (gawk extensions)
isarray(arr)  Returns 1 if arr is an array, otherwise 0.
typeof(arr)   Returns the type of arr as a string; for an array, returns "array".

Multi-dimensional Arrays#

  • Standard awk only supports one-dimensional arrays, but a one-dimensional array can simulate a multi-dimensional one.
# For a 3*3 two-dimensional array arr:
1 2 3
4 5 6
7 8 9
In C, you would write arr[0][0] = 100; in awk, you can write arr[0,0] = 100, and likewise arr[0,1], arr[0,2] ... arr[2,2].
The subscripts 0,1 and 0,2 and 2,2 are really single string indexes: awk joins the parts with the value of SUBSEP (by default the character "\034").
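
A minimal sketch of the simulated 3*3 array, using 0-based row and column numbers (the names grid, r, c, and parts are illustrative):

```shell
result=$(awk 'BEGIN {
    # fill the grid with the values 1..9
    for (r = 0; r < 3; r++) {
        for (c = 0; c < 3; c++) {
            grid[r, c] = r * 3 + c + 1
        }
    }
    print grid[1, 2]                          # row 1, column 2
    key = 1 SUBSEP 2                          # the string that grid[1, 2] really uses
    state = (key in grid) ? "same key" : "different key"
    print state
    if ((2, 2) in grid) print "has (2,2)"     # membership test with two subscripts
    split(key, parts, SUBSEP)                 # recover the two subscripts from a key
    print parts[1], parts[2]
}')
echo "$result"
```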

awk Built-in Functions#

Mathematical Functions#

Function         Description
sin(expr)        Returns the sine of expr (expr in radians)
cos(expr)        Returns the cosine of expr (expr in radians)
atan2(y,x)       Returns the arctangent of y/x in radians
log(expr)        Returns the natural logarithm of expr
exp(expr)        Returns e raised to the power expr
sqrt(expr)       Returns the square root of expr
int(expr)        Returns expr truncated to an integer
rand()           Returns a random number n, where 0<=n<1
srand([expr])    Sets the seed of rand() to the value of expr; if expr is omitted, the current time is used

Example#

# Get a random integer between 0-99
[root@localhost ~]# awk 'BEGIN{srand();randint=int(100*rand());print randint}'
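
The same idea generalizes to an arbitrary range. The helper randint below is hypothetical (it is not a built-in), and the fixed seed only makes the sequence repeatable within one awk implementation, so the sketch checks the range rather than specific values:

```shell
result=$(awk '
# hypothetical helper: random integer between lo and hi, both inclusive
function randint(lo, hi) { return lo + int((hi - lo + 1) * rand()) }
BEGIN {
    srand(42)
    ok = 1
    for (i = 0; i < 1000; i++) {
        n = randint(1, 6)
        if (n < 1 || n > 6) ok = 0
    }
    print ok                        # 1 if every draw stayed inside the range
    printf "%.4f\n", atan2(0, -1)   # pi
    print int(-3.7), int(3.7)       # int truncates toward zero
    print sqrt(16), exp(0), log(1)
}')
echo "$result"
```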

String Functions#

Function                     Description
asort(arr [, d])             Sorts the values of array arr in ascending order (gawk extension)
asorti(arr [, d])            Sorts the keys of array arr in ascending string order (gawk extension)
gsub(regexp, sub, str)       Replaces every match of regexp in str with sub
sub(regexp, sub, str)        Replaces only the first match of regexp in str with sub
index(str, sub)              Returns the position of the first occurrence of sub in str, or 0 if it is not found
length(str)                  Returns the length of a string
match(str, regexp)           Finds the leftmost, longest substring of str that matches regexp. Returns its starting position, or 0 if there is no match
split(str, arr, regexp)      Splits str into array arr at each match of regexp. If regexp is omitted, the value of FS is used
printf(format, expr-list)    Formats the expressions according to format and writes the result to standard output
strtonum(str)                Converts a string to a number, treating a leading 0 as octal and a leading 0x as hexadecimal (gawk extension)
substr(str, start, len)      Returns the substring of str that starts at position start (counting from 1) and has length len
tolower(str)                 Converts uppercase letters in the string to lowercase
toupper(str)                 Converts lowercase letters in the string to uppercase

Example#

# asort(arr[,d])  arr: the array to sort; d: optional destination array. If d is passed, arr is left unmodified: its elements are copied to d, and d is sorted instead
[root@localhost ~]# awk 'BEGIN{
    arr[11]=800;
    arr[22]=200;
    arr[33]=300;
    arr[44]=100;
    for(i in arr){
        print i,arr[i];
    }
    asort(arr);
    print "";
    for(j in arr){
        print j,arr[j];
    }
}'
11 800
22 200
33 300
44 100

1 100
2 200
3 300
4 800

# asorti(arr[,d])  after the call, arr holds the sorted keys as its values, indexed from 1
[root@localhost ~]# awk 'BEGIN{
    arr[11]=800;
    arr[22]=200;
    arr[33]=300;
    arr[44]=100;
    for(i in arr){
        print i,arr[i];
    }
    asorti(arr);
    print "";
    for(j in arr){
        print j,arr[j];
    }
}'
11 800
22 200
33 300
44 100

1 11
2 22
3 33
4 44

# gsub(regexp, sub, str): inside a bracket expression | is a literal character, so [ol] is enough to match o or l
[root@localhost ~]# awk 'BEGIN{
    str="hello world";
    gsub(/[ol]/,"*",str);
    print str;
}'
he*** w*r*d

# sub(regexp, sub, str): only the first match is replaced
[root@localhost ~]# awk 'BEGIN{
    str="hello world";
    sub(/[ol]/,"*",str);
    print str;
}'
he*lo world
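
Both sub and gsub also return the number of substitutions they made, and an & in the replacement stands for the matched text. A short sketch:

```shell
result=$(awk 'BEGIN {
    str = "hello world"
    n = gsub(/o/, "[&]", str)    # & is replaced by whatever the regexp matched
    print n, str
}')
echo "$result"
```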

# index(str, sub)
[root@localhost ~]# awk 'BEGIN{
    str="hello world";
    idx=index(str,"l");
    print idx;
}'
3

# match(str, regexp): "h*" matches the longest run of h at the leftmost position, here the single h at position 1
[root@localhost ~]# awk 'BEGIN{
    str="hello world hi haaaaaa";
    idx=match(str,"h*");
    print idx;
}'
1
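
Besides returning the position, match sets the built-in variables RSTART and RLENGTH, which pair naturally with substr. A sketch using a stricter pattern than the one above:

```shell
result=$(awk 'BEGIN {
    str = "hello world hi haaaaaa"
    if (match(str, /ha+/)) {
        # RSTART is the start of the match, RLENGTH its length
        print RSTART, RLENGTH, substr(str, RSTART, RLENGTH)
    }
    found = match(str, /xyz/)      # no match: RSTART is 0 and RLENGTH is -1
    print found, RSTART, RLENGTH
}')
echo "$result"
```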

# strtonum(str): the leading 0 makes gawk read the string as octal, and octal 1010 is decimal 520
[root@localhost ~]# awk 'BEGIN{
    str="01010";
    res=strtonum(str);
    print res;
}'
520

# substr(str, start, len)
[root@localhost ~]# awk 'BEGIN{
    str="hello world";
    res=substr(str,7,5);
    print res;
}'
world

# tolower(str)
[root@localhost ~]# awk 'BEGIN{
    str="HAHAHA";
    res=tolower(str);
    print res;
}'
hahaha

# toupper(str)
[root@localhost ~]# awk 'BEGIN{
    str="hello world";
    res=toupper(str);
    print res;
}'
HELLO WORLD

Date and Time Functions#

Function                            Description
systime()                           Returns the current time as a timestamp (seconds since the epoch)
mktime(datespec)                    Converts a time string of the form "YYYY MM DD HH MM SS" to a timestamp
strftime([format [, timestamp]])    Formats a timestamp according to the format string and returns the result as a string

Example#

# systime()
[root@localhost ~]# awk 'BEGIN{print systime()}'
1672448348

# mktime(datespec): the datespec is interpreted in the local time zone (UTC+8 in this output)
[root@localhost ~]# awk 'BEGIN{print mktime("2022 12 31 09 00 00")}'
1672448400

# strftime([format [, timestamp]])
[root@localhost ~]# awk 'BEGIN{print strftime("%c",systime())}'
Sat Dec 31 09:03:53 2022

Formatting#

Format Specifier    Description
%a                  Abbreviated localized weekday name, e.g., Thu
%A                  Full localized weekday name, e.g., Thursday
%b                  Abbreviated localized month name, e.g., Dec
%B                  Full localized month name, e.g., December
%c                  Localized date and time representation, e.g., Sat Dec 31 09:03:53 2022 in the C locale
%C                  Century part of the year, i.e., the first two digits of the four-digit year, e.g., 20 for 2019
%d                  Day of the month, range 01-31, e.g., 30
%D                  Shorthand for %m/%d/%y, e.g., 05/30/19
%e                  Day of the month, range 1-31, padded with a space if less than 10, e.g., the 1st is printed as " 1"
%F                  Shorthand for the ISO 8601 date format %Y-%m-%d
%g                  Last two digits of the ISO 8601 week-based year, range 00-99; e.g., January 1, 1993 falls in week 53 of 1992, so %g gives 92
%G                  Full ISO 8601 week-based year, e.g., 2019
%h                  Alias for %b
%H                  Hour in 24-hour format, range 00-23
%I                  Hour in 12-hour format, range 01-12
%j                  Day of the year, range 001-366
%m                  Month, range 01-12
%M                  Minute, range 00-59
%n                  Newline character \n
%p                  Localized AM or PM indicator for the 12-hour format
%r                  Localized 12-hour time, equivalent to %I:%M:%S %p in the C locale
%R                  Shorthand for %H:%M
%S                  Second, range 00-60 (60 allows for leap seconds)
%t                  Tab character \t
%T                  Shorthand for %H:%M:%S
%u                  Day of the week, range 1-7, where Monday is 1
%U                  Week number of the year, range 00-53, with the first Sunday as the first day of week 01
%V                  ISO 8601 week number, range 01-53 (weeks start on Monday)
%w                  Day of the week, range 0-6, where Sunday is 0
%W                  Week number of the year, range 00-53, with the first Monday as the first day of week 01
%x                  Localized date representation, e.g., 05/30/19 in the C locale
%X                  Localized time representation, equivalent to %T in the C locale, e.g., 07:06:05
%y                  Two-digit year, i.e., the last two digits of the year, range 00-99, e.g., 19 for 2019
%Y                  Full four-digit year, e.g., 2019
%z                  Time zone offset in +HHMM format, as used in RFC 822 and RFC 1036 dates
%Z                  Time zone name or abbreviation; empty if no time zone can be determined

Other Functions#

Function           Description
close(expr)        Closes an open file or pipe
system(command)    Runs a shell command and returns its exit status
getline            Reads the next input record
next               Stops processing the current record and moves on to the next one
nextfile           Stops processing the current file and moves on to the next one

Example#

# close(expr): the argument must be the same string that was used to open the file or pipe
[root@localhost ~]# awk 'BEGIN{while(("cat emp.txt" | getline) > 0){print $0}close("cat emp.txt")}'

# system(command)
[root@localhost ~]# awk 'BEGIN{system("ls -l")}'

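# getline: each rule invocation reads a record, then getline replaces it with the next one, so only every second line is printed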
[root@localhost ~]# awk '{getline;print}' emp.txt
002 Li Si 2000 10
004 Zhao Liu 2000 20
006 Xiao Li 800 20

[root@localhost ~]# awk 'BEGIN{print "Input:";getline name;print name}'
Input:
123
123
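
getline can also read from a named file or from a command, and it returns 1 for a record, 0 at end of file, and -1 on error, which is why the loop condition should compare against 0. A minimal sketch, assuming a temporary file created just for the example:

```shell
tmp=$(mktemp)
printf 'one\ntwo\nthree\n' > "$tmp"
result=$(awk -v file="$tmp" 'BEGIN {
    # loop only while a record was actually read
    while ((getline line < file) > 0) n++
    close(file)                       # close with the same name that was opened
    "echo hi" | getline greeting      # read one line from a command
    close("echo hi")
    print n, line, greeting           # line still holds the last record read
}')
rm -f "$tmp"
echo "$result"
```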

# next: skip records whose salary (the 4th field) is below 2000
[root@localhost ~]# awk '{if($4<2000)next;print}' emp.txt
002 Li Si 2000 10
003 Wang Wu 3000 10
004 Zhao Liu 2000 20

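# nextfile: stop reading emp.txt at the first record whose salary (the 4th field) is 2000, then continue with file
[root@localhost ~]# awk '{if($4==2000) nextfile;print}' emp.txt file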
001 Zhang San 1000 10
xxx is a hanhan. ^_^

are you kidding?
I think ...
My phone number is 1872272****.

Custom Functions#

  • Function Definition
function function_name(parameter1, parameter2, ...) {
   function body
}
  • Function names must start with a letter or underscore and may contain letters, digits, and underscores; reserved words cannot be used.
  • Statements in the function body are separated by semicolons or newlines.
  • A function may or may not return a value; to return one, use the return statement inside the body.
  • Function Call
# Call a function without parameters (the parentheses are still required)
fun_name()
# Call a function with parameters
fun_name(arg1[,arg2...])
# Call a function and use its return value
var = fun_name([arg1...])

Functions can be called from the BEGIN, main, and END sections.

Example#

# Function without parameters
[root@localhost ~]# awk 'BEGIN{
	fun1()
}
function fun1(){
	print "this is a function!"
}'
this is a function!

# Function with parameters
[root@localhost ~]# awk 'BEGIN{
	fun2(10,20)
}
function fun2(num1,num2){
	print "The result of the function is:"num1+num2
}'
The result of the function is:30

# Function with return value
[root@localhost ~]# awk 'BEGIN{
	res = fun3(10,20);
	print "The sum of num1 and num2 is:"res
}
function fun3(num1,num2){
	return num1+num2
}'
The sum of num1 and num2 is:30
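
Two further points are worth a sketch: awk has no local-variable declaration, so extra parameters that callers leave out are conventionally used as locals, and arrays are passed by reference while scalars are passed by value. The function names below are hypothetical:

```shell
result=$(awk '
# the extra parameters i and total act as local variables; callers omit them
function sum_array(arr, n,    i, total) {
    for (i = 1; i <= n; i++) total += arr[i]
    return total
}
# arrays are passed by reference, so the caller sees this change
function double_all(arr, n,    i) {
    for (i = 1; i <= n; i++) arr[i] *= 2
}
# functions may call themselves recursively
function fact(n) { return n <= 1 ? 1 : n * fact(n - 1) }
BEGIN {
    n = split("1 2 3 4", nums, " ")
    i = "untouched"                  # the global i is not affected by the local i
    double_all(nums, n)
    print sum_array(nums, n), fact(5), i
}')
echo "$result"
```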