siwen


awk command

awk Text Analysis Tool#

Description:

awk is a programming language for text and data processing on Linux/Unix. Input can come from standard input (stdin), one or more files, or the output of other commands. It supports advanced features such as user-defined functions and dynamic regular expressions, which makes it a powerful programming tool on Linux/Unix. It can be run as a one-liner on the command line, but it is more often used in scripts. Many of awk's built-in features, such as arrays and functions, resemble those of the C language. Flexibility is awk's greatest advantage.

Syntax:

awk [option] 'PATTERN{ACTION STATEMENTS}' FILE

awk Execution Process#

[Figure: flow diagram of the awk execution process]

awk Program Structure#

| Structure | Explanation |
| --- | --- |
| BEGIN{ awk-commands } | Optional. Executed once, before awk reads any input |
| /pattern/{ awk-commands } | Optional. Executed once for each input line; /pattern/ restricts it to the lines that match |
| END{ awk-commands } | Optional. Executed once, after awk has finished reading the input |
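The three parts can be seen working together in a minimal sketch. The scratch file and its three four-column rows (ID, name, salary, department) are made up for this illustration; only the BEGIN / main / END structure comes from the table above.

```shell
# Hypothetical sample data written to a scratch file.
demo=$(mktemp)
printf '%s\n' \
  '001 ZhangSan 1000 10' \
  '002 LiSi 2000 10' \
  '003 WangWu 3000 10' > "$demo"

# BEGIN runs once before any input is read, the middle block runs once
# per line, and END runs once after the last line has been processed.
summary=$(awk 'BEGIN { total = 0 }
               { total += $3 }
               END { print NR " lines, total salary " total }' "$demo")
echo "$summary"   # 3 lines, total salary 6000
```

Because END runs after all input has been read, NR still holds the number of the last line there, which is why it can be used as a line count.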

Example#

# Prepare the following text content (four whitespace-separated columns: ID, name, salary, department)
[root@localhost ~]# cat emp.txt 
001 ZhangSan 1000 10
002 LiSi 2000 10
003 WangWu 3000 10
004 ZhaoLiu 2000 20
005 XiaoHong 1800 30
006 XiaoLi 800  20

# Use awk command to output the text content
[root@localhost ~]# awk '{print}' emp.txt

# Output the text content with a header and a closing line (-------------------)
[root@localhost ~]# awk 'BEGIN{print "ID Name Salary Department"}{print}END{print "-------------------"}' emp.txt
ID Name Salary Department
001 ZhangSan 1000 10
002 LiSi 2000 10
003 WangWu 3000 10
004 ZhaoLiu 2000 20
005 XiaoHong 1800 30
006 XiaoLi 800  20
-------------------

awk Command Options#

| Option | Meaning |
| --- | --- |
| -F fs | Specifies the input field separator (default: whitespace); fs can be a string or a regular expression |
| -v var=value | Assigns a user-defined variable before the program starts |
| -f scriptfile | Reads the awk program from a script file |
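A short sketch of -F and -v in use. The input records and the variable name `name` are made up for this illustration.

```shell
# -F sets the input field separator; here a colon-separated record.
user=$(echo 'root:x:0:0:root:/root:/bin/bash' | awk -F: '{ print $1, $NF }')
echo "$user"       # root /bin/bash

# -F also accepts a regular expression: split on commas or semicolons.
second=$(echo 'a,b;c' | awk -F'[,;]' '{ print $2 }')
echo "$second"     # b

# -v assigns the variable before the program starts, so BEGIN can use it.
greeting=$(awk -v name=world 'BEGIN { print "hello, " name }')
echo "$greeting"   # hello, world
```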

Usage of awk:

  • Command line usage

Run the awk program directly on the command line:

awk '{print}' emp.txt

  • awk script usage

Write the awk code in a file (usually with the extension .awk) and run it with the -f option:

awk -f file.awk emp.txt

  • Shell script usage

Put the awk command in a shell script and run the shell script.
awk Patterns and Actions#

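| Pattern | Meaning |
| --- | --- |
| /pattern/ | Regular expression, matched against the whole line |
| Relational expression | Comparison using relational operators (==, !=, <, <=, >, >=) |
| Pattern matching expression | ~ (matches) and !~ (does not match) |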

An action consists of one or more commands, functions, or expressions, separated by newlines or semicolons.

awk range patterns:

  • /start/,/end/ from a line matching /start/ through the next line matching /end/
  • NR==1,NR==5 from line 1 to line 5

Example#

# Output lines in emp.txt that contain the string 'Xiao'
[root@localhost ~]# awk '/Xiao/' emp.txt

# Output from the line containing 'Zhang' to the line containing 'Zhao'
[root@localhost ~]# awk '/Zhang/,/Zhao/' emp.txt

# Output the first three lines
[root@localhost ~]# awk 'NR<=3' emp.txt
[root@localhost ~]# awk 'NR==1,NR==3' emp.txt
[root@localhost ~]# awk 'NR>=1 && NR<=3' emp.txt

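# The pattern is true whenever the left side matches the regex; the constant 666 always matches /[0-9]+/, so every line is printed
[root@localhost ~]# awk '666 ~ /[0-9]+/' emp.txt

When a pattern has no action, awk applies the default action {print}, so the action part can be omitted if you only want to output the matching lines.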
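Relational and match patterns are more useful when applied to a single field. The sketch below assumes a four-column table (ID, name, salary, department) with single-word names, written to a scratch file so that the example is self-contained.

```shell
emp=$(mktemp)
printf '%s\n' \
  '001 ZhangSan 1000 10' \
  '002 LiSi 2000 10' \
  '003 WangWu 3000 10' \
  '004 ZhaoLiu 2000 20' \
  '005 XiaoHong 1800 30' \
  '006 XiaoLi 800 20' > "$emp"

# Relational expression: salary (third field) of at least 2000.
high=$(awk '$3 >= 2000 { print $2 }' "$emp")
echo "$high"

# Match operator on one field: names that start with "Xiao".
xiao=$(awk '$2 ~ /^Xiao/ { print $1 }' "$emp")
echo "$xiao"

# Negated match combined with a comparison using &&.
other=$(awk '$2 !~ /^Xiao/ && $4 == 20 { print $2 }' "$emp")
echo "$other"
```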

awk Variables#

Built-in Variables (Predefined Variables)#

| Variable | Meaning |
| --- | --- |
| $0 | The entire current record (line) |
| $n | The nth field of the current record (the nth column after splitting) |
| NF | The number of fields in the current record; $NF is the last field |
| NR | The number of records read so far, i.e. the current line number (it keeps counting across multiple input files) |
| FS | The input field separator (default: whitespace; can be set with -F or -v) |
| OFS | The output field separator (default: a single space; can be set with -v) |
| RS | The input record separator (default: newline; can be set with -v) |
| ORS | The output record separator (default: newline; can be set with -v) |
| FILENAME | The name of the current input file |
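A small sketch of how some of these variables behave. The two input files are made up for this illustration, and FNR (the per-file record number, a standard awk variable not listed in the table) is included only to contrast it with NR.

```shell
dir=$(mktemp -d)
printf 'a b\nc d\n' > "$dir/one.txt"
printf 'e f\n' > "$dir/two.txt"

# NR keeps counting across files; FNR restarts at 1 for each file.
counts=$(cd "$dir" && awk '{ print FILENAME, NR, FNR, NF }' one.txt two.txt)
echo "$counts"
# one.txt 1 1 2
# one.txt 2 2 2
# two.txt 3 1 2

# Assigning to a field makes awk rebuild $0 using OFS.
joined=$(echo 'a b c' | awk -v OFS='-' '{ $1 = $1; print }')
echo "$joined"     # a-b-c

# RS changes what counts as a record: here records end at ";".
records=$(printf 'x;y;z' | awk -v RS=';' '{ print NR ": " $0 }')
echo "$records"
```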

User-defined Variables#

  • Defined using the -v option
-v var=value
  • Defined in the awk program
awk 'BEGIN{var=value}'

Passing External Variables#

  • The -v option can be used to pass an external variable to the awk command

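  • A variable can also be passed as a var=value argument after the program text. Such an assignment takes effect only when awk reaches that argument, so the variable is not yet set inside BEGIN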
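The two ways of passing a variable differ in timing, which the following sketch tries to make visible. The variable names `name` and `who` are made up for this illustration.

```shell
name='shell'

# -v assignments are made before BEGIN runs, so BEGIN can read them.
with_v=$(awk -v who="$name" 'BEGIN { print "BEGIN sees [" who "]" }')
echo "$with_v"     # BEGIN sees [shell]

# A trailing var=value argument is processed only after BEGIN has run,
# just before the input is read, so BEGIN still sees an empty value.
trailing=$(echo 'line' | awk 'BEGIN { print "BEGIN sees [" who "]" }
                              { print "main sees [" who "]" }' who="$name")
echo "$trailing"
# BEGIN sees []
# main sees [shell]
```

If the value is needed in BEGIN, -v is the safer choice; the trailing form is mainly useful for changing a variable between input files.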

Example#

# Print the content, line number, and number of fields of each line in emp.txt
[root@localhost ~]# awk '{print $0,NR,NF}' emp.txt

# Print the content of the first column, the last column, and the second-to-last column
[root@localhost ~]# awk '{print $1,$NF,$(NF-1)}' emp.txt

# Output the content in the order of "Department ID Name Salary"
[root@localhost ~]# awk '{print $4,$1,$2,$3}' emp.txt

# Output the second field of the third line
[root@localhost ~]# awk 'NR==3{print $2}' emp.txt

# Count the number of people in each department (sort first, because uniq -c only merges adjacent duplicate lines)
[root@localhost ~]# awk '{print $NF}' emp.txt | sort | uniq -c

# Change the output field separator to ':' and write the output to the emp.bak.txt file
[root@localhost ~]# awk -v OFS=":" '{print $1,$2,$3,$4}' emp.txt > emp.bak.txt
# Alternative: assigning to a field makes awk rebuild the line with OFS
[root@localhost ~]# awk -v OFS=":" '{$1=$1; print}' emp.txt > emp.bak.txt

# For each line of the emp.bak.txt file, output: "ID:xxx, Name:xxx, Salary:xxx"
[root@localhost ~]# awk -F: '{print "ID:"$1", Name:"$2", Salary:"$3}' emp.bak.txt

# User-defined variables
# Method 1
[root@localhost ~]# echo | awk -v a=100 '{print a}'
# Method 2
[root@localhost ~]# echo | awk 'BEGIN{b=100;print b}'

# Passing external variables
[root@localhost ~]# a='aaa'
[root@localhost ~]# b='bbb'
# Method 1 (quote the shell variables so that values containing spaces stay intact)
[root@localhost ~]# echo | awk -v v1="$a" -v v2="$b" '{print v1,v2}'
# Method 2
[root@localhost ~]# echo | awk '{print v1,v2}' v1="$a" v2="$b"
  • print: when its arguments are separated by commas, the values are joined with the output field separator (OFS) and a newline is appended automatically, so no newline character needs to be added;
  • printf: controls the output format of each field through a format string; it does not append a newline, so \n must be added manually

Common format specifiers:

| Symbol | Explanation |
| --- | --- |
| %s | String |
| %f | Floating point |
| %d | Decimal integer |
| %c | ASCII character |
| %% | A literal % |

Common escape characters:

| Symbol | Explanation |
| --- | --- |
| \n | Newline |
| \t | Horizontal tab |
| \v | Vertical tab |
| \r | Carriage return |
| \b | Backspace |
| \f | Form feed |
| \a | Alert character |
| \\ | A literal backslash |
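The difference between print and printf, and a few of the specifiers above, can be tried with a sketch like the following. The two data rows and the column widths are assumptions chosen for this illustration.

```shell
emp=$(mktemp)
printf '%s\n' '001 ZhangSan 1000 10' '005 XiaoHong 1800 30' > "$emp"

# print: commas join the values with OFS and a newline is added for you.
plain=$(awk -v OFS=':' '{ print $1, $2 }' "$emp")
echo "$plain"

# printf: explicit widths and precision, and an explicit \n.
# %-10s left-aligns in 10 columns, %8.2f shows two decimals, %3d pads to 3.
table=$(awk '{ printf "%-10s %8.2f %3d\n", $2, $3, $4 }' "$emp")
echo "$table"

# %% prints a literal percent sign.
pct=$(awk 'BEGIN { printf "%d%%\n", 50 }')
echo "$pct"        # 50%
```

Without the trailing \n in the printf format, all rows would run together on one line.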