Regular expressions in Linux

Linux Three Musketeers#

grep Text filtering tool (filtering, searching for text content)
sed Stream editor, text editing tool (selecting lines, modifying file content)
awk Text analysis tool, formatting text output (selecting columns, statistical calculations)

A regular expression is commonly abbreviated as regexp (or regex).

Here, we use the grep command to learn regular expressions (the grep command can filter content that matches a pattern).

Basic syntax of the grep command: `grep pattern filename`, where pattern is the regular expression to match and filename is the file to search.
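As a minimal sketch of this syntax, the following builds a small scratch file from a few lines of test.txt and filters it. The temporary directory, the variable names, and the pattern `you` are chosen only for illustration.

```shell
# Create a scratch copy of a few lines from test.txt (the path is illustrative).
tmp=$(mktemp -d)
printf '%s\n' 'You and me.' 'He can speak english.' 'Are you kidding?' > "$tmp/test.txt"

# grep PATTERN FILE prints every line that contains a match for PATTERN.
# Matching is case-sensitive, so 'You and me.' is not selected.
matches=$(grep 'you' "$tmp/test.txt")
echo "$matches"

# -c counts the matching lines instead of printing them.
count=$(grep -c 'you' "$tmp/test.txt")
echo "$count"

rm -r "$tmp"
```

Adding -i would make the match case-insensitive and select the first line as well.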

Linux Wildcards and Regular Expressions#

Wildcards are used to match file names. The shell expands them before running commands such as ls, cp, and mv; find applies the same syntax itself when it receives a quoted pattern through -name.
Regular expressions are used to match file content; regular expressions are generally used with grep, sed, awk.
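The difference is easiest to see with `*`, which means something different in each system. The sketch below assumes a throwaway directory with three hypothetical files; none of these names come from the original article.

```shell
tmp=$(mktemp -d)
touch "$tmp/ab.txt" "$tmp/b.txt" "$tmp/notes.md"

# Shell wildcard: * stands for any string, so *.txt matches both .txt files.
globbed=$(cd "$tmp" && echo *.txt)
echo "$globbed"

# Regular expression: * repeats the previous character, so 'ab*' means
# "a followed by zero or more b". -o prints only the matched text.
matched=$(printf '%s\n' 'a' 'abbb' 'xyz' | grep -o 'ab*')
echo "$matched"

rm -r "$tmp"
```

The regular expression that corresponds to the wildcard `*` is `.*`, not `*` on its own.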

Common Wildcards

Symbol	Description
*	Matches any string of characters, including the empty string
?	Matches any single character
[]	Matches any single character in the specified set or range
[^]	Matches any single character not in the specified set or range (the portable shell form is [!])
[[:upper:]]	All uppercase letters, equivalent to [A-Z]
[[:lower:]]	All lowercase letters, equivalent to [a-z]
[[:alpha:]]	All letters, equivalent to [a-zA-Z]
[[:digit:]]	All digits, equivalent to [0-9]
[[:alnum:]]	All digits and letters, equivalent to [0-9a-zA-Z]
[[:space:]]	All whitespace characters
[[:punct:]]	All punctuation marks

[0-9] represents any digit
[a-z] represents any lowercase letter
[A-Z] represents any uppercase letter
[0-9a-zA-Z] represents any digit or letter
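To see which characters each class selects, the following sketch runs grep -o over one line taken from test.txt. The variable names are illustrative only.

```shell
line='My phone number is 1872272****.'

# -o prints each match on its own line; tr -d '\n' joins them back together.
digits=$(printf '%s\n' "$line" | grep -o '[[:digit:]]' | tr -d '\n')
upper=$(printf '%s\n' "$line" | grep -o '[[:upper:]]' | tr -d '\n')
punct=$(printf '%s\n' "$line" | grep -o '[[:punct:]]' | tr -d '\n')

echo "$digits"   # the seven digits of the number
echo "$upper"    # the single capital letter
echo "$punct"    # the asterisks and the final period
```

The named classes follow the current locale, which makes them a safer choice than explicit ranges such as [A-Z] when the locale is not known.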

The content of the file test.txt is as follows:

[root@localhost ~]# cat test.txt 
You and me.
xxx is a hanhan. ^_^
longzhaoqianwowudunjiu.

He can speak english.
Are you kidding?

I think ...
youoy abba ccccc ddd
My phone number is 1872272****.
 vvv

Wildcard Examples#

# Find files under the current directory whose name is a single digit
# (quote the pattern so the shell does not expand it before find runs)
[root@localhost ~]# find . -name '[0-9]'
[root@localhost ~]# find . -name '[[:digit:]]'

# Find lines in test.txt that contain a digit
[root@localhost ~]# grep '[[:digit:]]' test.txt

# Find lines in test.txt that contain at least one character that is not punctuation
[root@localhost ~]# grep '[^[:punct:]]' test.txt
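The find examples can be tried safely in a scratch directory. This is a sketch with hypothetical file names (1, 10, sub/2); it shows that the pattern is matched against each whole file name at every depth.

```shell
tmp=$(mktemp -d)
mkdir "$tmp/sub"
touch "$tmp/1" "$tmp/10" "$tmp/sub/2"

# Quoted, the pattern reaches find unchanged and is tested against each
# file name. '[0-9]' is exactly one character, so the file named 10 is skipped.
found=$(cd "$tmp" && find . -name '[0-9]' | LC_ALL=C sort)
echo "$found"

rm -r "$tmp"
```

Without the quotes, the shell would first try to expand [0-9] against the current directory, and find might receive a file name instead of a pattern.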

Basic Metacharacters#

Metacharacter	Meaning
^	Start of line; ^a matches lines that start with a
$	End of line; a$ matches lines that end with a
^$	Empty line (`cat -E file` displays a $ at the end of every line, which makes empty lines visible)
.	Any single character (an empty line has no character to match)
\	Escape character; removes the special meaning of the character that follows
*	The previous character repeated 0 or more times
.*	Any number of any characters, including none (matches every line)
^.*	Any number of characters from the start of the line (greedy: matches as much as possible)
[ab]	Matches any one character in the square brackets (a or b)
[^ab]	Matches any one character other than those listed after ^ (neither a nor b); the negation of [ab]
\<	Word beginning
\>	Word end
\{n\}	Repeat the previous character n times
\{n,\}	Repeat the previous character at least n times
\{,m\}	Repeat the previous character at most m times
\{n,m\}	Repeat the previous character from n to m times (at least n times, at most m times)

Examples#

# Find all lines starting with 'Y'
[root@localhost ~]# grep '^Y' test.txt 

# Find lines ending with 'g'
[root@localhost ~]# grep 'g$' test.txt 

# Find all empty lines
[root@localhost ~]# grep '^$' test.txt

# Find non-empty lines
[root@localhost ~]# grep '.' test.txt 

# Find lines ending with '.'
[root@localhost ~]# grep '\.$' test.txt 

# Find 0 or more consecutive d's (zero occurrences also count, so every line matches)
[root@localhost ~]# grep 'd*' test.txt 

# Find all content
[root@localhost ~]# grep '.*' test.txt 

# Match from the start of the line through a d (greedy: the match extends to the last d on the line)
[root@localhost ~]# grep '^.*d' test.txt 
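Greedy matching becomes visible when grep prints only the matched text with -o. The sketch below uses one line from test.txt. The second pattern is an addition for contrast and is not part of the original examples.

```shell
line='youoy abba ccccc ddd'

# ^.*d is greedy: .* consumes as much as it can, so the match runs to the last d.
greedy=$(printf '%s\n' "$line" | grep -o '^.*d')
echo "$greedy"

# Excluding d from the repeated set stops the match at the first d instead.
first=$(printf '%s\n' "$line" | grep -o '^[^d]*d')
echo "$first"
```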

# Match lines containing l or x
[root@localhost ~]# grep '[lx]' test.txt 

# Match lines containing at least one character other than l or x
[root@localhost ~]# grep '[^lx]' test.txt 
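A negated bracket expression is easy to confuse with an inverted match. The following sketch, using three made-up sample lines, contrasts '[^lx]' with the -v option.

```shell
sample=$(printf '%s\n' 'xxx' 'lamp' 'He can speak.')

# '[^lx]' selects lines with at least one character other than l or x,
# so 'lamp' still matches (a, m, p) while 'xxx' does not.
has_other=$(printf '%s\n' "$sample" | grep '[^lx]')
echo "$has_other"

# -v inverts the match: keep only lines that contain neither l nor x.
without=$(printf '%s\n' "$sample" | grep -v '[lx]')
echo "$without"
```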

# Match lines starting with l or x
[root@localhost ~]# grep '^[lx]' test.txt 

# Match the word 'speak'
[root@localhost ~]# grep '\<speak\>' test.txt

# Match lines starting with whitespace ([[:space:]]) or with a literal space ([ ])
[root@localhost ~]# grep '^[[:space:]]' test.txt
[root@localhost ~]# grep '^[ ]' test.txt
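The interval forms \{n\} and \{n,\} from the table are not covered by the examples above. Here is a small sketch that combines them with the word anchors, assuming GNU grep (\< and \> are GNU extensions) and one line from test.txt.

```shell
line='youoy abba ccccc ddd'

# \{3\} matches exactly three consecutive c's; only one such run fits in 'ccccc'.
three=$(printf '%s\n' "$line" | grep -o 'c\{3\}')
echo "$three"

# \{4,\} needs at least four repeats, so 'ccccc' matches but 'ddd' does not.
four_plus=$(printf '%s\n' "$line" | grep -o '[cd]\{4,\}')
echo "$four_plus"

# \< and \> anchor the match to a whole word: only three-letter words qualify.
word3=$(printf '%s\n' "$line" | grep -o '\<[[:lower:]]\{3\}\>')
echo "$word3"
```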

Extended Metacharacters#

Metacharacter	Meaning
+	The previous character repeated 1 or more times (at least once); matches runs of consecutive characters
?	The previous character repeated 0 or 1 time (at most once)
|	Alternation (or); matches any one of several patterns
()	Grouping; treats the content inside () as a unit. \n (n is a number) refers back to the text matched by the nth group

Examples#

With plain grep (basic regular expressions), GNU grep accepts these extended metacharacters only when they are escaped with a preceding \ (for example \+, \?, \|).
With grep -E (or the older egrep), extended regular expressions are used and no \ is needed.

# Match one or more consecutive d's
[root@localhost ~]# grep 'd\+' test.txt 
[root@localhost ~]# egrep 'd+' test.txt 
[root@localhost ~]# grep -E 'd+' test.txt 

# Match 0 or 1 d (zero occurrences also count, so every line matches)
[root@localhost ~]# grep -E 'd?' test.txt 

# Match lines containing a or b
[root@localhost ~]# grep -E 'a|b' test.txt

# Match 'and' or 'abb'
[root@localhost ~]# grep -E 'a(nd|bb)' test.txt

# Match two identical consecutive lowercase letters
[root@localhost ~]# grep -E '([a-z])\1' test.txt

# Repeat the d character at least 1 time, at most 2 times
[root@localhost ~]# grep -E 'd{1,2}' test.txt
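The relationship between the escaped and unescaped forms, and the effect of grouping, can be checked with -o. This sketch assumes GNU grep, where \+ in basic syntax and backreferences in extended syntax are both supported. The three palindrome test words are made up for illustration.

```shell
line='youoy abba ccccc ddd'

# The same pattern in basic (escaped) and extended (unescaped) syntax.
bre=$(printf '%s\n' "$line" | grep -o 'b\+')
ere=$(printf '%s\n' "$line" | grep -oE 'b+')
echo "$bre" "$ere"

# Alternation inside a group: 'a' followed by either 'nd' or 'bb'.
alt=$(printf '%s\n' 'You and me.' "$line" | grep -oE 'a(nd|bb)')
echo "$alt"

# Backreferences: \2 and \1 repeat what the groups matched, so this
# pattern accepts only four-letter palindromes.
pal=$(printf '%s\n' 'abba' 'abab' 'noon' | grep -E '^(.)(.)\2\1$')
echo "$pal"
```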

Extension: Other commonly used metacharacters supported by Perl#

Metacharacter	Explanation
\d	Digit
\D	Non-digit
\w	Word character: letter, digit, or underscore
\W	Non-word character: anything other than a letter, digit, or underscore
\s	Whitespace character
\S	Non-whitespace character

Use grep -P to enable Perl-compatible regular expressions (PCRE).

Examples#

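# Match lines containing at least one word (add -o to print each word on its own line)
[root@localhost ~]# grep -P '\w+' test.txt

# Match lines containing at least one non-digit character
[root@localhost ~]# grep -P '\D' test.txt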
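Combined with -o, the Perl classes are convenient for extracting pieces of a line. The following sketch assumes a GNU grep built with PCRE support (the -P option is not available in every grep) and uses one line from test.txt.

```shell
line='My phone number is 1872272****.'

# \w+ with -o lists each run of word characters; paste joins them with spaces.
words=$(printf '%s\n' "$line" | grep -oP '\w+' | paste -sd' ' -)
echo "$words"

# \d+ extracts the run of digits.
number=$(printf '%s\n' "$line" | grep -oP '\d+')
echo "$number"
```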