Regular Expressions in Linux
Linux Three Musketeers
- grep: text filtering tool (filtering, searching for text content)
- sed: stream editor, text editing tool (selecting lines, modifying file content)
- awk: text analysis tool, formatting text output (selecting columns, statistical calculations)
Regular expression (regexp)
Here, we use the grep command to learn regular expressions (grep prints the lines whose content matches a pattern).
Basic syntax of the grep command: grep pattern filename
where pattern is the matching pattern.
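As a minimal sketch of the grep pattern filename form, the script below builds a small sample file in a temporary directory and searches it. The file name sample.txt and its three lines are assumptions made for illustration (borrowed from the test.txt shown later), not part of the original setup.

```shell
#!/bin/sh
# Hypothetical sample file: three lines borrowed from test.txt.
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
sample="$tmpdir/sample.txt"
printf '%s\n' 'You and me.' 'He can speak english.' 'Are you kidding?' > "$sample"

# grep pattern filename: print each line that contains a match for the pattern.
matched=$(grep 'you' "$sample")
echo "$matched"

# Matching is case-sensitive by default; -i ignores case, -c counts matching lines.
count_ci=$(grep -ic 'you' "$sample")
echo "$count_ci"
```

Without -i only the lowercase "you" line is printed; with -i the "You and me." line is counted as well.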
Linux Wildcards and Regular Expressions
- Wildcards are used to match file names; the shell normally expands them before commands such as ls, cp, and mv run, and find accepts the same pattern syntax in its -name test.
- Regular expressions are used to match file content; they are generally used with grep, sed, and awk.
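The difference is easiest to see with the * character, which means something different in each system. The sketch below is illustrative only: the file names a1.txt, a2.txt, b.log and words.txt are made up for the example.

```shell
#!/bin/sh
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
touch "$tmpdir/a1.txt" "$tmpdir/a2.txt" "$tmpdir/b.log"
printf '%s\n' 'ac' 'abc' 'abbc' 'xyz' > "$tmpdir/words.txt"

# Wildcard: the shell expands a* into the file names that start with a.
glob_result=$(cd "$tmpdir" && echo a*)
echo "$glob_result"

# Regular expression: b* means "zero or more b", so ab*c matches ac, abc and abbc.
regex_count=$(grep -c 'ab*c' "$tmpdir/words.txt")
echo "$regex_count"
```

As a wildcard, * stands for any run of characters on its own; in a regular expression it only repeats the item before it.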
Common Wildcards
Symbol | Description |
---|---|
* | Matches any number of any characters |
? | Matches any single character |
[] | Matches any single character in the specified range |
[^] | Matches any single character not in the specified range |
[[:upper:]] | All uppercase letters, equivalent to [A-Z] |
[[:lower:]] | All lowercase letters, equivalent to [a-z] |
[[:alpha:]] | All letters, equivalent to [a-zA-Z] |
[[:digit:]] | All digits, equivalent to [0-9] |
[[:alnum:]] | All digits and letters, equivalent to [0-9a-zA-Z] |
[[:space:]] | All whitespace characters |
[[:punct:]] | All punctuation marks |
[0-9] represents any digit
[a-z] represents any lowercase letter
[A-Z] represents any uppercase letter
[0-9a-zA-Z] represents any digit or letter
The content of the file test.txt is as follows:
[root@localhost ~]# cat test.txt
You and me.
xxx is a hanhan. ^_^
longzhaoqianwowudunjiu.
He can speak english.
Are you kidding?
I think ...
youoy abba ccccc ddd
My phone number is 1872272****.
vvv
Wildcard Examples
# Find files in or below the current directory whose name is a single digit (quote the pattern so the shell does not expand it before find runs)
[root@localhost ~]# find . -name '[0-9]'
[root@localhost ~]# find . -name '[[:digit:]]'
# Find lines in test.txt that contain a digit
[root@localhost ~]# grep '[[:digit:]]' test.txt
# Find lines in test.txt that contain at least one non-punctuation character
[root@localhost ~]# grep '[^[:punct:]]' test.txt
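The following sketch checks what a bracket pattern actually selects. The files named 1, 7, 12 and a, and the two-line sample.txt, are hypothetical and exist only inside a temporary directory.

```shell
#!/bin/sh
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
touch "$tmpdir/1" "$tmpdir/7" "$tmpdir/12" "$tmpdir/a"

# [0-9] matches exactly one digit, so only the files named 1 and 7 qualify.
single_digit=$(find "$tmpdir" -type f -name '[0-9]' | wc -l | tr -d ' ')
# [0-9]* is one digit followed by anything, so 12 is included as well.
leading_digit=$(find "$tmpdir" -type f -name '[0-9]*' | wc -l | tr -d ' ')
echo "$single_digit $leading_digit"

# The same bracket classes work inside a grep regular expression.
printf '%s\n' 'My phone number is 1872272****.' 'vvv' > "$tmpdir/sample.txt"
digit_lines=$(grep -c '[[:digit:]]' "$tmpdir/sample.txt")
echo "$digit_lines"
```

Quoting the pattern matters here: an unquoted [0-9] could be replaced by the shell with a matching file name from the current directory before find ever sees it.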
Basic Metacharacters
Metacharacter | Meaning |
---|---|
^ | Start of line; ^a matches lines that start with a |
$ | End of line; a$ matches lines that end with a |
^$ | Empty line (every line ends with an invisible end-of-line marker, which cat -E file displays as $) |
. | Any single character (so it never matches an empty line) |
\ | Escape character, removes the special meaning of the character |
* | The previous character is repeated 0 or more times |
.* | Any number of any characters (matches every line, including empty ones) |
^.* | Any number of characters from the start of the line, greedy matching |
[ab] | Matches any one character in the square brackets (a or b) |
[^ab] | Matches any single character not listed after the ^ (neither a nor b), negation of [ab] |
\< | Word beginning |
\> | Word end |
\{n\} | Repeat the previous character n times |
\{n,\} | Repeat the previous character at least n times |
\{,m\} | Repeat the previous character at most m times |
\{n,m\} | Repeat the previous character from n to m times (at least n times, at most m times) |
Examples
# Find all lines starting with 'Y'
[root@localhost ~]# grep '^Y' test.txt
# Find lines ending with 'g'
[root@localhost ~]# grep 'g$' test.txt
# Find all empty lines
[root@localhost ~]# grep '^$' test.txt
# Find non-empty lines
[root@localhost ~]# grep '.' test.txt
# Find lines ending with '.'
[root@localhost ~]# grep '\.$' test.txt
# Match 0 or more consecutive d's (0 occurrences also count, so every line matches)
[root@localhost ~]# grep 'd*' test.txt
# Find all content
[root@localhost ~]# grep '.*' test.txt
# Any characters from the start of the line followed by d (greedy matching, so the match extends to the last d of each line)
[root@localhost ~]# grep '^.*d' test.txt
# Match l or x
[root@localhost ~]# grep '[lx]' test.txt
# Match lines containing at least one character other than l and x
[root@localhost ~]# grep '[^lx]' test.txt
# Match lines starting with l or x
[root@localhost ~]# grep '^[lx]' test.txt
# Match the word 'speak'
[root@localhost ~]# grep '\<speak\>' test.txt
# Match lines starting with a whitespace character ([[:space:]]) or with a literal space ([ ])
[root@localhost ~]# grep '^[[:space:]]' test.txt
[root@localhost ~]# grep '^[ ]' test.txt
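To see which part of a line a pattern matched, rather than only which lines were selected, grep's -o option prints each match on its own line. The sketch below assumes a grep that supports -o (GNU and BSD grep do; it is not required by POSIX) and uses a three-line sample.txt cut down from test.txt.

```shell
#!/bin/sh
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
sample="$tmpdir/sample.txt"
printf '%s\n' 'You and me.' 'youoy abba ccccc ddd' 'vvv' > "$sample"

# d* also accepts zero d's, so every line counts as a match.
star_lines=$(grep -c 'd*' "$sample")
# A bare d only matches the lines that really contain one.
d_lines=$(grep -c 'd' "$sample")

# ^.*d is greedy: on the second line the match runs to the last d.
greedy=$(grep -o '^.*d' "$sample" | tail -n 1)

# c\{2\} matches exactly two c's; ccccc yields two non-overlapping matches.
pairs=$(grep -o 'c\{2\}' "$sample" | wc -l | tr -d ' ')

echo "$star_lines $d_lines [$greedy] $pairs"
```

Comparing the counts for d* and d shows why patterns that can match nothing are rarely useful as a filter on their own.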
Extended Metacharacters
Metacharacter | Meaning |
---|---|
+ | The previous character is repeated 1 or more times (at least 1 time), useful for matching runs of consecutive characters |
? | The previous character is repeated 0 or 1 time (at most 1 time) |
| | Alternation (or), matches any one of several patterns |
() | Grouping, treats the content inside () as a whole; \n (n is a number) refers back to the text matched by the nth pair of parentheses |
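Examples
In basic regular expressions (plain grep), the extended metacharacters must be escaped with a preceding \ (GNU grep accepts \+, \?, and \|).
Use egrep or grep -E to use extended regular expressions without escaping with \ (recent GNU grep releases warn that egrep is obsolescent, so grep -E is the safer choice).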
# Match one or more consecutive d's
[root@localhost ~]# grep 'd\+' test.txt
[root@localhost ~]# egrep 'd+' test.txt
[root@localhost ~]# grep -E 'd+' test.txt
# Match 0 or 1 d (0 occurrences also count, so every line matches)
[root@localhost ~]# grep -E 'd?' test.txt
# Match a or b
[root@localhost ~]# grep -E 'a|b' test.txt
# Match 'and' or 'abb'
[root@localhost ~]# grep -E 'a(nd|bb)' test.txt
# Match two identical consecutive lowercase letters
[root@localhost ~]# grep -E '([a-z])\1' test.txt
# Repeat the d character at least 1 time, at most 2 times
[root@localhost ~]# grep -E 'd{1,2}' test.txt
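The sketch below compares the escaped basic form with the extended form and prints what grouping and back-references actually match. It assumes GNU grep (the \+ escape and back-references in -E mode are extensions, and -o is not required by POSIX); sample.txt is a three-line stand-in for test.txt.

```shell
#!/bin/sh
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
sample="$tmpdir/sample.txt"
printf '%s\n' 'You and me.' 'youoy abba ccccc ddd' 'vvv' > "$sample"

# The escaped basic form and the unescaped extended form select the same lines.
bre=$(grep 'd\+' "$sample")
ere=$(grep -E 'd+' "$sample")

# Grouping plus alternation: a followed by nd or bb.
alt=$(grep -E -o 'a(nd|bb)' "$sample" | paste -s -d ' ' -)

# Back-reference: \1 repeats whatever the first group matched.
doubles=$(grep -E -o '([a-z])\1' "$sample" | paste -s -d ' ' -)

echo "$alt"
echo "$doubles"
```

The back-reference output lists one doubled pair per match, which is why ccccc contributes cc twice and ddd contributes dd once.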
Extension: Other commonly used metacharacters supported by Perl
Metacharacter | Explanation |
---|---|
\d | Digit |
\D | Non-digit |
\w | Word character: digit, letter, or underscore |
\W | Any character other than a digit, letter, or underscore |
\s | Whitespace character |
\S | Non-whitespace character |
Use grep -P to enable Perl-compatible regular expressions
Examples
# Match lines that contain at least one word character
[root@localhost ~]# grep -P '\w+' test.txt
# Match lines that contain at least one non-digit character
[root@localhost ~]# grep -P '\D' test.txt
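Because -P is a GNU grep feature that depends on how grep was built, a script may want to probe for it before relying on it. The sketch below is one possible approach, not an established convention: it extracts the digits and counts the words in the phone-number line from test.txt, falling back to POSIX bracket expressions with -E when -P is unavailable.

```shell
#!/bin/sh
line='My phone number is 1872272****.'

# Probe for -P support first; it is a GNU grep feature and may be missing.
if printf '1\n' | grep -qP '\d' 2>/dev/null; then
    digits=$(printf '%s\n' "$line" | grep -oP '\d+')
    words=$(printf '%s\n' "$line" | grep -oP '\w+' | wc -l | tr -d ' ')
else
    # Fallback with POSIX bracket expressions in extended syntax.
    digits=$(printf '%s\n' "$line" | grep -oE '[0-9]+')
    words=$(printf '%s\n' "$line" | grep -oE '[[:alnum:]_]+' | wc -l | tr -d ' ')
fi
echo "$digits $words"
```

For this ASCII input both branches give the same result, which shows that \d and \w are shorthand for character classes that can also be written out as bracket expressions.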