Linux Regular Expressions

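The Three Swordsmen of Linux

grep: text filtering tool (filters and searches text content)
sed: stream editor, a text editing tool (selects lines, modifies file content)
awk: text analysis tool with formatted output (selects columns, computes statistics)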
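
As a rough illustration of the division of labour described above, the sketch below runs all three tools over the same input. The three "name age" records are made up for this example and do not come from the article.

```shell
# Hypothetical sample data: one "name age" record per line.
sample=$(printf '%s\n' 'alice 30' 'bob 25' 'carol 41')

# grep: keep only the lines that match a pattern.
grep_out=$(printf '%s\n' "$sample" | grep 'bob')

# sed: select a line by number (-n suppresses default output, 2p prints line 2).
sed_out=$(printf '%s\n' "$sample" | sed -n '2p')

# awk: work on columns and compute over them (sum of the second field).
awk_out=$(printf '%s\n' "$sample" | awk '{ sum += $2 } END { print sum }')

echo "grep: $grep_out"
echo "sed:  $sed_out"
echo "awk:  $awk_out"
```

Here grep and sed both end up printing the same record, one selected by content and the other by line number, while awk reduces the second column to a single total.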

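Regular expression is commonly abbreviated as regexp.

This article uses the grep command to learn regular expressions (grep prints the lines that match a pattern).

Basic grep syntax: grep pattern filename, where pattern is the pattern to match.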

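Linux wildcards and regular expressions

Wildcards match file names. The shell expands them before commands such as ls, cp, and mv run; find matches its -name pattern itself, so that pattern should be quoted to keep the shell from expanding it first.
Regular expressions match file contents. They are usually used together with grep, sed, and awk.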
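
The difference is easiest to see with a character that exists in both syntaxes. The following sketch, which assumes a scratch directory and three made-up file names, compares `a*` as a wildcard with `a*` as a regular expression.

```shell
# Hypothetical scratch directory and file names, used only for this demo.
demo_dir=$(mktemp -d)
touch "$demo_dir/a1.txt" "$demo_dir/ab.txt" "$demo_dir/b.txt"

# Wildcard: the shell expands a* to every file name that starts with "a".
glob_count=$(cd "$demo_dir" && ls a* | wc -l | tr -d ' ')

# Regex: a* means "zero or more a", so it matches every line, including b.txt.
regex_count=$(cd "$demo_dir" && ls | grep -c 'a*')

# An anchored regex selects the same names as the wildcard above.
anchored_count=$(cd "$demo_dir" && ls | grep -c '^a')

echo "glob=$glob_count regex=$regex_count anchored=$anchored_count"
rm -rf "$demo_dir"
```

The wildcard and the anchored regex both select two names, while the unanchored regex `a*` selects all three.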

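Common wildcards

Symbol	Description
*	matches any string of any length (including the empty string)
?	matches any single character
[]	matches any single character in the listed set or range
[^]	matches any single character not in the listed set or range (POSIX shells spell this [!...]; bash also accepts [^...])
[[:upper:]]	all uppercase letters; equivalent to [A-Z] in the C locale
[[:lower:]]	all lowercase letters; equivalent to [a-z] in the C locale
[[:alpha:]]	all letters; equivalent to [a-zA-Z] in the C locale
[[:digit:]]	all digits; equivalent to [0-9]
[[:alnum:]]	all digits and letters; equivalent to [0-9a-zA-Z] in the C locale
[[:space:]]	all whitespace characters
[[:punct:]]	all punctuation characters

[0-9] matches any single digit
[a-z] matches any single lowercase letter
[A-Z] matches any single uppercase letter
[0-9a-zA-Z] matches any single digit or letter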
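
The character classes in the table can be tried out without touching any files, because the shell's `case` statement uses the same wildcard syntax. The function name `classify_char` below is a hypothetical helper written for this sketch, not something from the article.

```shell
# classify_char is a hypothetical helper: it reports which class a single
# character falls into, using wildcard patterns in a case statement.
classify_char() {
  case "$1" in
    [[:digit:]]) echo digit ;;
    [[:upper:]]) echo upper ;;
    [[:lower:]]) echo lower ;;
    [[:space:]]) echo space ;;
    [[:punct:]]) echo punct ;;
    *)           echo other ;;   # anything else, e.g. more than one character
  esac
}

classify_char 7
classify_char Q
classify_char q
classify_char ' '
classify_char '?'
classify_char ab
```

Each pattern matches exactly one character, which is why the two-character argument `ab` falls through to the final branch.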

The file test.txt contains the following:

[root@localhost ~]# cat test.txt 
You and me.
xxx is a hanhan. ^_^
longzhaoqianwowudunjiu.

He can speak english.
Are you kidding?

I think ...
youoy abba ccccc ddd
My phone number is 1872272****.
 vvv
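
To follow along, the file can be recreated with a here-document. This is only a sketch: it writes to a temporary path (an assumption made here so that nothing in the current directory is overwritten), and the result can be copied to test.txt afterwards.

```shell
# Temporary path chosen for this sketch; copy the file to ./test.txt to
# run the article's commands unchanged.
sample_file=$(mktemp)

# The quoted delimiter 'EOF' keeps the shell from expanding anything in the text.
cat > "$sample_file" <<'EOF'
You and me.
xxx is a hanhan. ^_^
longzhaoqianwowudunjiu.

He can speak english.
Are you kidding?

I think ...
youoy abba ccccc ddd
My phone number is 1872272****.
 vvv
EOF

line_count=$(wc -l < "$sample_file" | tr -d ' ')
blank_count=$(grep -c '^$' "$sample_file")
echo "lines=$line_count blank=$blank_count file=$sample_file"
```

The two blank lines and the leading space before vvv are part of the data; several of the later examples depend on them.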

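Wildcard examples

# Find files under the current directory whose name is a single digit (quote the pattern so that find, not the shell, interprets it)
[root@localhost ~]# find . -name '[0-9]'
[root@localhost ~]# find . -name '[[:digit:]]'

# Find lines in test.txt that contain a digit (the same bracket expressions also work inside a regular expression)
[root@localhost ~]# grep '[[:digit:]]' test.txt

# Find lines in test.txt that contain at least one character that is not punctuation
[root@localhost ~]# grep '[^[:punct:]]' test.txt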
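
A short sketch of the find example, assuming a scratch directory with four made-up file names. It shows that the pattern matches names consisting of exactly one digit, and why the quotes matter.

```shell
# Hypothetical scratch directory and file names, used only for this demo.
demo_dir=$(mktemp -d)
touch "$demo_dir/1" "$demo_dir/2" "$demo_dir/10" "$demo_dir/a"

# Quoted: find receives the literal pattern [0-9] and tests every file name
# against it. Only the one-character names 1 and 2 match; 10 does not.
quoted_count=$(cd "$demo_dir" && find . -name '[0-9]' | wc -l | tr -d ' ')

# Unquoted, the shell would expand [0-9] to "1 2" before find starts, so the
# command would become `find . -name 1 2`, which find rejects as malformed.

echo "single-digit names: $quoted_count"
rm -rf "$demo_dir"
```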

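Basic metacharacters

Metacharacter	Meaning
^	^a matches lines that start with a
$	a$ matches lines that end with a
^$	an empty line (`cat -E file` marks the end of every line with a $, which makes empty lines easy to spot)
.	any single character (so it never matches an empty line)
\	escape character; strips the special meaning from the character that follows
*	the preceding character repeated 0 or more times
.*	any number of any characters (matches the whole line)
^.*	any number of characters from the start of the line; matching is greedy
[ab]	any single character listed in the brackets (a or b)
[^ab]	any single character not listed after the ^ (neither a nor b); the negation of [ab]
\<	start of a word (GNU extension)
\>	end of a word (GNU extension)
\{n\}	the preceding character repeated exactly n times
\{n,\}	the preceding character repeated at least n times
\{,m\}	the preceding character repeated at most m times (GNU extension; \{0,m\} is the portable form)
\{n,m\}	the preceding character repeated between n and m times (at least n, at most m)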

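Examples

# Find all lines that start with 'Y'
[root@localhost ~]# grep '^Y' test.txt

# Find lines that end with 'g'
[root@localhost ~]# grep 'g$' test.txt

# Find all empty lines
[root@localhost ~]# grep '^$' test.txt

# Find non-empty lines
[root@localhost ~]# grep '.' test.txt

# Find lines that end with '.'
[root@localhost ~]# grep '\.$' test.txt

# Match 0 or more consecutive d characters (0 occurrences also count, so every line matches)
[root@localhost ~]# grep 'd*' test.txt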
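
Because `*` also accepts zero repetitions, `d*` is rarely what one wants on its own. A minimal sketch, using three lines borrowed from test.txt, compares it with `dd*` (one d followed by zero or more d):

```shell
# Three lines taken from the article's test.txt.
sample=$(printf '%s\n' 'You and me.' 'I think ...' 'youoy abba ccccc ddd')

# d* matches the empty string, so every line counts as a match.
star_count=$(printf '%s\n' "$sample" | grep -c 'd*')

# dd* requires at least one d, so the line without any d is dropped.
at_least_one_count=$(printf '%s\n' "$sample" | grep -c 'dd*')

echo "d*: $star_count lines, dd*: $at_least_one_count lines"
```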

# Match everything
[root@localhost ~]# grep '.*' test.txt

# Find lines that contain a d; the match starts at the beginning of the line and, being greedy, extends to the last d on the line
[root@localhost ~]# grep '^.*d' test.txt
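
Greedy matching becomes visible with the -o option, which prints only the matched part of each line (-o is assumed to be available, as it is in GNU and BSD grep). The sketch below uses one line from test.txt that contains two d characters.

```shell
line='Are you kidding?'

# Greedy: .* takes as much as it can, so the match ends at the LAST d.
greedy_match=$(printf '%s\n' "$line" | grep -o '^.*d')

# For comparison, [^d]* cannot cross a d, so this match ends at the FIRST d.
first_d_match=$(printf '%s\n' "$line" | grep -o '^[^d]*d')

echo "greedy:  $greedy_match"
echo "first d: $first_d_match"
```

The first command prints `Are you kidd` and the second prints `Are you kid`.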

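# Find lines that contain l or x
[root@localhost ~]# grep '[lx]' test.txt

# Find lines that contain at least one character other than l and x
[root@localhost ~]# grep '[^lx]' test.txt

# Find lines that start with l or x
[root@localhost ~]# grep '^[lx]' test.txt

# Match the whole word speak
[root@localhost ~]# grep '\<speak\>' test.txt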
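
The word anchors only make a difference when the text also contains a longer word with the same letters. In this sketch the second input line is made up for the comparison and is not part of test.txt; `\<` and `\>` are assumed to be supported, as they are in GNU grep.

```shell
# First line is from test.txt; the second is a hypothetical extra line.
sample=$(printf '%s\n' 'He can speak english.' 'The speaker is loud.')

# Plain substring match: "speak" is also found inside "speaker".
substring_count=$(printf '%s\n' "$sample" | grep -c 'speak')

# With word anchors, only the standalone word matches.
word_count=$(printf '%s\n' "$sample" | grep -c '\<speak\>')

echo "substring: $substring_count, whole word: $word_count"
```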

# Find lines that start with whitespace (the second form matches a leading space only)
[root@localhost ~]# grep '^[[:space:]]' test.txt
[root@localhost ~]# grep '^[ ]' test.txt
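
The interval operators from the table are not exercised by the examples above. A minimal sketch with three lines from test.txt, again assuming that -o is available:

```shell
# Three lines taken from the article's test.txt.
sample=$(printf '%s\n' 'youoy abba ccccc ddd' 'He can speak english.' 'Are you kidding?')

# \{5\}: only the line that contains five consecutive c characters matches.
exact_count=$(printf '%s\n' "$sample" | grep -c 'c\{5\}')

# \{2,3\}: runs of two or three d; -o prints each run, paste joins them.
range_matches=$(printf '%s\n' "$sample" | grep -o 'd\{2,3\}' | paste -sd' ' -)

echo "lines with ccccc: $exact_count"
echo "runs of 2-3 d:    $range_matches"
```

The second command reports `ddd` from the first line and `dd` from kidding.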

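Extended metacharacters

Metacharacter	Meaning
+	the preceding character repeated 1 or more times (at least once); picks out runs of consecutive characters or text
?	the preceding character repeated 0 or 1 times (at most once)
|	alternation (or); filters for several patterns at once (written \| in GNU basic regular expressions)
()	grouping; treats the contents of the parentheses as a single unit. \n (n is a digit) refers back to the text matched by the n-th group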

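Examples

In a basic regular expression, the extended metacharacters must be escaped with \ (a GNU extension).
With egrep or grep -E, extended regular expressions work without the \ escape. Recent GNU grep releases treat egrep as obsolescent, so grep -E is the safer choice.

# Lines containing one or more consecutive d characters
[root@localhost ~]# grep 'd\+' test.txt
[root@localhost ~]# egrep 'd+' test.txt
[root@localhost ~]# grep -E 'd+' test.txt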
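
The escaping rule can be checked with input that contains a literal plus sign. The three input lines below are made up for this sketch, and `\+` in basic mode is assumed to behave as in GNU grep.

```shell
# Hypothetical input: a run of d, a literal "d+", and a line without d.
sample=$(printf '%s\n' 'ddd' 'd+' 'abc')

# Basic regex, escaped: \+ means "one or more".
bre_escaped_count=$(printf '%s\n' "$sample" | grep -c 'd\+')

# Extended regex: + means "one or more" without the backslash.
ere_count=$(printf '%s\n' "$sample" | grep -E -c 'd+')

# Basic regex, unescaped: + is an ordinary character, so only "d+" matches.
bre_literal_count=$(printf '%s\n' "$sample" | grep -c 'd+')

echo "BRE d\\+: $bre_escaped_count, ERE d+: $ere_count, BRE d+: $bre_literal_count"
```

The first two counts are both 2, while the unescaped basic pattern matches only the one line that contains a literal `d+`.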

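# Match d 0 or 1 times (0 occurrences also count, so every line matches)
[root@localhost ~]# grep -E 'd?' test.txt

# Find lines that contain a or b
[root@localhost ~]# grep -E 'a|b' test.txt

# Match 'and' or 'abb'
[root@localhost ~]# grep -E 'a(nd|bb)' test.txt

# Match two identical consecutive lowercase letters
[root@localhost ~]# grep -E '([a-z])\1' test.txt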
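
To see what a back-reference actually captures, the sketch below prints the matched text with -o for one line of test.txt. Back-references in extended mode are assumed to work as they do in GNU grep.

```shell
line='youoy abba ccccc ddd'

# \1 must repeat whatever the first group matched: doubled letters.
pairs=$(printf '%s\n' "$line" | grep -oE '([a-z])\1' | paste -sd' ' -)

# Two groups referenced in reverse order: four-letter mirrored sequences.
mirrored=$(printf '%s\n' "$line" | grep -oE '([a-z])([a-z])\2\1' | paste -sd' ' -)

echo "pairs:    $pairs"
echo "mirrored: $mirrored"
```

The first pattern finds `bb cc cc dd`, and the second finds `abba cccc`. Matches do not overlap, which is why ccccc yields two pairs and a leftover c.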

# Match d repeated at least once and at most twice
[root@localhost ~]# grep -E 'd{1,2}' test.txt

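Extra: other common metacharacters supported by Perl

Metacharacter	Explanation
\d	a digit
\D	a non-digit
\w	a word character (letter, digit, or underscore)
\W	a non-word character (anything other than a letter, digit, or underscore)
\s	a whitespace character
\S	a non-whitespace character

grep -P enables Perl-compatible regular expressions.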

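Examples

# Find lines that contain at least one word (add -o to print each word on its own line)
[root@localhost ~]# grep -P '\w+' test.txt

# Find lines that contain at least one non-digit character
[root@localhost ~]# grep -P '\D' test.txt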
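
Combined with -o, the Perl classes are handy for extracting pieces of a line. This sketch assumes a grep built with PCRE support (-P is not available in every grep) and uses two lines from test.txt.

```shell
# \d+ pulls out the run of digits; the masked part (****) is not digits.
digits=$(printf '%s\n' 'My phone number is 1872272****.' | grep -oP '\d+')

# \w+ splits a line into words; punctuation and spaces act as separators.
word_count=$(printf '%s\n' 'You and me.' | grep -oP '\w+' | wc -l | tr -d ' ')

echo "digits: $digits"
echo "words:  $word_count"
```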