Anonymizing email addresses in a web log with sed and a regular expression

Time: 2016-03-01 07:24:44

Tags: regex sed

If it helps, I'm using sed for Windows: http://gnuwin32.sourceforge.net/packages/sed.htm

I have log files that look like this:

205.200.253.76 6bTxPVZ2aOXEQ5C jamesbond.2015@business.my.emaildomain.com [01/Dec/2015:00:00:00 +0200] "GET http://Scopus.com.au:80/(S(vdkxl432vozr1dkpsqyoyfj1))/images/tabs-hover.png HTTP/1.1" 200 1164 "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.86 Safari/537.36"

I want to anonymize the email addresses in the log by replacing jamesbond.2015@business.my.emaildomain.com with xxx@business.my.emaildomain.com.

On Windows I ran sed like this:

sed s/.*@business.my.emaildomain.com/xxx@business.my.emaildomain.com/ d:\input.txt > d:\output.txt

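It runs, but it replaces everything from the start of the line up to the domain, so the whole of 205.200.253.76 6bTxPVZ2aOXEQ5C jamesbond.2015@business.my.emaildomain.com becomes xxx@business.my.emaildomain.com.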
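To see why, here is a minimal Python sketch that applies the question's pattern to the sample line. It is only an illustration and assumes that Python's `re` picks the same leftmost, greedy match as sed for this simple pattern:

```python
import re

# Sample log line from the question
LINE = (
    '205.200.253.76 6bTxPVZ2aOXEQ5C jamesbond.2015@business.my.emaildomain.com '
    '[01/Dec/2015:00:00:00 +0200] '
    '"GET http://Scopus.com.au:80/(S(vdkxl432vozr1dkpsqyoyfj1))/images/tabs-hover.png HTTP/1.1" '
    '200 1164 '
    '"Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) '
    'Chrome/46.0.2490.86 Safari/537.36"'
)

# The question's pattern, unchanged: ".*" may start at the beginning of the
# line and is greedy, so the IP address and session ID are consumed too.
# count=1 mirrors sed's s/// without the g flag.
naive = re.sub(r'.*@business.my.emaildomain.com',
               'xxx@business.my.emaildomain.com', LINE, count=1)
print(naive)
```

Because `.*` can start at column 0 and is greedy, the match covers the IP address and the session ID as well as `jamesbond.2015`.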

How can I keep the first two "words" and replace only the email address?

I tried

sed s/\/b.*@business.my.emaildomain.com/xxx@business.my.emaildomain.com/ d:\input.txt > d:\output.txt

with no luck.

1 answer:

Answer 0 (score: 0)

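You can use the word-boundary anchor \b together with the -r switch, which enables extended regular expressions (ERE); \b itself is a GNU sed extension. In the run below, sed reads the sample line from standard input and prints the anonymized result. On Windows cmd.exe, put the script in double quotes instead of single quotes, because cmd does not treat single quotes as quoting characters.
$ sed -r 's/\b[a-zA-Z0-9._]+(@business\.my\.emaildomain\.com)/xxxx\1/g'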
205.200.253.76 6bTxPVZ2aOXEQ5C jamesbond.2015@business.my.emaildomain.com [01/Dec/2015:00:00:00 +0200] "GET http://Scopus.com.au:80/(S(vdkxl432vozr1dkpsqyoyfj1))/images/tabs-hover.png HTTP/1.1" 200 1164 "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.86 Safari/537.36"
205.200.253.76 6bTxPVZ2aOXEQ5C xxxx@business.my.emaildomain.com [01/Dec/2015:00:00:00 +0200] "GET http://Scopus.com.au:80/(S(vdkxl432vozr1dkpsqyoyfj1))/images/tabs-hover.png HTTP/1.1" 200 1164 "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.86 Safari/537.36"
$

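Note: the regex above restricts the characters allowed in the part of the address before the @: it may contain only [a-zA-Z0-9._]. If you want to accept any email address, you have to use an RFC-compliant regex (http://emailregex.com/) or the widest character set your application allows. Because the pattern is not anchored, it replaces the email address wherever it appears in a line.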
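The same substitution can be sketched in Python with `re.sub`. This is a port for experimenting, not part of the original answer, and it assumes that Python's `\b` and character classes behave like GNU sed's `-r` mode for this ASCII input:

```python
import re

# Sample log line from the question
LINE = (
    '205.200.253.76 6bTxPVZ2aOXEQ5C jamesbond.2015@business.my.emaildomain.com '
    '[01/Dec/2015:00:00:00 +0200] '
    '"GET http://Scopus.com.au:80/(S(vdkxl432vozr1dkpsqyoyfj1))/images/tabs-hover.png HTTP/1.1" '
    '200 1164 '
    '"Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) '
    'Chrome/46.0.2490.86 Safari/537.36"'
)

# Local part limited to [A-Za-z0-9._]; the leading \b makes the match start
# at a word boundary. Dots in the domain are escaped so they match only ".".
EMAIL_RE = re.compile(r'\b[A-Za-z0-9._]+(@business\.my\.emaildomain\.com)')

# \1 re-inserts the captured "@domain" after the mask.
masked = EMAIL_RE.sub(r'xxxx\1', LINE)
print(masked)
```

Only `jamesbond.2015` is replaced. The IP address and session ID survive because the space after them is not in the character class, so no match starting there can reach the `@`.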

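If your log format is fixed, you can use the following instead, without worrying about whether the email address is RFC-compliant. It keeps the first two space-separated fields and replaces everything in the third field up to the domain. The fields are matched with [^ ] rather than [^\s]: inside a bracket expression sed treats \s as the two characters \ and s, not as whitespace.

$ sed -r 's/^([^ ]+ [^ ]+ )[^ ]+(@business\.my\.emaildomain\.com)/\1xxxx\2/'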
205.200.253.76 6bTxPVZ2aOXEQ5C jamesbond.2015@business.my.emaildomain.com [01/Dec/2015:00:00:00 +0200] "GET http://Scopus.com.au:80/(S(vdkxl432vozr1dkpsqyoyfj1))/images/tabs-hover.png HTTP/1.1" 200 1164 "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.86 Safari/537.36"
205.200.253.76 6bTxPVZ2aOXEQ5C xxxx@business.my.emaildomain.com [01/Dec/2015:00:00:00 +0200] "GET http://Scopus.com.au:80/(S(vdkxl432vozr1dkpsqyoyfj1))/images/tabs-hover.png HTTP/1.1" 200 1164 "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.86 Safari/537.36"
$
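A Python port of the fixed-layout version follows. It is a sketch that assumes, as the answer does, that the email address is always the third space-separated field; the `[^ ]` classes mirror the sed pattern above:

```python
import re

# Sample log line from the question
LINE = (
    '205.200.253.76 6bTxPVZ2aOXEQ5C jamesbond.2015@business.my.emaildomain.com '
    '[01/Dec/2015:00:00:00 +0200] '
    '"GET http://Scopus.com.au:80/(S(vdkxl432vozr1dkpsqyoyfj1))/images/tabs-hover.png HTTP/1.1" '
    '200 1164 '
    '"Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) '
    'Chrome/46.0.2490.86 Safari/537.36"'
)

# Group 1 keeps the first two fields (IP address and session ID) plus their
# trailing spaces. [^ ]+ cannot cross a space, so the match stays inside the
# third field instead of running on into the rest of the line.
FIXED_RE = re.compile(r'^([^ ]+ [^ ]+ )[^ ]+(@business\.my\.emaildomain\.com)')

masked = FIXED_RE.sub(r'\1xxxx\2', LINE)
print(masked)
```

Anchoring at `^` and counting fields is what makes the character set of the local part irrelevant here. The trade-off is that a line with a different layout is simply left unchanged.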