Question

I have more than a hundred log files, and each file contains 0 to 20 lines in the following format:

[2016-06-08 18:12:32] production.INFO: Successfully done something. Email: foo@bar.com [] {"user":"anonymous","url":"/something","ip":"77.46.189.212","http_method":"POST","server":"www.mysite.com","referrer":"www.mysite.com/something","unique_id":"V1hD7lJ10JkAAAQ7MgsAAAAa"}

How can I use grep to extract all the emails from these files, but only from lines where the email is preceded by "Successfully done something. Email:"?

In other words, the emails in a line like the following should be ignored:

[2016-06-08 17:13:29] production.INFO: User another@email.com logged out... [] {"user":"another@email.com","url":"/admin/logout","ip":"109.92.131.202","http_method":"GET","server":"mysite.com","referrer":"www.mysite.com/admin/foo","unique_id":"V1g2GVJ10JkAAAqy42gAAABH"}

From the examples given, I want to extract foo@bar.com and ignore another@email.com.

Answer 1

If your version of grep supports Perl regular expressions (the -P option), you can try something like:

grep -r 'Successfully done something. Email:' /path/to/logs/ | grep -oP '\S+@\S+'

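The pattern \S+@\S+ in the second grep is deliberately loose: it matches any run of non-whitespace characters that contains an @. You may want to replace it with a stricter regular expression.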
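As a rough illustration of what a stricter pattern could look like, here is a minimal sketch. The temporary directory, the shortened sample log lines, and the address pattern itself are assumptions made for the demo; the pattern is a common simplification, not a full email validator. It uses -E rather than -P, so it does not need PCRE support.

```shell
# Demo setup (hypothetical): a throwaway directory holding shortened sample lines.
logdir=$(mktemp -d)
cat > "$logdir/sample.log" <<'EOF'
[2016-06-08 18:12:32] production.INFO: Successfully done something. Email: foo@bar.com [] {"user":"anonymous"}
[2016-06-08 17:13:29] production.INFO: User another@email.com logged out... [] {"user":"another@email.com"}
EOF

# First grep: keep only lines containing the marker.
#   -r recurse, -h omit file names, -F treat the marker as a fixed string.
# Second grep: print only the part that looks like an address (-o),
#   using a simplified pattern: local part, @, domain, dot, 2+ letter TLD.
emails=$(grep -rhF 'Successfully done something. Email:' "$logdir" |
         grep -oE '[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}')

printf '%s\n' "$emails"

# Clean up the demo directory.
rm -rf "$logdir"
```

Using -F in the first grep means the dots in the marker are matched literally instead of as "any character".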

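You can also do it with a single grep. Here \K discards everything matched before it, so -o prints only the address. Add -h if you do not want each match prefixed with its file name:

grep -roP 'Successfully done something\. Email: \K\S+@\S+' /path/to/logs/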

Answer 2

You can use awk to check that a line contains "Successfully done something. Email:" and, if it does, capture the email that follows:

awk '/Successfully done something\. Email:/ &&   # line contains the marker
     match($0, /Email: ([^ ]*) /, matches) {     # capture up to the next space (3-argument match is gawk-only)
         print matches[1]                        # print the captured group
     }' file

With the data you provided:

$ awk '/Successfully done something. Email:/ && match($0, /Email: ([^ ]*) /, matches) {print matches[1]}' file
foo@bar.com
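The three-argument form of match() used above is a GNU awk extension. If gawk is not available, the same idea can be sketched with only POSIX awk functions (index, substr, split). The temporary file, the shortened sample lines, and the variable names below are assumptions for the demo:

```shell
# Demo setup (hypothetical): a throwaway file holding shortened sample lines.
logfile=$(mktemp)
cat > "$logfile" <<'EOF'
[2016-06-08 18:12:32] production.INFO: Successfully done something. Email: foo@bar.com [] {"user":"anonymous"}
[2016-06-08 17:13:29] production.INFO: User another@email.com logged out... [] {"user":"another@email.com"}
EOF

marker='Successfully done something. Email: '

emails=$(awk -v marker="$marker" '
    {
        pos = index($0, marker)                  # literal search, 0 if the marker is absent
        if (pos == 0) next                       # skip lines without the marker
        rest = substr($0, pos + length(marker))  # text right after the marker
        split(rest, fields, " ")                 # the address is the first space-delimited field
        print fields[1]
    }' "$logfile")

printf '%s\n' "$emails"

# Clean up the demo file.
rm -f "$logfile"
```

Because index() does a plain substring search, the dots in the marker need no escaping.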

How to extract emails from logs using grep
