我正在尝试使用bash中的正则表达式匹配一些电子邮件地址。 目前得到了表达
"^[a-zA-Z0-9!#\$%&'\*\+/=?^_\`{|}~-]+(\.[a-zA-Z0-9!#\$%&'\*\+/=?^_\`{|}~-]+)*@([a-zA-Z0-9]([a-zA-Z0-9-]*[a-zA-Z0-9])?\.)+[a-zA-Z0-9]([a-zA-Z0-9-]*[a-zA-Z0-9])?\$"
哪个成功匹配我需要的所有电子邮件,但是当尝试添加“收件人:”字段时,我似乎无法获得任何匹配,我不知道为什么。 这是我的代码,带有To字段。
"^To:\s[a-zA-Z0-9!#\$%&'\*\+/=?^_\`{|}~-]+(\.[a-zA-Z0-9!#\$%&'\*\+/=?^_\`{|}~-]+)*@([a-zA-Z0-9]([a-zA-Z0-9-]*[a-zA-Z0-9])?\.)+[a-zA-Z0-9]([a-zA-Z0-9-]*[a-zA-Z0-9])?\$"
哪个AFAIK应与“To:bob@bob.co.uk”匹配,但不是:( 有什么建议吗?
代码示例
Reply-To: "service@paypal.com" <service@paypal.com>
To: bob@bob.co.uk
Date: Mon, 21 Jun 2012 21:34:10 -0300
用于搜索文件并添加到数组
的代码regex="^[a-zA-Z0-9!#\$%&'\*\+/=?^_\`{|}~-]+(\.[a-zA-Z0-9!#\$%&'\*\+/=?^_\`{|}~-]+)*@([a-zA-Z0-9]([a-zA-Z0-9-]*[a-zA-Z0-9])?\.)+[a-zA-Z0-9]([a-zA-Z0-9-]*[a-zA-Z0-9])?\$"
for i in $(cat mailbox.mbx); do
if [[ $i =~ $regex ]]; then
echo $i
sortarray[$index]=$i
index=$(($index+1))
fi
done
答案 0 :(得分:3)
bash正则表达式不理解perl-ish \s
。你必须使用posix-ish [[:space:]]
。你还应该在那里添加量词
我看到你在$regex
中有锚点:那些绊倒你了吗?
对于像这样的大规模正则表达式,我喜欢将它们逐个构建:
char='[[:alnum:]!#\$%&'\''\*\+/=?^_\`{|}~-]'
name_part="${char}+(\.${char}+)*"
domain="([[:alnum:]]([[:alnum:]-]*[[:alnum:]])?\.)+[[:alnum:]]([[:alnum:]-]*[[:alnum:]])?"
begin='(^|[[:space:]])'
end='($|[[:space:]])'
# include capturing parentheses,
# these are the ** 2nd ** set of parentheses (there's a pair in $begin)
re_email="${begin}(${name_part}@${domain})${end}"
line="To: joe.smith@example.com"
[[ $line =~ $re_email ]] && echo ${BASH_REMATCH[2]}
# prints: joe.smith@example.com
当然,电子邮件地址非常复杂 - http://www.w3.org/Protocols/rfc822/#z8 - 并且几乎可以在任何地方允许使用评论和空白。实际上,(hi there) "My First Name".lastname (another comment) @ domain.(really)invalid
应被视为有效地址。有一个Perl模块Email::Address可以生成这个正则表达式:
$ perl -MEmail::Address -E 'say $Email::Address::addr_spec'
(?-xism:(?-xism:(?-xism:(?-xism:(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|))*\s*\)\s*)))*\s*\)\s*)|\s+)*(?-xism:[^\x00-\x1F\x7F()<>\[\]:;@\\,."\s]+(?:\.[^\x00-\x1F\x7F()<>\[\]:;@\\,."\s]+)*)(?-xism:(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|))*\s*\)\s*)))*\s*\)\s*)|\s+)*)|(?-xism:(?-xism:(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|))*\s*\)\s*)))*\s*\)\s*)|\s+)*"(?-xism:(?-xism:[^\\"])|(?-xism:\\(?-xism:[^\x0A\x0D])))+"(?-xism:(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|))*\s*\)\s*)))*\s*\)\s*)|\s+)*))\@(?-xism:(?-xism:(?-xism:(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|))*\s*\)\s*)))*\s*\)\s*)|\s+)*(?-xism:[^\x00-\x1F\x7F()<>\[\]:;@\\,."\s]+(?:\.[^\x00-\x1F\x7F()<>\[\]:;@\\,."\s]+)*)(?-xism:(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|))*\s*\)\s*)))*\s*\)\s*)|\s+)*)|(?-xism:(?-xism:(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|))*\s*\)\s*)))*\s*\)\s*)|\s+)*\[(?:\s*(?-xism:(?-xism:[^\[\]\\])|(?-xism:\\(?-xism:[^\x0A\x0D]))))*\s*\](?-xism:(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|))*\s*\)\s*)))*\s*\)\s*)|\s+)*)))
答案 1 :(得分:1)
此正则表达式应匹配所需的字符串:
"^To: (.+@.+)$"
电子邮件存储在$1