Bash regex email matching

Time: 2013-01-05 10:26:56

Tags: regex bash email

I'm trying to match some email addresses using a regex in bash. So far I have the expression

"^[a-zA-Z0-9!#\$%&'\*\+/=?^_\`{|}~-]+(\.[a-zA-Z0-9!#\$%&'\*\+/=?^_\`{|}~-]+)*@([a-zA-Z0-9]([a-zA-Z0-9-]*[a-zA-Z0-9])?\.)+[a-zA-Z0-9]([a-zA-Z0-9-]*[a-zA-Z0-9])?\$"

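which successfully matches all the emails I need, but when I try to add the "To:" field I can't seem to get any matches, and I don't know why. Here is my code with the To field.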

"^To:\s[a-zA-Z0-9!#\$%&'\*\+/=?^_\`{|}~-]+(\.[a-zA-Z0-9!#\$%&'\*\+/=?^_\`{|}~-]+)*@([a-zA-Z0-9]([a-zA-Z0-9-]*[a-zA-Z0-9])?\.)+[a-zA-Z0-9]([a-zA-Z0-9-]*[a-zA-Z0-9])?\$"

which AFAIK should match "To: bob@bob.co.uk", but it doesn't :( Any suggestions?

Sample input

Reply-To: "service@paypal.com" <service@paypal.com>
To: bob@bob.co.uk
Date: Mon, 21 Jun 2012 21:34:10 -0300

Code used to search the file and add matches to an array
regex="^[a-zA-Z0-9!#\$%&'\*\+/=?^_\`{|}~-]+(\.[a-zA-Z0-9!#\$%&'\*\+/=?^_\`{|}~-]+)*@([a-zA-Z0-9]([a-zA-Z0-9-]*[a-zA-Z0-9])?\.)+[a-zA-Z0-9]([a-zA-Z0-9-]*[a-zA-Z0-9])?\$"


for i in $(cat mailbox.mbx); do 
    if [[ $i =~ $regex ]]; then
    echo $i
    sortarray[$index]=$i
    index=$(($index+1))
    fi
done

2 Answers:

Answer 0 (score: 3)

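Bash regexes don't understand the Perl-ish \s (the =~ operator uses POSIX extended regular expressions, which don't define it). You have to use the POSIX-ish [[:space:]]. You should also add a quantifier there, e.g. [[:space:]]+.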
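A minimal sketch of that substitution: the question's To: pattern with \s swapped for [[:space:]]+ and nothing else changed, tested against the sample header line. The `line` and `matched` variable names are illustrative assumptions, not part of the original code.

```bash
#!/usr/bin/env bash
# The question's To: pattern, with \s replaced by [[:space:]]+ (only change).
regex="^To:[[:space:]]+[a-zA-Z0-9!#\$%&'\*\+/=?^_\`{|}~-]+(\.[a-zA-Z0-9!#\$%&'\*\+/=?^_\`{|}~-]+)*@([a-zA-Z0-9]([a-zA-Z0-9-]*[a-zA-Z0-9])?\.)+[a-zA-Z0-9]([a-zA-Z0-9-]*[a-zA-Z0-9])?\$"

# Sample header line from the question (illustrative variable name).
line='To: bob@bob.co.uk'

matched=no
# $regex must stay unquoted on the right of =~ so it is treated as a regex.
if [[ $line =~ $regex ]]; then
    matched=yes
fi
echo "matched: $matched"
```

Note that this only works when the whole header line is tested at once; a single whitespace-split word can never satisfy both the To: prefix and the address.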

I see you have anchors in $regex: are those tripping you up?
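One way the anchors can bite: `for i in $(cat mailbox.mbx)` iterates over whitespace-separated words, so `To:` and the address arrive as separate values and an anchored `^To:` pattern never sees both together. The sketch below reads whole lines instead. It assumes a deliberately simplified address pattern and an inline sample standing in for mailbox.mbx; `re_to`, `mailbox`, and `addresses` are hypothetical names.

```bash
#!/usr/bin/env bash
# Simplified stand-in for the full expression (assumption, not the original).
re_to='^To:[[:space:]]+([^[:space:]]+@[^[:space:]]+)$'

# Inline sample standing in for the contents of mailbox.mbx.
mailbox='Reply-To: "service@paypal.com" <service@paypal.com>
To: bob@bob.co.uk
Date: Mon, 21 Jun 2012 21:34:10 -0300'

addresses=()
# IFS= and -r keep each line intact (no trimming, no backslash processing).
while IFS= read -r line; do
    if [[ $line =~ $re_to ]]; then
        # Group 1 holds the address without the "To:" prefix.
        addresses+=("${BASH_REMATCH[1]}")
    fi
done <<< "$mailbox"

printf '%s\n' "${addresses[@]}"
# prints: bob@bob.co.uk
```

To read from the real file, the here-string would be replaced with `done < mailbox.mbx`.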

For a massive regex like this, I like to build it up piece by piece:

char='[[:alnum:]!#\$%&'\''\*\+/=?^_\`{|}~-]'
name_part="${char}+(\.${char}+)*"
domain="([[:alnum:]]([[:alnum:]-]*[[:alnum:]])?\.)+[[:alnum:]]([[:alnum:]-]*[[:alnum:]])?"
begin='(^|[[:space:]])'
end='($|[[:space:]])'

# include capturing parentheses, 
# these are the ** 2nd ** set of parentheses (there's a pair in $begin)
re_email="${begin}(${name_part}@${domain})${end}"

line="To: joe.smith@example.com"

[[ $line =~ $re_email ]] && echo "${BASH_REMATCH[2]}"
# prints: joe.smith@example.com

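Of course, email addresses are very complicated (see http://www.w3.org/Protocols/rfc822/#z8), and comments and whitespace are allowed almost anywhere. In fact, (hi there) "My First Name".lastname (another comment) @ domain.(really)invalid should be considered a valid address. There is a Perl module, Email::Address, that can produce this regex: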

$ perl -MEmail::Address -E 'say $Email::Address::addr_spec'  
(?-xism:(?-xism:(?-xism:(?-xism:(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|))*\s*\)\s*)))*\s*\)\s*)|\s+)*(?-xism:[^\x00-\x1F\x7F()<>\[\]:;@\\,."\s]+(?:\.[^\x00-\x1F\x7F()<>\[\]:;@\\,."\s]+)*)(?-xism:(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|))*\s*\)\s*)))*\s*\)\s*)|\s+)*)|(?-xism:(?-xism:(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|))*\s*\)\s*)))*\s*\)\s*)|\s+)*"(?-xism:(?-xism:[^\\"])|(?-xism:\\(?-xism:[^\x0A\x0D])))+"(?-xism:(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|))*\s*\)\s*)))*\s*\)\s*)|\s+)*))\@(?-xism:(?-xism:(?-xism:(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|))*\s*\)\s*)))*\s*\)\s*)|\s+)*(?-xism:[^\x00-\x1F\x7F()<>\[\]:;@\\,."\s]+(?:\.[^\x00-\x1F\x7F()<>\[\]:;@\\,."\s]+)*)(?-xism:(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|))*\s*\)\s*)))*\s*\)\s*)|\s+)*)|(?-xism:(?-xism:(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|))*\s*\)\s*)))*\s*\)\s*)|\s+)*\[(?:\s*(?-xism:(?-xism:[^\[\]\\])|(?-xism:\\(?-xism:[^\x0A\x0D]))))*\s*\](?-xism:(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|))*\s*\)\s*)))*\s*\)\s*)|\s+)*)))

Answer 1 (score: 1)

This regex should match the string you need:

"^To: (.+@.+)$"

The email is captured in the first group ($1 in Perl-style notation; in bash, ${BASH_REMATCH[1]}).
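A minimal sketch of using this pattern from bash, with the sample line from the question. The `re`, `line`, and `email` names are illustrative. Since `.+@.+` accepts anything containing an @, this is better treated as a filter for To: lines than as an address validator.

```bash
#!/usr/bin/env bash
# Loose pattern from this answer, kept in a variable so it is used unquoted.
re='^To: (.+@.+)$'

# Sample header line from the question.
line='To: bob@bob.co.uk'

email=''
if [[ $line =~ $re ]]; then
    # The first parenthesised group is stored at index 1.
    email=${BASH_REMATCH[1]}
    echo "$email"
fi
# prints: bob@bob.co.uk
```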