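What is the right regular expression for extracting email addresses from a text file when the addresses are written in this form?
someone at something.domainextension or someone.someone at something.domainextension
Is it possible to use a regular expression to convert these into normal email addresses?
Thanks in advance.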
Answer 0: (score: 0)
I'm using Ruby, but it works the same way in Perl:
>> "someone.someone at something.domainextension".sub(/\bat\b/,"@").gsub(/\s+/,"")
=> "someone.someone@something.domainextension"
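Basically, it replaces the first standalone "at" with "@" and then removes all whitespace. Because every space is stripped, this is meant for a string that contains only the address, not a whole line of text.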
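If the address sits inside a longer sentence, stripping all whitespace would glue the surrounding words together. A minimal sketch of one way around that, assuming the local part and domain consist only of word characters and dots (the method name `deobfuscate` and the sample sentence are made up for illustration):

```ruby
# Hypothetical helper: rewrites "name at host.tld" to "name@host.tld"
# in place, leaving the rest of the text untouched.
def deobfuscate(text)
  # \1 is the local part, \2 is the domain including its final ".tld"
  text.gsub(/([\w.]+)\s+at\s+([\w.]+\.\w+)/, '\1@\2')
end

puts deobfuscate("Mail someone.someone at something.domainextension today")
```

Like any pattern of this kind, it cannot tell an obfuscated address from ordinary prose that happens to have the same shape.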
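Answer 1: (score: 0)
I believe the following code will do what you need. However, it will not work if an email address is split across multiple lines, and it will give a false positive whenever ordinary text happens to contain a word followed by "at something.com". If you can post some sample data from your data set, I can make this code more specific to your case.
As noted in the comments above, this will not find absolutely every email address that is valid under the RFC, but I think it should handle your problem.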
use strict;
use warnings;

my @lines_from_file;  # holds our test info

# load the test info
$lines_from_file[0] = 'this is some text. We like to type to someone at somthing.com but sometimes';
$lines_from_file[1] = 'they go by someone.someone at something.com just to confuse us and hey you never';
$lines_from_file[2] = 'know, maybe they use parens like (someone at something.com).';
$lines_from_file[3] = 'make sure we do not find someone at .com. or someone something.com or someone at somethingcom';

my @all_email_addresses;  # holds all found email addresses

# for each line in the file
foreach my $line (@lines_from_file) {
    while ($line =~ /([0-9a-zA-Z.]+)  # capture any digit, letter or dot, 1 or more times
                     \sat\s           # " at "
                     ([0-9a-zA-Z.]+   # capture any digit, letter or dot, 1 or more times
                     \.               # dot
                     \w{2,4})         # com or net or us or tv or info etc.
                    /xg) {
        # every time the line matches an email, save it in email form
        push @all_email_addresses, "$1\@$2";
    }
}
print "@all_email_addresses\n";
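For readers following the Ruby answer above, here is a rough Ruby sketch of the same idea, using `String#scan` with the same pattern. The constant name, method name, and sample text are illustrative assumptions, not part of the original answer:

```ruby
# Same pattern as the Perl version: local part, " at ", then a domain
# that ends in a dot plus 2 to 4 word characters.
OBFUSCATED_EMAIL = /([0-9a-zA-Z.]+)\sat\s([0-9a-zA-Z.]+\.\w{2,4})/

# Hypothetical helper: returns every match rebuilt as "local@domain".
def extract_emails(text)
  text.scan(OBFUSCATED_EMAIL).map { |local, domain| "#{local}@#{domain}" }
end

sample = "write to someone at something.com or (someone.someone at something.net), " \
         "not someone at somethingcom"
emails = extract_emails(sample)
puts emails
```

With capture groups in the pattern, `scan` yields one `[local, domain]` pair per match, so no explicit loop over match positions is needed.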
Answer 2: (score: 0)
/^(?:(\w+)\.)?(\w+)\s+at\s+(\w+)\.(\w+)$/
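This will not capture every email address, only the forms you provided. Because of the ^ and $ anchors, it also expects the address to be the only thing on the line.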
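As a sketch of how the captured groups could be put back together, assuming each input line holds exactly one address (the names `ANCHORED` and `to_email` are hypothetical):

```ruby
# The pattern from this answer: optional "first." prefix, local part,
# " at ", then "domain.extension".
ANCHORED = /^(?:(\w+)\.)?(\w+)\s+at\s+(\w+)\.(\w+)$/

# Hypothetical helper: returns the rebuilt address, or nil if the line
# does not have one of the two expected forms.
def to_email(line)
  m = ANCHORED.match(line.strip)
  return nil unless m
  # m[1] is nil when there is no "first." prefix, so drop it with compact
  local = [m[1], m[2]].compact.join(".")
  "#{local}@#{m[3]}.#{m[4]}"
end

puts to_email("someone.someone at something.domainextension")
```

Returning nil for non-matching lines lets the caller decide whether to skip or report them.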