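I receive email archives in the format shown below, and my goal is to parse them and store them in a database. I have included several samples to illustrate the data. The only thing to watch out for is the "From" line.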
From: FirstName LastName <FirstName.MiddleName.LastName@someemail.com>
In-Reply-To: <fc7b93ca4dab.531f4e68@my.bcit.ca>
-------------------------------------------------
From: "FirstName. MiddleName =?iso-8859-1?b?TWFydO1uZXo=?= LastName"
<somemeail@something.otherthing.es>
Subject: Re: Some Randome Data
In-Reply-To: <42043F8EC804DB48A3C4AF477195328F272CB9@exchange.something.local>
-------------------------------------------------
From: "FirstName MiddleName LastName" <LastName@someemail.com>
Subject: Some Randome Subject
-------------------------------------------------
From: "FirstName. MiddleName =?iso-8859-1?b?TWFydO1uZXo=?= LastName"
<somemeail@something.otherthing.es
>
Subject: Re: Some Randome Data
In-Reply-To: <42043F8EC804DB48A3C4AF477195328F272CB9@exchange.something.local>
-------------------------------------------------
From: "FirstName. MiddleName =?iso-8859-1?b?TWFydO1uZXo=?= LastName"
<
somemeail@something.otherthing.es
>
Subject: Re: Some Randome Data
In-Reply-To: <42043F8EC804DB48A3C4AF477195328F272CB9@exchange.something.local>
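So far I have noticed that every header except "From" is consistent and always fits on a single line, whereas "From" can span several lines and is giving me a hard time.
I am using the following regular expression in my C# code to extract the "From" field.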
match = Regex.Match(msg, @"(?<=From:)", RegexOptions.Multiline | RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace);
I also tried the following expression, but it messes up the other records.
match = Regex.Match(msg, @"(?<=From:).*.\s*.*\s*(>)", RegexOptions.Multiline | RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace);
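What I want to do is this: find the line that starts with From: without capturing that prefix, i.e. (?<=From:), then keep matching until the closing ">" is reached. The match must include everything in between, such as spaces and line breaks.
I am struggling to come up with this expression.
I have already gone through regex-that-matches-a-newline-n-in-c-sharp and c-sharp-regex-match-any-text-between-tags-including-new-lines, but could not apply them to my code.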
Full sample code
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

class Program
{
    static void Main(string[] args)
    {
        foreach (var demoText in TestData())
        {
            // '.' does not match '\n', so this captures only the rest of the first "From:" line.
            var match = Regex.Match(demoText, @"(?<=From:).*", RegexOptions.Multiline | RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace);
            if (match.Success)
            {
                string fromField = match.Value.Replace(System.Environment.NewLine, " ");
                // Found From - extract the email address
                match = Regex.Match(fromField, @"(?<=<)+[^<>]+(?=>)+", RegexOptions.Multiline | RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace);
                Console.WriteLine("Email Address:" + match.Value);
                // Extract the name
                match = Regex.Match(fromField, @".*(?=<)", RegexOptions.Multiline | RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace);
                Console.WriteLine("Name:" + match.Value);
            }
            else
            {
                Console.WriteLine("*** Match not found in data: " + demoText);
            }
        }
        Console.WriteLine("All done, press any key to close.");
        Console.ReadLine();
    }

    // Sample records separated by 'ñ'; each record is trimmed after splitting.
    static IEnumerable<string> TestData()
    {
        return @"
From: FirstName LastName <FirstName.MiddleName.LastName@someemail.com>
In-Reply-To: <fc7b93ca4dab.531f4e68@my.bcit.ca>ñ
From: ""FirstName. MiddleName =?iso-8859-1?b?TWFydO1uZXo=?= LastName""
<somemeail@something.otherthing.es>
Subject: Re: Some Randome Data
In-Reply-To: <42043F8EC804DB48A3C4AF477195328F272CB9@exchange.something.local>ñ
From: ""FirstName MiddleName LastName"" <LastName@someemail.com>
Subject: Some Randome Subject ñ
From: ""FirstName. MiddleName =?iso-8859-1?b?TWFydO1uZXo=?= LastName""
<somemeail@something.otherthing.es
>
Subject: Re: Some Randome Data
In-Reply-To: <42043F8EC804DB48A3C4AF477195328F272CB9@exchange.something.local>ñ
From: ""FirstName. MiddleName =?iso-8859-1?b?TWFydO1uZXo=?= LastName""
<
somemeail@something.otherthing.es
>
Subject: Re: Some Randome Data
In-Reply-To: <42043F8EC804DB48A3C4AF477195328F272CB9@exchange.something.local>
".Split('ñ').Select(item => item.Trim());
    }
}
Answer 0 (score: 3)
Answer 1 (score: 2)
Assuming the name part cannot contain any angle brackets, you can use:
(?<=\bFrom:)[^>]+>
Note: you do not need any specific option to make this work, apart from the case-insensitive option if you want it.
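The pattern works across line breaks because the negated class [^>] also matches newline characters, so no Singleline option is needed. Here is a minimal sketch of how it could be used. The class name, method name and sample message are made up for illustration, and collapsing whitespace afterwards is an extra step that is not part of the pattern itself.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FromFieldDemo
{
    // Pattern from the answer: everything after "From:" up to and including the first '>'.
    private static readonly Regex FromField =
        new Regex(@"(?<=\bFrom:)[^>]+>", RegexOptions.IgnoreCase);

    // Hypothetical helper: returns the raw From value with runs of whitespace
    // (including line breaks) collapsed to single spaces, or an empty string
    // when no From header is found.
    public static string ExtractFrom(string message)
    {
        Match m = FromField.Match(message);
        if (!m.Success) return string.Empty;
        return Regex.Replace(m.Value, @"\s+", " ").Trim();
    }

    public static void Main()
    {
        // Made-up sample modelled on the multi-line records in the question.
        string msg = "From: \"Jane Doe\"\r\n<\r\njane@example.com\r\n>\r\nSubject: Re: test";
        Console.WriteLine(ExtractFrom(msg)); // prints: "Jane Doe" < jane@example.com >
    }
}
```

Because the match stops at the first ">", this relies on the stated assumption that the display name itself contains no angle brackets.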
If you want to extract the name and the email address in one go, you can use:
\bFrom:\s*(?:"(?<name>[^"]+)"|(?<name>[^<]+?))\s+<\s*(?<email>[^>]+?)\s*>
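The named groups can then be read from the match. .NET allows the same group name (here name) in both branches of the alternation. The following is a sketch only: TryParseFrom, the class name and the sample messages are illustrative assumptions, and the double quotes in the pattern are doubled because it sits inside a C# verbatim string.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FromParserDemo
{
    // Same pattern as above, written as a verbatim string ("" is an escaped quote).
    private static readonly Regex FromHeader = new Regex(
        @"\bFrom:\s*(?:""(?<name>[^""]+)""|(?<name>[^<]+?))\s+<\s*(?<email>[^>]+?)\s*>",
        RegexOptions.IgnoreCase);

    // Hypothetical helper: returns true and fills name/email when a From header is found.
    public static bool TryParseFrom(string message, out string name, out string email)
    {
        Match m = FromHeader.Match(message);
        name = m.Success ? m.Groups["name"].Value : string.Empty;
        email = m.Success ? m.Groups["email"].Value : string.Empty;
        return m.Success;
    }

    public static void Main()
    {
        // Made-up samples: one single-line header, one folded across several lines.
        string[] samples =
        {
            "From: Jane Doe <jane.doe@example.com>\r\nSubject: hi",
            "From: \"Jane Q. Doe\"\r\n<\r\njane@example.com\r\n>\r\nSubject: Re: hi"
        };

        foreach (string sample in samples)
        {
            if (TryParseFrom(sample, out string name, out string email))
            {
                Console.WriteLine("Name: " + name + " | Email: " + email);
            }
        }
    }
}
```

The surrounding \s* and the lazy quantifiers keep leading and trailing whitespace, including line breaks, out of both captures. A MIME encoded-word in the name (such as =?iso-8859-1?b?...?=) is returned as-is and would still need to be decoded separately.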