Question

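I'm trying to extract some information from the "disclaimer" section of stock-promotion "tout" emails (mostly spam).

Typically, the touter includes a disclaimer such as:

Company XYZ has been compensated fifty thousand dollars for a two week promotion of ABC stock.

I have a regular expression that handles this case (probably not the most efficient one), and it seems to work in most situations. However, when the disclaimer refers to the promoting company by URL (i.e. www.companyxyz.com instead of Company XYZ), my regex captures ".com" and the rest of what I'm trying to capture, but not the "www.companyxyz" part.

Here is my regex method: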

    public string ExtractCompensationLine(string message)
    {
        string compensationLine = string.Empty;
        string messageLine = Regex.Replace(message, "[\n\r\t]", " ");
        string leftPrefix = @"\.((\w|\s|\d|\,)+";
        string rightPrefix = @"(\w|\s|\d|\,)+\.)";

        string[] phrases = 
        {
            @"has been compensated",
            @"we were also paid",
            @"has been previously compensated",
            @"currently being compensated",
            @"the company has compensated",
            @"has agreed to be compensated",
            @"have been compensated up to",
            @"dollars from a third party",
            @"the company will compensate us"
        };

        foreach (string phrase in phrases)
        {
            string pattern = leftPrefix + phrase + rightPrefix;
            Regex compensationRegex = new Regex(pattern, RegexOptions.IgnoreCase);
            Match match = compensationRegex.Match(messageLine);

            if (match.Success)
            {
                compensationLine += match.Groups[1].Value;
            }
        }

        return compensationLine;
    }

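So the regex captures the whole sentence, from its first word (found by looking for the period that ends the previous sentence) through to the period that ends the sentence itself. But these URLs don't play well with my regex.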
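The behaviour described here can be reproduced with a small sketch. It assumes the pattern that the method builds for the phrase `has been compensated`; the class name `ToutRegexDemo`, the helper `Capture`, and both sample messages are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ToutRegexDemo
{
    // leftPrefix + "has been compensated" + rightPrefix, as built in the method above.
    private const string Pattern =
        @"\.((\w|\s|\d|\,)+has been compensated(\w|\s|\d|\,)+\.)";

    // Illustrative sample messages (not real emails).
    public const string ByName =
        "Read this first. Company XYZ has been compensated fifty thousand " +
        "dollars for a two week promotion of ABC stock.";
    public const string ByUrl =
        "Read this first. www.companyxyz.com has been compensated fifty thousand " +
        "dollars for a two week promotion of ABC stock.";

    // Hypothetical helper: returns capture group 1, or an empty string if nothing matches.
    public static string Capture(string message)
    {
        Match match = Regex.Match(message, Pattern, RegexOptions.IgnoreCase);
        return match.Success ? match.Groups[1].Value : string.Empty;
    }

    public static void Main()
    {
        // The whole sentence is captured (with the leading space after the period).
        Console.WriteLine("[" + Capture(ByName) + "]");

        // Only the text after the last period of the URL is captured:
        // it starts at "com has been compensated ...".
        Console.WriteLine("[" + Capture(ByUrl) + "]");
    }
}
```

Because the character class between the leading `\.` and the phrase does not include a period, the match can only begin at the last period before the phrase, which in the URL case is the one in front of `com`.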

Answer 1

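If I understand your problem correctly, given a sentence that contains one of the given phrases, you want everything from the beginning of that sentence to its end, or to the end of the line. Your challenge is finding the end of the sentence that precedes the one you want to match. So you need to match ". " (a period followed by whitespace) and then the rest.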
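One way to sketch that idea is to treat "a period followed by whitespace" as the only sentence break, so that periods inside a URL are ordinary characters. The code below is an illustration under that assumption and is not the pattern proposed in this answer: the helper `ExtractSentence`, the class name, and the use of a negative lookahead are all choices made for the example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SentenceBoundarySketch
{
    // Hypothetical helper: returns the sentence containing `phrase`, or "" if none is found.
    public static string ExtractSentence(string message, string phrase)
    {
        // Flatten line breaks and tabs, as the question's method does.
        string flat = Regex.Replace(message, "[\n\r\t]", " ");

        // Any single character that does not start a ". " sentence break.
        string body = @"(?:(?!\.\s).)";

        // Start at the beginning of the text or just after a sentence break,
        // then capture up to and including the period that ends the sentence.
        string pattern = @"(?:^|\.\s+)(" + body + "*?" + Regex.Escape(phrase) + body + @"*\.)";

        Match match = Regex.Match(flat, pattern, RegexOptions.IgnoreCase);
        return match.Success ? match.Groups[1].Value : string.Empty;
    }

    public static void Main()
    {
        string message =
            "Read this first. www.companyxyz.com has been compensated fifty thousand " +
            "dollars for a two week promotion of ABC stock. Do your own research.";

        // The captured sentence now begins at "www.companyxyz.com".
        Console.WriteLine(ExtractSentence(message, "has been compensated"));
    }
}
```

A sentence break defined this way will still split on abbreviations such as "Inc. " inside a sentence, so this is a starting point rather than a complete solution.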

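I don't understand why you have "(\w|\s|\d|\,)" rather than just "."; it won't give the result I described above, but I'll leave it as it is and focus only on the problem you describe.

So try this: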

    leftPrefix = @"(\.*\s+)*?((\w|\d|\,)+";

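The added `(\.*\s+)*?` is intended to match any characters followed by a period followed by whitespace. Note that `\.*` on its own matches only a run of literal periods, so a pattern that follows that description literally would be `(.*\.\s+)*?`.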

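Because I use parentheses to group this new subexpression, the pattern gains a capture group, so the group numbering shifts: the sentence ends up in `match.Groups[2]` instead of `match.Groups[1]`. The new group also repeats, so its individual matches are in its `Captures` collection, while its `Value` holds only the last one.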
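The effect of an extra pair of parentheses on group numbering can be seen in a small sketch. The patterns and the sample text below are deliberately simplified and hypothetical; they are not the patterns from the question or from this answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GroupNumberingSketch
{
    public const string Text = "Intro. Company XYZ has been compensated well.";

    // One capturing group: the sentence is group 1.
    public static Match Plain() => Regex.Match(Text, @"\.\s+(\w[^.]*\.)");

    // An extra capturing group in front: the sentence moves to group 2.
    public static Match Shifted() => Regex.Match(Text, @"(\.\s+)(\w[^.]*\.)");

    // A non-capturing prefix plus a named group keeps the lookup stable.
    public static Match Named() => Regex.Match(Text, @"(?:\.\s+)(?<sentence>\w[^.]*\.)");

    // A repeated group keeps only its last match in Value; every iteration is in Captures.
    public static Group Repeated() => Regex.Match("a b c", @"(?:(\w)\s?)+").Groups[1];

    public static void Main()
    {
        Console.WriteLine(Plain().Groups[1].Value);          // the sentence
        Console.WriteLine(Shifted().Groups[2].Value);        // the sentence again, now at index 2
        Console.WriteLine(Named().Groups["sentence"].Value); // the sentence, looked up by name
        Console.WriteLine(Repeated().Captures.Count);        // number of iterations of the repeated group
    }
}
```

Using `(?:...)` for grouping that is not meant to capture, or naming the group that is, avoids having to renumber `Groups[...]` lookups whenever the pattern changes.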
