Question

Using .NET.

Example string to search:

For more information on foreclosures visit <a href="http://www.us.gov/foreclosures.aspx">forclosures</a>

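I need a regular expression to find (and subsequently replace) the word foreclosures in this string, but only instances that are outside the anchor tag. So in this example, only the first instance of the word "foreclosures" should match. Anything inside the anchor tag should be ignored entirely.

The regex I have so far (it does not yet correctly exclude the inner text) is: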

\bforeclosures(?!([^<]+)?>)

Update: after the first answer was posted... I'm using VB.NET, but I'm comfortable with C# as well.

Answer 1

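Sometimes it's better to do several simple things than one complicated thing. Instead of spending hours crafting the perfect regex, you could first remove all HTML tags with a very simple call to Regex.Replace.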

// Remove html tags
var htmlTagPattern = new Regex(@"<([A-Z][A-Z0-9]*)\b[^>]*>", RegexOptions.IgnoreCase);
var noTags = htmlTagPattern.Replace(input, string.Empty);

// Find those words
var foreclosuresPattern = new Regex(@"foreclosures", RegexOptions.IgnoreCase);
var matches = foreclosuresPattern.Matches(noTags);
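
One caveat with the snippet above: its tag pattern matches opening tags only (it does not match `</a>`), and stripping tags leaves the link text in place, so a word used as link text would still be found. A minimal sketch of one way around that, assuming it is acceptable to drop whole anchor elements before searching. The names `StripThenFind` and `CountOutsideAnchors` are made up for illustration and are not part of the original answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class StripThenFind
{
    // Hypothetical helper: counts whole-word occurrences of `word`
    // that lie outside anchor elements and outside any other tag.
    public static int CountOutsideAnchors(string input, string word)
    {
        // Step 1: drop whole anchor elements, including their inner text.
        var noAnchors = Regex.Replace(
            input, @"<a\b[^>]*>.*?</a>", string.Empty,
            RegexOptions.IgnoreCase | RegexOptions.Singleline);

        // Step 2: drop any remaining opening or closing tags.
        var noTags = Regex.Replace(
            noAnchors, @"</?[A-Z][A-Z0-9]*\b[^>]*>", string.Empty,
            RegexOptions.IgnoreCase);

        // Step 3: count whole-word matches in what is left.
        var wordPattern = @"\b" + Regex.Escape(word) + @"\b";
        return Regex.Matches(noTags, wordPattern, RegexOptions.IgnoreCase).Count;
    }
}
```

This is only good for finding or counting. Since the tags are thrown away, the match positions no longer line up with the original string, so it does not help with replacing in place.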

Edit: The original post only mentioned finding the word. The need to replace it adds some complexity.

// Try and find all cases
private string findAndTag(string input) {
    var pattern = new Regex(@"(\x3c[A-Z][A-Z0-9]*[^\x3e]*)?(foreclosures)([^\x3c\x3e]*>)?", RegexOptions.IgnoreCase);
    return pattern.Replace(input, replacer);
}

private string replacer(Match match) {
    if (match.Groups[1].Success) {
        // Found the word foreclosures inside a tag, for
        // example <a href="foreclosures">...
        // Just return the original match - don't replace
        return match.Value;
    }
    else {
        // Found the word outside of tags
        // Tag it and return it
        return "<span>" + match.Value + "</span>";
    }
}
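
Note that the pattern above protects the word when it appears inside a tag's attributes, but not when it appears as the link text between `<a ...>` and `</a>`. A common alternative (not the approach in the answer above) is an alternation that consumes whole anchor elements and other tags first, and captures the word only when it appears elsewhere. A minimal sketch, assuming anchors are not nested and the HTML is reasonably well formed. `AnchorAwareReplacer` and `TagWord` are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AnchorAwareReplacer
{
    // Alternatives are tried left to right at each position:
    //   1. a whole anchor element, inner text included (left untouched)
    //   2. any other single tag (left untouched)
    //   3. the target word, captured in the named group "word"
    private static readonly Regex Pattern = new Regex(
        @"<a\b[^>]*>.*?</a>|<[^>]+>|(?<word>\bforeclosures\b)",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    public static string TagWord(string input)
    {
        // The lambda is converted to a MatchEvaluator delegate.
        return Pattern.Replace(input, m =>
            m.Groups["word"].Success
                ? "<span>" + m.Value + "</span>" // word outside tags: wrap it
                : m.Value);                      // anchor or tag: keep as-is
    }
}
```

For an input like `For more information on foreclosures visit <a href="http://www.us.gov/foreclosures.aspx">foreclosures</a>`, `TagWord` wraps only the first occurrence in `<span>` and returns the anchor unchanged. For anything beyond simple markup, a real HTML parser is likely the safer choice.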
