Question

Say I have these two strings: "Some text here" and "some text here".

I have a collection of words that I want to match against the text in those strings: "some", "text", "here".

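If one of these words matches a word in the string (regardless of case), I want to take the original word from the string and wrap some HTML markup around it, like <dfn title="Definition of word">Original word</dfn>.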

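I've been using the string.Replace() method, but I can't figure out how to make it match regardless of case while still keeping the original word intact (so that I don't replace "word" with "Word" or vice versa).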

Answer 1

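The string.Replace method simply isn't general enough for what you need here; lower-level text manipulation should do the job. The obvious alternative is regular expressions, but I think the algorithm I present here is the most efficient approach, and writing it out is useful anyway for seeing how this kind of text manipulation can be done without regular expressions.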

Here's the function.

Update

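Now uses a Dictionary<string, string> instead of a string[], so the definitions can be passed to the function along with the words.
Now works with the definitions dictionary in any order.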

...

using System;
using System.Collections.Generic;
using System.Text;

public static string HtmlReplace(string value, Dictionary<string, string>
    definitions, Func<string, string, string> htmlWrapper)
{
    var sb = new StringBuilder(value.Length);

    int index = -1;
    int lastEndIndex = 0;
    KeyValuePair<string, string> def;
    // OrdinalIgnoreCase guarantees a match is exactly def.Key.Length characters long,
    // which the index arithmetic below relies on.
    while ((index = IndexOf(value, definitions, lastEndIndex,
        StringComparison.OrdinalIgnoreCase, out def)) != -1)
    {
        sb.Append(value.Substring(lastEndIndex, index - lastEndIndex));
        // Pass the word as it appears in the input (not def.Key) so its casing is preserved.
        sb.Append(htmlWrapper(value.Substring(index, def.Key.Length), def.Value));
        lastEndIndex = index + def.Key.Length;
    }
    sb.Append(value.Substring(lastEndIndex, value.Length - lastEndIndex));

    return sb.ToString();
}

// Returns the earliest position (at or after startIndex) where any key of `values`
// occurs, and reports the matching entry through foundEntry; -1 if none occurs.
private static int IndexOf(string text, Dictionary<string, string> values, int startIndex,
    StringComparison comparisonType, out KeyValuePair<string, string> foundEntry)
{
    var minEntry = default(KeyValuePair<string, string>);
    int minIndex = -1;
    foreach (var entry in values)
    {
        int index = text.IndexOf(entry.Key, startIndex, comparisonType);
        // Keep the earliest match found so far.
        if (index != -1 && (minIndex == -1 || index < minIndex))
        {
            minIndex = index;
            minEntry = entry;
        }
    }

    foundEntry = minEntry;
    return minIndex;
}

And a little test program. (Note the use of a lambda expression, for convenience.)

static void Main(string[] args)
{
    var str = "Definition foo; Definition bar; Definition baz";
    var definitions = new Dictionary<string, string>();
    definitions.Add("foo", "Definition 1");
    definitions.Add("bar", "Definition 2");
    definitions.Add("baz", "Definition 3");
    var output = HtmlReplace(str, definitions,
        (word, definition) => string.Format("<dfn title=\"{1}\">{0}</dfn>", 
            word, definition));
    Console.WriteLine(output);
}

Output:

Definition <dfn title="Definition 1">foo</dfn>; Definition <dfn title="Definition 2">bar</dfn>; Definition <dfn title="Definition 3">baz</dfn>

Hope that helps.

Answer 2

You can use a regular expression:

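using System;
using System.Text.RegularExpressions;

class Program {

    static string ReplaceWord(Match m) {
        // m.Value is the text as it appears in the input, so its original casing is kept.
        return string.Format("<dfn>{0}</dfn>", m.Value);
    }

    static void Main(string[] args) {

        // \b...\b limits matches to whole words, so "some" won't match inside "something".
        Regex r = new Regex(@"\b(some|text|here)\b", RegexOptions.IgnoreCase);
        string input = "Some random text.";
        string replaced = r.Replace(input, ReplaceWord);
        Console.WriteLine(replaced);
    }
}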

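RegexOptions.IgnoreCase makes the words in the list match regardless of their case. The ReplaceWord function returns the matched string, exactly as it appeared in the input, wrapped in the opening and closing tags. (Note that you may still need to escape the inner string.)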
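A slightly more general sketch of the same idea, assuming the words come from a collection as in the question, so the pattern is built from the list instead of being hard-coded. WordWrapper and WrapWords are hypothetical names for illustration. Regex.Escape keeps characters such as . or + literal, longer words are tried first so a longer term wins over a shorter one at the same position, and \b restricts matches to whole words (this assumes each word begins and ends with a letter, digit or underscore). WebUtility.HtmlEncode is one way to do the escaping mentioned above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Text.RegularExpressions;

public static class WordWrapper
{
    // Hypothetical helper: wraps every whole-word, case-insensitive occurrence
    // of any of `words` in <dfn> tags, keeping the text as it appears in the input.
    public static string WrapWords(string input, IEnumerable<string> words)
    {
        // .NET alternation is leftmost-first, so list longer words first.
        var alternatives = words.Where(w => w.Length > 0)
                                .OrderByDescending(w => w.Length)
                                .Select(Regex.Escape)
                                .ToList();
        if (alternatives.Count == 0) return input; // an empty pattern would match everywhere

        var regex = new Regex(@"\b(?:" + string.Join("|", alternatives) + @")\b",
                              RegexOptions.IgnoreCase);

        // m.Value is the original text, so "Some" stays "Some".
        return regex.Replace(input, m => "<dfn>" + WebUtility.HtmlEncode(m.Value) + "</dfn>");
    }
}
```

For example, WrapWords("Some random text.", new[] { "some", "text", "here" }) gives <dfn>Some</dfn> random <dfn>text</dfn>.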

Answer 3

First, I'm going to be mean and offer an anti-answer: a test case for you that's a real pain to code against.

What happens if I have the terms:

Web Browser
Browser History

and I run them against this sentence:

Now, clean the web browser history by ...

You get:

Now, clean the <dfn title="Definition of word">web <dfn title="Definition of word">browser</dfn> history</dfn> by ...

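I've been struggling with the same problem recently, but I don't think my solution will help you: see http://github.com/jarofgreen/TaggedWiki/blob/d002997444c35cafecd85316280a896484a06511/taggedwikitest/taggedwiki/views.py, line 47. I ended up putting a marker in front of the tag rather than wrapping the text.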

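But I may be able to give you part of an answer. To avoid catching words inside the HTML (think about what would happen if "title" were one of your terms in the previous paragraph), I made two passes. In the first, searching pass, I stored the positions of the phrases to wrap; then, in the second, non-searching pass, I inserted the actual HTML. That way there's no HTML in the text while you're doing the actual searching.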
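A minimal C# sketch of that two-pass idea, under a few assumptions: the terms are plain strings, matching is case-insensitive (StringComparison.OrdinalIgnoreCase) but not restricted to whole words, and when two terms overlap, the one that starts first wins (the longer one on a tie). TwoPassTagger.Tag is a hypothetical name, not the code from the linked repository.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class TwoPassTagger
{
    // Hypothetical sketch: pass 1 only searches, pass 2 only builds the output.
    public static string Tag(string text, IList<string> terms)
    {
        // Pass 1: record (start, length) spans in the untouched text.
        var spans = new List<(int Start, int Length)>();
        int pos = 0;
        while (pos < text.Length)
        {
            int bestStart = -1, bestLength = 0;
            foreach (string term in terms)
            {
                if (term.Length == 0) continue; // an empty term would never advance
                int i = text.IndexOf(term, pos, StringComparison.OrdinalIgnoreCase);
                // Prefer the earliest match; on a tie, prefer the longer term.
                if (i >= 0 && (bestStart < 0 || i < bestStart ||
                               (i == bestStart && term.Length > bestLength)))
                {
                    bestStart = i;
                    bestLength = term.Length;
                }
            }
            if (bestStart < 0) break;
            spans.Add((bestStart, bestLength));
            // Resume after this span, so an overlapping term is skipped, not nested.
            pos = bestStart + bestLength;
        }

        // Pass 2: copy the text, inserting markup around the recorded spans.
        var sb = new StringBuilder(text.Length);
        int last = 0;
        foreach (var (start, length) in spans)
        {
            sb.Append(text, last, start - last);
            sb.Append("<dfn>").Append(text, start, length).Append("</dfn>");
            last = start + length;
        }
        sb.Append(text, last, text.Length - last);
        return sb.ToString();
    }
}
```

With the terms "Web Browser" and "Browser History", the sentence above comes out as Now, clean the <dfn>web browser</dfn> history by ... with no nesting, and because the search never sees the inserted tags, an attribute such as title can't be matched by accident.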

Answer 4

Maybe I've misunderstood your question, but why not use regular expressions?

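If you get the expression right, regular expressions are faster and more foolproof, and they give you indexes into the original string, so you know the exact position of each matched word and can insert the markup precisely where it's needed.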

Note, though, that you'll have to use String.Insert() with the match positions; String.Replace() won't help.

Hope that answers your question.
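A sketch of the String.Insert approach, assuming the Regex already encodes the words to match (WrapWithInsert is a hypothetical name). The matches are processed from last to first, because inserting markup at a later position doesn't shift the indexes of earlier matches:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class InsertExample
{
    // Hypothetical helper: inserts <dfn>...</dfn> around every match of `pattern`.
    public static string WrapWithInsert(string input, Regex pattern)
    {
        string result = input;
        // Walk the matches in reverse so m.Index stays valid for the remaining ones.
        foreach (Match m in pattern.Matches(input).Cast<Match>().Reverse())
        {
            result = result.Insert(m.Index + m.Length, "</dfn>"); // closing tag first
            result = result.Insert(m.Index, "<dfn>");
        }
        return result;
    }
}
```

Each Insert copies the whole string, so for long texts a single Regex.Replace with a MatchEvaluator is likely cheaper; the point here is only that the match indexes are enough to place the markup.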

Answer 5

The simplest way is to use String.Replace, as you said.

I'm surprised there's no option to specify a StringComparison in String.Replace.

So I wrote you a "not so optimized" but very simple IgnoreCaseReplace:

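using System;

// Case-insensitive replace. Every match is replaced by the same newValue,
// so the casing of the original word is not preserved.
static string IgnoreCaseReplace(string text, string oldValue, string newValue)
{
    // An empty oldValue would match at every position and never terminate.
    if (string.IsNullOrEmpty(oldValue)) return text;

    int index = 0;
    while ((index = text.IndexOf(oldValue,
        index,
        StringComparison.OrdinalIgnoreCase)) >= 0)
    {
        // OrdinalIgnoreCase guarantees the match is exactly oldValue.Length characters.
        text = text.Substring(0, index)
            + newValue
            + text.Substring(index + oldValue.Length);

        // Continue after the inserted text so newValue itself is never re-matched.
        index += newValue.Length;
    }

    return text;
}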

To make it nicer, you can put it in a static class and make it an extension method on String:

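using System;

static class MyStringUtilities
{
    // Case-insensitive replace as an extension method on string.
    public static string IgnoreCaseReplace(this string text, string oldValue, string newValue)
    {
        // An empty oldValue would match at every position and never terminate.
        if (string.IsNullOrEmpty(oldValue)) return text;

        int index = 0;
        while ((index = text.IndexOf(oldValue,
            index,
            StringComparison.OrdinalIgnoreCase)) >= 0)
        {
            text = text.Substring(0, index)
                + newValue
                + text.Substring(index + oldValue.Length);

            // Skip past the inserted text so newValue is never re-matched.
            index += newValue.Length;
        }

        return text;
    }
}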
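Because IgnoreCaseReplace always inserts the same newValue, the original casing of the matched word is lost, which is exactly what the question wants to avoid. One possible variant, sketched here under the hypothetical name IgnoreCaseWrap, hands each match, as it actually appears in the text, to a callback that builds the replacement:

```csharp
using System;

static class CasePreservingStringUtilities
{
    // Hypothetical variant of IgnoreCaseReplace: `wrap` receives the matched text
    // with its original casing and returns what should replace it.
    public static string IgnoreCaseWrap(this string text, string oldValue,
                                        Func<string, string> wrap)
    {
        if (string.IsNullOrEmpty(oldValue)) return text; // avoid an endless loop

        int index = 0;
        while ((index = text.IndexOf(oldValue, index,
                   StringComparison.OrdinalIgnoreCase)) >= 0)
        {
            string original = text.Substring(index, oldValue.Length);
            string replacement = wrap(original);
            text = text.Substring(0, index) + replacement
                 + text.Substring(index + oldValue.Length);
            index += replacement.Length; // skip past the inserted markup
        }
        return text;
    }
}
```

Usage: "Some text and some more".IgnoreCaseWrap("some", w => "<dfn>" + w + "</dfn>") yields <dfn>Some</dfn> text and <dfn>some</dfn> more.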

Answer 6

Regular expression code:

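using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class MyRegEx
{
    /// <summary>
    /// Converts the input string by formatting the words in the dict with their meanings
    /// </summary>
    /// <param name="input">Input string</param>
    /// <param name="dict">Dictionary contains words as keys and meanings as values</param>
    /// <returns>Formatted string</returns>
    public static string FormatForDefns(string input, Dictionary<string, string> dict)
    {
        string formatted = input;
        foreach (KeyValuePair<string, string> kv in dict)
        {
            // ${word} re-inserts the text that was actually matched, so its casing survives;
            // "$" in the meaning is doubled so Regex.Replace treats it literally.
            string definition = "<dfn title=\"" + kv.Value.Replace("$", "$$") + "\">${word}</dfn>";
            // Regex.Escape keeps keys containing regex metacharacters literal.
            string pattern = "(?<word>" + Regex.Escape(kv.Key) + ")";
            formatted = Regex.Replace(formatted, pattern, definition, RegexOptions.IgnoreCase);
        }
        return formatted;
    }
}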

Here's the calling code:

Dictionary<string, string> dict = new Dictionary<string, string>();
dict.Add("word", "meaning");
dict.Add("taciturn ", "Habitually silent; not inclined to talk");

string s = "word abase";
string formattedString = MyRegEx.FormatForDefns(s, dict);
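Because FormatForDefns calls Regex.Replace once per dictionary entry, a later entry can match text inside markup inserted by an earlier one (for example, a term that appears in another term's title attribute), which is the problem described in Answer 3. One way around that, sketched below with hypothetical names, is a single pass: all keys go into one pattern, and a MatchEvaluator looks up the definition for whatever was matched. It assumes the dictionary has no keys that differ only in case, and that keys begin and end with word characters so \b behaves as intended.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Text.RegularExpressions;

public static class SinglePassDefinitions
{
    // Hypothetical single-pass alternative: the output is never re-scanned,
    // so text inside an inserted title attribute cannot be matched again.
    public static string FormatForDefns(string input, IDictionary<string, string> dict)
    {
        // Case-insensitive lookup from the matched text back to its definition.
        var lookup = new Dictionary<string, string>(dict, StringComparer.OrdinalIgnoreCase);
        if (lookup.Count == 0) return input; // an empty alternation would match everywhere

        // Longest keys first, escaped, as whole words.
        string pattern = @"\b(?:" + string.Join("|",
            lookup.Keys.OrderByDescending(k => k.Length).Select(Regex.Escape)) + @")\b";

        return Regex.Replace(input, pattern, m =>
            "<dfn title=\"" + WebUtility.HtmlEncode(lookup[m.Value]) + "\">"
            + WebUtility.HtmlEncode(m.Value) + "</dfn>",
            RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);
    }
}
```

For instance, with the entries "word" → "a title" and "title" → "heading", the input "Word and title" becomes <dfn title="a title">Word</dfn> and <dfn title="heading">title</dfn>; the "title" inside the first attribute is left alone.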

Inject HTML markup around certain words in a string
