Question

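So basically I have this loop, in which each sentence in processedSentencesList is iterated over and scanned for words that exist in the list entityString. Every entityString found in a sentence is added to var valid_words.

But the entities "Harry Potter" and "Ford Car" are not added, because of the 'sentence.Split()' statement.

How do I change this code so that existing entities containing spaces are not split into two words?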

    List<string> entityString = new List<string>();
    entityString.Add("Harry Potter"); // A name which I do not want to split
    entityString.Add("Ford Car"); // A name which I do not want to split
    entityString.Add("Broom");
    entityString.Add("Ronald");

    List<string> processedSentencesList = new List<string>();
    processedSentencesList.Add("Harry Potter is a wizard");
    processedSentencesList.Add("Ronald had a Broom and a Ford Car");

    foreach (string sentence in processedSentencesList)
    {
        // But this splits the names as well
        var words = sentence.Split(" ".ToCharArray());
        // And therefore my names do not get added to the valid_words list
        var valid_words = words.Where(w =>
            entityString.Any(en_li => en_li.Equals(w)));
    }

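When printed, the output I currently get is:

Broom

Ronald

The output I want is:

Harry Potter

Ford Car

Broom

Ronald

Basically, entities with a space in them (2 or more words) get split apart and therefore cannot be matched against the existing entities. How do I fix this?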

Answer 1

Change the foreach to the following:

List<String> valid_words = new List<String>();

foreach (string sentence in processedSentencesList)
{
    valid_words.AddRange(entityString.Where(en_li => sentence.Contains(en_li)));
}

valid_words = valid_words.Distinct().ToList();
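
For anyone who wants to run this, here is a self-contained sketch of the same idea using the sample data from the question. The names `EntityFinder` and `FindEntities` are made up for illustration and are not part of the original code. Keep in mind that `Contains` is a case-sensitive substring test, so "Broom" would also be found inside "Broomstick".

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, introduced only for this example.
public static class EntityFinder
{
    // Returns each entity that occurs as a substring of at least one
    // sentence, with duplicates removed.
    public static List<string> FindEntities(
        IEnumerable<string> sentences, IEnumerable<string> entities)
    {
        return sentences
            .SelectMany(sentence => entities.Where(entity => sentence.Contains(entity)))
            .Distinct()
            .ToList();
    }
}

public static class Program
{
    public static void Main()
    {
        var entityString = new List<string> { "Harry Potter", "Ford Car", "Broom", "Ronald" };
        var processedSentencesList = new List<string>
        {
            "Harry Potter is a wizard",
            "Ronald had a Broom and a Ford Car"
        };

        // Prints the four entities, one per line.
        foreach (string entity in EntityFinder.FindEntities(processedSentencesList, entityString))
        {
            Console.WriteLine(entity);
        }
    }
}
```

The sentences are never split here, so multi-word entities are compared as whole strings.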

Answer 2

You can try matching instead of splitting.

[A-Z]\S+(?:\s+[A-Z]\S+)?

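
A rough sketch of how that pattern could be used from C#, assuming you still want to keep only the matches that appear in your entity list. `CapitalizedPhraseMatcher` and `FindEntities` are hypothetical names. Note the pattern's limits: it captures at most two capitalized tokens, and since `\S+` also matches punctuation, a name followed directly by a period or comma would include that character and fail the list lookup.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, introduced only for this example.
public static class CapitalizedPhraseMatcher
{
    // The pattern from the answer: one capitalized token, optionally
    // followed by whitespace and a second capitalized token.
    private static readonly Regex Candidate =
        new Regex(@"[A-Z]\S+(?:\s+[A-Z]\S+)?");

    // Extracts candidate names from the sentence and keeps only those
    // that are present in the entity list.
    public static List<string> FindEntities(string sentence, ICollection<string> entities)
    {
        return Candidate.Matches(sentence)
            .Cast<Match>()
            .Select(match => match.Value)
            .Where(candidate => entities.Contains(candidate))
            .ToList();
    }
}

public static class Program
{
    public static void Main()
    {
        var entityString = new List<string> { "Harry Potter", "Ford Car", "Broom", "Ronald" };

        // Candidates here are "Ronald", "Broom" and "Ford Car".
        foreach (string entity in CapitalizedPhraseMatcher.FindEntities(
                     "Ronald had a Broom and a Ford Car", entityString))
        {
            Console.WriteLine(entity);
        }
    }
}
```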

Answer 3

You can iterate over each item and use the 'String.Contains()' method, which saves you from having to split the string you are searching.

Example:

List<string> valid_words = new List<string>();

foreach (string sentence in processedSentencesList)
{
  foreach (string entity in entityString)
  {
    if (sentence.Contains(entity))
    {
      valid_words.Add(entity);
    }
  }
}
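
Two things to be aware of with the loop above: `Contains` is a plain substring test (so "Broom" is also found in "Broomstick"), and an entity is added once for every sentence that contains it. If either matters to you, one possible variation is to match on word boundaries with a regex and skip entities already seen. This is only a sketch; `WholeWordEntityFinder` is a made-up name, and `\b` only behaves as intended when the entity starts and ends with a letter or digit.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, introduced only for this example.
public static class WholeWordEntityFinder
{
    public static List<string> FindEntities(
        IEnumerable<string> sentences, IEnumerable<string> entities)
    {
        var found = new List<string>();
        var seen = new HashSet<string>();

        foreach (string sentence in sentences)
        {
            foreach (string entity in entities)
            {
                // \b requires a word boundary on both sides of the entity;
                // Regex.Escape makes the entity text match literally.
                string pattern = @"\b" + Regex.Escape(entity) + @"\b";

                // HashSet.Add returns false for an entity already recorded.
                if (Regex.IsMatch(sentence, pattern) && seen.Add(entity))
                {
                    found.Add(entity);
                }
            }
        }

        return found;
    }
}

public static class Program
{
    public static void Main()
    {
        var entityString = new List<string> { "Harry Potter", "Ford Car", "Broom", "Ronald" };
        var sentences = new List<string>
        {
            "Harry Potter is a wizard",
            "Ronald had a Broom and a Ford Car"
        };

        // Prints: Harry Potter, Ford Car, Broom, Ronald (one per line)
        foreach (string entity in WholeWordEntityFinder.FindEntities(sentences, entityString))
        {
            Console.WriteLine(entity);
        }
    }
}
```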

C# How to avoid splitting names in .Split()?
