How do I get the most repeated word in a list of strings?

Time: 2016-08-06 13:31:28

Tags: c#

I have a list of strings, and I want to extract the word that occurs most often across the whole list.

For example:

List<string> mylist=new List<string>();
mylist.Add("book is good ");
mylist.Add("i like flowers ");
mylist.Add("i reading book");

I want to extract book, not a word that carries no meaning.

@user3185569 replied with the following code:

List<string> mylist = new List<string>();
mylist.Add("book is good ");
mylist.Add("i like flowers ");
mylist.Add("i reading book");

var mostRepeatedWord = mylist.SelectMany(x => x.Split(new [] { " " }, 
                                         StringSplitOptions.RemoveEmptyEntries))
                         .GroupBy(x => x).OrderByDescending(x => x.Count())
                         .Select(x => x.Key).FirstOrDefault();

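However, this code also returns words such as in and the like.

I want to extract five meaningful words from my list. To work around the problem, I added an XML dictionary containing those unwanted words to my project, and I fill a list from that dictionary as follows: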

    static List<string> notWord = new List<string>();
    public static void fillList()
    {
        XmlDocument doc = new XmlDocument();
        doc.Load(@"XMLDic.xml");
        foreach (XmlNode node in doc.DocumentElement.ChildNodes)
        {
            notWord.Add(node.InnerText); //or loop through its children as well
        }

    }

First I remove these words from the list. Then, in a loop, I extract mostRepeatedWord, save it in a new list, and remove it from the original list. This process is repeated 5 times.

    public static List<string> finde(List<string> list)
    {
        List<string> newlist = new List<string>();

        fillList();
        delStr(list, "", true);
        for (int i = 0; i < 6; i++)
        {
            var mostRepeatedWord = list.SelectMany(x => x.Split(new[] { " " },
                                         StringSplitOptions.RemoveEmptyEntries))
                         .GroupBy(x => x).OrderByDescending(x => x.Count())
                         .Select(x => x.Key).FirstOrDefault();

            if (mostRepeatedWord!="")
                newlist.Add(mostRepeatedWord);
            delStr(list, mostRepeatedWord, false);
        }
        return newlist;
    }

The method that removes words from the list is:

    public static List<string> delStr(List<string> list, string str, bool t)
    {
        if (t)
        {
            string s;
            for (int i = 0; i < list.Count; i++)
            {
                s = list[i];
                foreach (var i1 in notWord)
                {
                    s = s.Replace(i1, "");
                }

                list[i] = s;
            }
        }
        else
        {
            string s;
            for (int i = 0; i < list.Count; i++)
            {
                s = list[i];

                s = s.Replace(str, "");


                list[i] = s;
            }
        }
        return list;

    }

Is this approach correct, or is there a better way?

1 answer:

Answer 0: (score: 3)

Using Linq:

List<string> mylist = new List<string>();
mylist.Add("book is good ");
mylist.Add("i like flowers ");
mylist.Add("i reading book");

var mostRepeatedWord = mylist.SelectMany(x => x.Split(new [] { " " }, 
                                             StringSplitOptions.RemoveEmptyEntries))
                             .GroupBy(x => x).OrderByDescending(x => x.Count())
                             .Select(x => x.Key).FirstOrDefault();
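
  • Split on spaces: use String.Split
  • Flatten the result into a single list of words: use SelectMany
  • Group by word: use GroupBy
  • Order by number of occurrences: use OrderByDescending with Count
  • Take the first element: use FirstOrDefault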
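
The same pipeline can probably be extended to the "five meaningful words" case from the question by filtering out unwanted words before grouping and taking the first five groups instead of one. The following is only a sketch under some assumptions: WordCounter, MostRepeatedWords and the contents of stopWords are hypothetical names and data chosen for illustration, and the stop words are hard-coded here rather than loaded from the XML dictionary.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WordCounter
{
    // Hypothetical helper (not from the original post): returns up to `count`
    // of the most frequent words, skipping every word found in `stopWords`.
    public static List<string> MostRepeatedWords(
        IEnumerable<string> lines, ISet<string> stopWords, int count)
    {
        return lines
            // Split each line on spaces and flatten into one word sequence.
            .SelectMany(line => line.Split(new[] { ' ' },
                                           StringSplitOptions.RemoveEmptyEntries))
            // Drop whole words that appear in the stop-word set.
            .Where(word => !stopWords.Contains(word))
            // Count occurrences, ignoring case differences.
            .GroupBy(word => word, StringComparer.OrdinalIgnoreCase)
            .OrderByDescending(group => group.Count())
            .Take(count)
            .Select(group => group.Key)
            .ToList();
    }
}

public static class Program
{
    public static void Main()
    {
        var mylist = new List<string>
        {
            "book is good ",
            "i like flowers ",
            "i reading book"
        };

        // Assumed stop words; in the question these would come from XMLDic.xml.
        var stopWords = new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        {
            "i", "is", "in"
        };

        List<string> top = WordCounter.MostRepeatedWords(mylist, stopWords, 5);

        // "book" is listed first because it is the only word that occurs twice.
        Console.WriteLine(string.Join(", ", top));
    }
}
```

Filtering whole tokens with Where avoids a side effect of the string.Replace approach in the question: Replace works on substrings, so removing "in" would also turn "reading" into "readg". A single query with Take(5) also makes the repeated extract-and-delete loop unnecessary. Note that the loop in the question runs while i < 6, which is six iterations rather than five.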