Group or sort a list/array by the number of matching keywords

Time: 2016-10-16 23:03:42

Tags: c# arrays algorithm linq

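What is a good, efficient way in C# to group or sort the elements of a string array or List by the number of matching keywords between each pair of elements? Elements that share the most keywords should be placed near each other.

For example, if the collection is: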

string[] movies = {
    "Star Wars Episode IV A New Hope",
    "Force of Hunger",
    "The Hunger Games Mockingjay",
    "Jaws 2",
    "The Shawshank Redemption",
    "Hunger Pain",
    "The Hunger Games",
    "Jaws: The Revenge",
    "The Hunger Games Catching Fire",
    "Rogue One A Star Wars Story",
    "Aqua Teen Hunger Force",
    "The Force Awakens Star Wars",
};

Then the processed result should look something like this:

{
    "The Hunger Games Mockingjay",
    "The Hunger Games Catching Fire",
    "The Hunger Games",

    "Aqua Teen Hunger Force",
    "Force of Hunger",

    "Rogue One A Star Wars Story",
    "The Force Awakens Star Wars",
    "Star Wars Episode IV A New Hope",

    "Jaws: The Revenge",
    "Jaws 2",

    "Hunger Pain",

    "The Shawshank Redemption",
};

2 answers:

Answer 0: (score: 1)

The result would be:

   Group : 4
   The Force Awakens Star Wars
   Group : 3
   The Hunger Games Mockingjay
   The Hunger Games
   The Hunger Games Catching Fire
   Group : 2
   Star Wars Episode IV A New Hope
   Force of Hunger
   Jaws: The Revenge
   Rogue One A Star Wars Story
   Aqua Teen Hunger Force
   Group : 1
   Jaws 2
   The Shawshank Redemption
   Hunger Pain

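Answer 1: (score: 1)

Here is the approach I took:

  1. Break each title into a set of normalized words, excluding "noise" words such as "a", "an" and "the".
  2. Find the intersection (commonality) of each pair of word sets.
  3. Add each title to a dictionary, keyed by title, whose values are sets of intersections. Add each intersection to the set for both titles in the pair.
  4. Finally, order the dictionary by the size of each title's largest intersection (largest first), then by the words in that intersection, and then by title to get the final list of titles.

Here it is in code: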

    using System;
    using System.Collections.Generic;
    using System.Linq;
    
    public class Program
    {
        public static void Main()
        {
            string[] movies = {
                "Star Wars Episode IV A New Hope",
                "Force of Hunger",
                "The Hunger Games Mockingjay",
                "Jaws 2",
                "The Shawshank Redemption",
                "Hunger Pain",
                "The Hunger Games",
                "Jaws: The Revenge",
                "The Hunger Games Catching Fire",
                "Rogue One A Star Wars Story",
                "Aqua Teen Hunger Force",
                "The Force Awakens Star Wars",
            };
    
            // Step 1: turn each title into a set of lower-cased words, dropping noise words.
            List<HashSet<string>> titleWords = movies
                .Select(m => new HashSet<string>(
                    m.Split(new char[] { ' ', ':' }, StringSplitOptions.RemoveEmptyEntries)
                    .Select(w => w.ToLower())
                    .Where(w => w != "a" && w != "an" && w != "the")))
                .ToList();
    
            // Steps 2 and 3: intersect every pair of titles and record the shared words under both titles.
            var titles = new Dictionary<string, SortedSet<Commonality>>();
            for (int i = 0; i < titleWords.Count; i++)
            {
                for (int j = i + 1; j < titleWords.Count; j++)
                {
                    var wordsInCommon = titleWords[i]
                        .Intersect(titleWords[j])
                        .OrderBy(w => w)
                        .ToList();
                    Commonality c = new Commonality(wordsInCommon);
                    AddCommonalities(titles, movies[i], c);
                    AddCommonalities(titles, movies[j], c);
                }
            }
    
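            // Step 4: order by each title's best commonality (first element of its sorted set), then by title.
            string[] groupedTitles = titles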
                .OrderBy(k => k.Value.First())
                .ThenBy(k => k.Key)
                .Select(k => k.Key)
                .ToArray();
    
            Console.WriteLine(string.Join("\r\n", groupedTitles));
        }
    
        private static void AddCommonalities(Dictionary<string, SortedSet<Commonality>> dict, string title, Commonality c)
        {
            SortedSet<Commonality> commonalities;
            if (!dict.TryGetValue(title, out commonalities))
            {
                commonalities = new SortedSet<Commonality>();
                dict.Add(title, commonalities);
            }
            commonalities.Add(c);
        }
    }
    
    class Commonality : IComparable<Commonality>
    {
        public string JoinedWords { get; private set; }
        public int WordCount { get; private set; }
    
        public Commonality(List<string> words)
        {
            JoinedWords = string.Join(" ", words);
            WordCount = words.Count;
        }
    
        public override bool Equals(object obj)
        {
            Commonality that = obj as Commonality;
            return (that != null && that.JoinedWords == JoinedWords);
        }
    
        public override int GetHashCode()
        {
            return JoinedWords.GetHashCode();
        }
    
        // Larger word counts sort first; ties are broken by ordinal comparison of the joined words.
        public int CompareTo(Commonality other)
        {
            int r = other.WordCount - WordCount;
            if (r == 0) return string.CompareOrdinal(JoinedWords, other.JoinedWords);
            return r;
        }
    
        public override string ToString()
        {
            return WordCount + " " + JoinedWords;
        }
    }
    

    Output:

    Aqua Teen Hunger Force
    Force of Hunger
    The Hunger Games
    The Hunger Games Catching Fire
    The Hunger Games Mockingjay
    Rogue One A Star Wars Story
    Star Wars Episode IV A New Hope
    The Force Awakens Star Wars
    Hunger Pain
    Jaws 2
    Jaws: The Revenge
    The Shawshank Redemption
    

    Fiddle: https://dotnetfiddle.net/ksMMY6
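
To make steps 1 and 2 easier to follow in isolation, here is a minimal sketch that computes the shared keywords for a single pair of titles. The class `KeywordDemo` and the helpers `Keywords` and `Shared` are hypothetical names introduced only for this illustration; they are not part of the code above, and the sketch assumes the same noise words and the same separators (space and colon).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class KeywordDemo
{
    // Hypothetical helper: the lower-cased words of a title,
    // minus the noise words "a", "an" and "the".
    public static HashSet<string> Keywords(string title)
    {
        var noise = new HashSet<string> { "a", "an", "the" };
        return new HashSet<string>(
            title.Split(new[] { ' ', ':' }, StringSplitOptions.RemoveEmptyEntries)
                 .Select(w => w.ToLowerInvariant())
                 .Where(w => !noise.Contains(w)));
    }

    // Hypothetical helper: words shared by two titles, sorted ordinally
    // so that the joined string is the same regardless of argument order.
    public static List<string> Shared(string first, string second)
    {
        return Keywords(first)
            .Intersect(Keywords(second))
            .OrderBy(w => w, StringComparer.Ordinal)
            .ToList();
    }

    public static void Main()
    {
        var shared = Shared("The Hunger Games Mockingjay", "The Hunger Games");
        Console.WriteLine(shared.Count + " " + string.Join(" ", shared)); // 2 games hunger

        shared = Shared("Jaws: The Revenge", "Jaws 2");
        Console.WriteLine(shared.Count + " " + string.Join(" ", shared)); // 1 jaws

        shared = Shared("The Shawshank Redemption", "Hunger Pain");
        Console.WriteLine(shared.Count); // 0
    }
}
```

The count and the joined words printed here correspond to what a `Commonality` stores for each pair, which is why "The Hunger Games" titles (two shared words) end up ahead of the "Jaws" titles (one shared word) in the final ordering.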