Find all duplicates in a C# list

Time: 2019-04-17 11:20:07

Tags: c# .net linq

I have a custom class, shown below:

internal class RecurringClusterModel
{
    public int? From { get; set; }
    public int? To { get; set; }
    public string REC_Cluster_1 { get; set; }
    public string REC_Cluster_2 { get; set; }
    public string REC_Cluster_3 { get; set; }
    public string REC_Cluster_4 { get; set; }
    public string REC_Cluster_5 { get; set; }
    public string REC_Cluster_6 { get; set; }
    public string REC_Cluster_7 { get; set; }
    public string REC_Cluster_8 { get; set; }
    public string REC_Cluster_9 { get; set; }
    public string REC_Cluster_10 { get; set; }
}

I have a list of this class:

  List<RecurringClusterModel> recurringRecords = new List<RecurringClusterModel>(); 

The data can be in the following format:
recurringRecords[0].REC_Cluster_1 = "USA";
recurringRecords[0].REC_Cluster_2 = "UK";
recurringRecords[0].REC_Cluster_3 = "India";
recurringRecords[0].REC_Cluster_4 = "France";
recurringRecords[0].REC_Cluster_5 = "China";


recurringRecords[1].REC_Cluster_1 = "France";
recurringRecords[1].REC_Cluster_2 = "Germany";
recurringRecords[1].REC_Cluster_3 = "Canada";
recurringRecords[1].REC_Cluster_4 = "Russia";
recurringRecords[1].REC_Cluster_5 = "India";

....

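I want to find the values that are duplicated across all the cluster properties. This is just a subset; I have 50 properties, up to REC_Cluster_50. I want to find out which countries are repeated among the 50 cluster properties of the list.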

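So in this case, India and France are repeated. I could group by an individual property and then find the duplicates by checking the count, but then I would have to do that for all 50 REC_Cluster properties. I am not sure whether there is a better way.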

Thanks

2 answers:

Answer 0: (score: 2)

Since you want to capture From and To, I suggest structuring your class like this:

internal class RecurringClusterModel
{
    public int? From { get; set; }
    public int? To { get; set; }
    public IEnumerable<string> REC_Clusters { get; set; }
}

Then you can search for duplicates:

var dupes = recs
    .Select(r => new
    {
        r.From,
        r.To,
        DuplicateClusters = r.REC_Clusters.GroupBy(c => c)
            .Where(g => g.Count() > 1) // duplicates
            .SelectMany(g => g)        // flatten it back
            .ToArray()                 // indexed
    })
    .Where(r => r.DuplicateClusters.Any()) // only interested in records with duplicates
    .ToArray();

Edit

If you want all duplicates across the whole list rather than per record, the clusters of all records have to be grouped together. But then you lose track of From/To.
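A minimal sketch of the list-wide search, assuming the restructured class with a single REC_Clusters collection. The helper name FindAcrossRecords and the class ClusterDuplicates are hypothetical and not part of the answer; null collections and empty strings are skipped as an assumption about how unset clusters should be treated.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

internal class RecurringClusterModel
{
    public int? From { get; set; }
    public int? To { get; set; }
    public IEnumerable<string> REC_Clusters { get; set; }
}

internal static class ClusterDuplicates
{
    // Hypothetical helper: returns every cluster value that occurs more than
    // once across all records. A value repeated inside one record also counts.
    public static List<string> FindAcrossRecords(IEnumerable<RecurringClusterModel> recs)
    {
        return recs
            .SelectMany(r => r.REC_Clusters ?? Enumerable.Empty<string>()) // flatten all records
            .Where(c => !string.IsNullOrEmpty(c))                           // skip unset clusters
            .GroupBy(c => c)                                                // one group per value
            .Where(g => g.Count() > 1)                                      // keep repeated values only
            .Select(g => g.Key)
            .ToList();
    }
}
```

For the two sample records in the question, this should return India and France, in order of first appearance. Because the records are flattened before grouping, the result no longer says which From/To range each value came from.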

Answer 1: (score: 1)

I would add an enumerable to your class that iterates over all the cluster properties of the class:

internal class RecurringClusterModel
{
    public string REC_Cluster_1 { get; set; }
    public string REC_Cluster_2 { get; set; }
    public string REC_Cluster_3 { get; set; }

    // Exposes every non-empty cluster value as a single sequence.
    public IEnumerable<string> Clusters => GetAllClusters();

    private IEnumerable<string> GetAllClusters()
    {
        if (!string.IsNullOrEmpty(REC_Cluster_1))
            yield return REC_Cluster_1;

        if (!string.IsNullOrEmpty(REC_Cluster_2))
            yield return REC_Cluster_2;

        if (!string.IsNullOrEmpty(REC_Cluster_3))
            yield return REC_Cluster_3;
    }
}

This way you can flatten your list to the individual clusters and then group them. If you need the original object again, you have to carry it along while flattening.
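The flatten-then-group step described in this answer could look roughly like the sketch below. It assumes the three-property version of the class with its Clusters enumerable; the helper FindDuplicatesWithOwners and the class ClusterGrouping are hypothetical names introduced for illustration. Each cluster value is paired with its record during flattening, so From and To remain reachable afterwards.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

internal class RecurringClusterModel
{
    public int? From { get; set; }
    public int? To { get; set; }
    public string REC_Cluster_1 { get; set; }
    public string REC_Cluster_2 { get; set; }
    public string REC_Cluster_3 { get; set; }

    public IEnumerable<string> Clusters => GetAllClusters();

    private IEnumerable<string> GetAllClusters()
    {
        if (!string.IsNullOrEmpty(REC_Cluster_1)) yield return REC_Cluster_1;
        if (!string.IsNullOrEmpty(REC_Cluster_2)) yield return REC_Cluster_2;
        if (!string.IsNullOrEmpty(REC_Cluster_3)) yield return REC_Cluster_3;
    }
}

internal static class ClusterGrouping
{
    // Hypothetical helper: maps each repeated cluster value to the records
    // that contain it, so the original objects are not lost.
    public static Dictionary<string, List<RecurringClusterModel>> FindDuplicatesWithOwners(
        IEnumerable<RecurringClusterModel> records)
    {
        return records
            // Pair every cluster value with the record it came from.
            .SelectMany(r => r.Clusters.Select(c => new { Cluster = c, Record = r }))
            // Group the records by cluster value.
            .GroupBy(x => x.Cluster, x => x.Record)
            // Keep only values that appear more than once.
            .Where(g => g.Count() > 1)
            .ToDictionary(g => g.Key, g => g.ToList());
    }
}
```

The dictionary keys are the repeated values, and each entry lists the records in which the value occurs. A value that appears twice in the same record will list that record twice.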