I have a custom class, shown below:
internal class RecurringClusterModel
{
    public int? From { get; set; }
    public int? To { get; set; }
    public string REC_Cluster_1 { get; set; }
    public string REC_Cluster_2 { get; set; }
    public string REC_Cluster_3 { get; set; }
    public string REC_Cluster_4 { get; set; }
    public string REC_Cluster_5 { get; set; }
    public string REC_Cluster_6 { get; set; }
    public string REC_Cluster_7 { get; set; }
    public string REC_Cluster_8 { get; set; }
    public string REC_Cluster_9 { get; set; }
    public string REC_Cluster_10 { get; set; }
}
I have a list of this class:
List<RecurringClusterModel> recurringRecords = new List<RecurringClusterModel>();
The data can be in the following format:
recurringRecords[0].REC_Cluster_1 = "USA";
recurringRecords[0].REC_Cluster_2 = "UK";
recurringRecords[0].REC_Cluster_3 = "India";
recurringRecords[0].REC_Cluster_4 = "France";
recurringRecords[0].REC_Cluster_5 = "China";
recurringRecords[1].REC_Cluster_1 = "France";
recurringRecords[1].REC_Cluster_2 = "Germany";
recurringRecords[1].REC_Cluster_3 = "Canada";
recurringRecords[1].REC_Cluster_4 = "Russia";
recurringRecords[1].REC_Cluster_5 = "India";
....
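I want to find the values that are duplicated across all of the Cluster properties. This is just a subset; I have 50 properties, up to REC_Cluster_50. I want to find out which countries are repeated across the 50 cluster properties of the list.
So in this case, India and France are repeated. I could group by a single property and then find the duplicates by checking the count, but then I would have to do that for all 50 REC_Cluster properties. I'm not sure whether there is a better way.
Thanks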
Answer 0 (score: 2)
Since you want to capture From and To, I suggest structuring your class like this:
internal class RecurringClusterModel
{
    public int? From { get; set; }
    public int? To { get; set; }
    public IEnumerable<string> REC_Clusters { get; set; }
}
Then you can search for duplicates:
// recs is the collection of RecurringClusterModel records
var dupes = recs
    .Select(r => new
    {
        r.From,
        r.To,
        DuplicateClusters = r.REC_Clusters.GroupBy(c => c)
            .Where(g => g.Count() > 1) // duplicates
            .SelectMany(g => g)        // flatten it back
            .ToArray()                 // indexed
    })
    .Where(r => r.DuplicateClusters.Any()) // only interested in records with duplicates
    .ToArray();
Edit
If you want all duplicates across every record, flatten the REC_Clusters of all records into a single sequence before grouping.
But then you lose track of From/To.
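A minimal sketch of finding duplicates across all records, assuming the class exposes a REC_Clusters collection as suggested in this answer. The helper names ClusterDuplicates and FindAcrossRecords are hypothetical, and the class is declared public here only so the snippet compiles on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class RecurringClusterModel
{
    public int? From { get; set; }
    public int? To { get; set; }
    public IEnumerable<string> REC_Clusters { get; set; }
}

// Hypothetical helper, not part of the original answer.
public static class ClusterDuplicates
{
    // Returns each country that occurs more than once across all records.
    public static string[] FindAcrossRecords(IEnumerable<RecurringClusterModel> recs)
    {
        return recs
            // Merge the clusters of every record into one flat sequence;
            // a record without clusters contributes nothing.
            .SelectMany(r => r.REC_Clusters ?? Enumerable.Empty<string>())
            .GroupBy(c => c)
            .Where(g => g.Count() > 1) // seen at least twice overall
            .Select(g => g.Key)        // one entry per repeated country
            .ToArray();
    }
}
```

Because the records are merged before grouping, a country listed twice inside a single record is also reported, and the result no longer says which From/To range each occurrence came from.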
Answer 1 (score: 1)
I would add an enumerable to your class that iterates over all of the cluster properties.
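This way you can flatten the list into the individual clusters and then group them. If you need the original object again, you have to carry it along when flattening. Here is an example: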
internal class RecurringClusterModel
{
    public string REC_Cluster_1 { get; set; }
    public string REC_Cluster_2 { get; set; }
    public string REC_Cluster_3 { get; set; }

    public IEnumerable<string> Clusters => GetAllClusters();

    private IEnumerable<string> GetAllClusters()
    {
        if (!string.IsNullOrEmpty(REC_Cluster_1))
            yield return REC_Cluster_1;
        if (!string.IsNullOrEmpty(REC_Cluster_2))
            yield return REC_Cluster_2;
        if (!string.IsNullOrEmpty(REC_Cluster_3))
            yield return REC_Cluster_3;
    }
}
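The flattening and grouping step could look roughly like the sketch below, which keeps the original record next to each cluster value so it is still available after grouping. ClusterSearch and FindRepeated are hypothetical names, only three cluster properties are shown, and the class is public here just to keep the snippet self-contained.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class RecurringClusterModel
{
    public string REC_Cluster_1 { get; set; }
    public string REC_Cluster_2 { get; set; }
    public string REC_Cluster_3 { get; set; }

    public IEnumerable<string> Clusters => GetAllClusters();

    private IEnumerable<string> GetAllClusters()
    {
        if (!string.IsNullOrEmpty(REC_Cluster_1)) yield return REC_Cluster_1;
        if (!string.IsNullOrEmpty(REC_Cluster_2)) yield return REC_Cluster_2;
        if (!string.IsNullOrEmpty(REC_Cluster_3)) yield return REC_Cluster_3;
    }
}

// Hypothetical helper, not part of the original answer.
public static class ClusterSearch
{
    // Maps each repeated country to the records in which it occurs.
    public static Dictionary<string, List<RecurringClusterModel>> FindRepeated(
        IEnumerable<RecurringClusterModel> records)
    {
        return records
            // Pair every cluster value with the record it came from.
            .SelectMany(r => r.Clusters, (r, c) => new { Record = r, Cluster = c })
            .GroupBy(x => x.Cluster)
            .Where(g => g.Count() > 1) // value occurs more than once overall
            .ToDictionary(g => g.Key, g => g.Select(x => x.Record).ToList());
    }
}
```

Calling ClusterSearch.FindRepeated(recurringRecords) returns one entry per repeated country. A record that lists the same country in two of its properties appears twice in that country's list.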