I have a dataset like this:
ID1 ID2
11 22
11 34
22 35
35 9
41 10
52 87
9 65
34 43
I want an output dataset that uses ID1 and ID2 to detect linked (duplicate) IDs and assigns each row a group ID:
ID1 ID2 ID3
11 22 ID_11
11 34 ID_11
22 35 ID_11
35 9 ID_11
41 10 ID_10
52 87 ID_87
9 65 ID_11
34 43 ID_11
Since IDs 11, 22, 35, 9, and 34 all reference each other, they map to a single ID, namely ID_11.
Answer 0 (score: 0)
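You didn't give enough detail for me to write this cleanly, but once you adjust a few details, this code should give you the Python you need to solve the problem.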
# your ID pairs, as a list of lists
pairs = [
    [11, 22],
    [11, 34],
    [22, 35],
    [35, 9],
    [41, 10],
    [52, 87],
    [9, 65],
    [34, 43],
]

# create disjoint sets
groups = []
for id_1, id_2 in pairs:
    # a pair can touch up to two existing groups (one per ID)
    matches = [group for group in groups if id_1 in group or id_2 in group]
    if matches:
        # grow the first matching group and fold any other match into it,
        # so a pair that bridges two groups merges them
        target = matches[0]
        target.update((id_1, id_2))
        for other in matches[1:]:
            target |= other
            groups.remove(other)
    else:
        groups.append({id_1, id_2})

# map the sets to some unique id/string/whatever
id_mappings = dict(enumerate(groups))

# add the unique id/string/whatever to the initial list
for id_pair in pairs:
    for group_id, group in id_mappings.items():
        if id_pair[0] in group:
            id_pair.append(group_id)
            break

for pair in pairs:
    print(pair)
>> [11, 22, 0]
>> [11, 34, 0]
>> [22, 35, 0]
>> [35, 9, 0]
>> [41, 10, 1]
>> [52, 87, 2]
>> [9, 65, 0]
>> [34, 43, 0]
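If you want string labels like ID_11 instead of counters, here is a minimal union-find sketch of the same grouping idea. It is a different technique from the set scan above, not a correction of it. The function name `assign_group_labels` is made up for illustration. The question does not say how a group's label is chosen, so this sketch assumes the label is the first ID encountered in that group. That reproduces ID_11 for the large group, but it gives ID_41 and ID_52 for the two standalone pairs rather than the ID_10 and ID_87 shown in the question.

```python
def assign_group_labels(pairs):
    """Return one 'ID_<n>' label per pair; IDs linked through any chain share a label."""
    parent = {}  # union-find forest: ID -> parent ID
    order = {}   # ID -> position at which it was first seen

    def find(x):
        # Register unseen IDs as their own root.
        if x not in parent:
            parent[x] = x
            order[x] = len(order)
        # Walk up to the root, halving the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            return
        # Assumption: the earliest-seen ID of a group names the group.
        if order[root_a] > order[root_b]:
            root_a, root_b = root_b, root_a
        parent[root_b] = root_a

    for id_1, id_2 in pairs:
        union(id_1, id_2)

    return [f"ID_{find(id_1)}" for id_1, _ in pairs]


pairs = [[11, 22], [11, 34], [22, 35], [35, 9],
         [41, 10], [52, 87], [9, 65], [34, 43]]
labels = assign_group_labels(pairs)
for (id_1, id_2), label in zip(pairs, labels):
    print(id_1, id_2, label)
```

Union-find handles a pair that links two previously separate groups without rescanning every group, which matters once the list of pairs gets long. If you need a different naming rule (smallest ID, largest ID, and so on), only the root choice in `union` has to change.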