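I started with a 52-row CSV file in the format [name, attribute1, attribute2]. I have imported the CSV file and created every possible combination of rows, each combination of size 2, so I have a list like this: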
([Bill, Long, Blonde], [Sally, Short, Blonde]),
([Bobby, Long, Brown], [James, Short, Orange])
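etc.
I want to be able to compare attribute 1 and attribute 2, and eventually weight them, so that I can find the pairs that have the most in common. I am struggling to find a way to compare attributes 1 and 2 easily without first taking the pairs apart.
The code I have written so far is: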
import csv
from itertools import combinations

with open('dc.csv', 'r', newline='') as f:
    csvreader = csv.reader(f)
    # Build every unordered pair of rows from the file
    comb = combinations(csvreader, 2)
    for i in comb:
        print(i)
Edit: The output I want is the list ordered from the best-matching pair to the worst-matching pair. Like this:
([James, Short, Orange], [Bridgett, Short, Orange], 2)
([Bill, Long, Blonde], [Sally, Short, Blonde], 1),
([Bobby, Long, Brown], [James, Short, Orange], 0),
That is because James and Bridgett match on both hair color (1) and hair length (1), so they score 2, and so on. That way I can sort the pairs from most matched to least matched.
Answer 0 (score: 0)
As I understand it, what you want is to calculate the "likeness" of each pair of elements.
Here is what I did:
a = [['Bill', 'Long', 'Blonde'], ['Sally', 'Short', 'Blonde'], ['Bobby', 'Long', 'Brown'], ['James', 'Short', 'Orange']]

def likenessCalculator(groupA, indexA, groupB, indexB):
    # Function that calculates how close the attributes are.
    # indexA and indexB are accepted but not used yet.
    likeness = 0
    if groupA[1] == groupB[1]:
        likeness += 1
    if groupA[2] == groupB[2]:
        likeness += 1
    return (groupA, groupB, likeness)

results = []
for idx, element in enumerate(a):
    for idx2, element2 in enumerate(a):
        # Here I iterate through the array, and for every element, I compare it with each other element.
        # This code doesn't yet skip self-comparisons or mirrored duplicates such as (A, B) and (B, A),
        # but that shouldn't be hard to implement.
        results.append(likenessCalculator(element, idx, element2, idx2))
print(results)
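The duplicates mentioned in the comment can be avoided with itertools.combinations, which the question already uses, and the scored tuples can then be sorted from best to worst match as the question's edit asks. The following is only a minimal sketch under a few assumptions: the rows are already loaded into a list, the helper name score_pair is hypothetical, and the Bridgett row is taken from the question's desired output rather than from the list above.

```python
from itertools import combinations

# Sample rows; in practice these would come from csv.reader
rows = [
    ['Bill', 'Long', 'Blonde'],
    ['Sally', 'Short', 'Blonde'],
    ['Bobby', 'Long', 'Brown'],
    ['James', 'Short', 'Orange'],
    ['Bridgett', 'Short', 'Orange'],
]

def score_pair(row_a, row_b):
    """Count the attributes (columns 1 and 2) on which the two rows agree."""
    return sum(1 for x, y in zip(row_a[1:], row_b[1:]) if x == y)

# combinations() yields each unordered pair once and never pairs a row with itself
scored = [(a, b, score_pair(a, b)) for a, b in combinations(rows, 2)]

# Sort from most shared attributes to fewest
scored.sort(key=lambda item: item[2], reverse=True)

for item in scored:
    print(item)
```

Because the name in column 0 is excluded from the comparison, two different people who happen to share a name would not gain a point for it.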
Answer 1 (score: 0)
As I understand it, once you have the list from the CSV file, you can try something like this:
the_list = [(['Bill', 'Long', 'Blonde'], ['Sally', 'Short', 'Blonde']), (['Bobby', 'Long', 'Brown'], ['James', 'Short', 'Orange'])]
result = []
for names in the_list:
    # Count the values the two rows share; note that a set ignores column position
    n = len(set(names[0]).intersection(set(names[1])))
    new = list(names)
    new.append(n)
    result.append(new)
print(result)
You will find the final list in the result variable.
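Since the question also mentions weighting the attributes later, and a set intersection ignores which column a value came from (it would also count a shared name), a position-wise comparison may be a better fit. The sketch below is only an illustration: the WEIGHTS values and the weighted_score helper are assumptions, not part of the code above, and the third pair is taken from the question's desired output.

```python
the_list = [
    (['Bill', 'Long', 'Blonde'], ['Sally', 'Short', 'Blonde']),
    (['Bobby', 'Long', 'Brown'], ['James', 'Short', 'Orange']),
    (['James', 'Short', 'Orange'], ['Bridgett', 'Short', 'Orange']),
]

# Hypothetical weights: column index -> points awarded when that column matches
WEIGHTS = {1: 1.0, 2: 2.0}

def weighted_score(row_a, row_b, weights=WEIGHTS):
    """Sum the weights of the columns where both rows hold the same value."""
    return sum(w for col, w in weights.items() if row_a[col] == row_b[col])

# Attach the score to each pair, then order from best match to worst
result = sorted(
    ([a, b, weighted_score(a, b)] for a, b in the_list),
    key=lambda entry: entry[2],
    reverse=True,
)
print(result)
```

Setting every weight to 1 reduces this to the plain match count described in the question.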