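So I am trying to count the number of occurrences of each item in the first column of a CSV file. But the result is incorrect. I get output like this: OrderedDict([('3178040678842', 1), ('4005808283804', 1), ('3337872414527', 1), ... whereas each of these numbers appears 2 or 3 times in the CSV file.
Here is the code: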
import csv
from collections import Counter, OrderedDict

# the purpose of this small script is checking if values are double in EAN list result
eans_to_count = set()
with open("example.csv", "r") as new_data:
    reader = csv.reader(new_data, delimiter=',', quotechar='"')
    for row in reader:
        if row:
            ean = row[0]
            eans_to_count.add(ean)
x = Counter(eans_to_count)
y = OrderedDict(x.most_common())
print(y)
Do you know what I am doing wrong? I am sure the result is incorrect.
Answer 0 (score: 2)
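A set discards duplicate values before you ever get to count them, so every EAN reaches the Counter exactly once. The normal way to use a Counter is to add to it directly: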
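import csv
from collections import Counter

# Tally each EAN as it is read, so duplicates are counted rather than dropped
eans_to_count = Counter()
with open("example.csv", "r") as new_data:
    reader = csv.reader(new_data, delimiter=',', quotechar='"')
    for row in reader:
        if row:  # skip empty lines
            ean = row[0]
            eans_to_count[ean] += 1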
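
As a rough sketch of the same idea, the loop can also be collapsed by passing a generator straight to Counter. The helper name count_first_column and the in-memory sample data are made up for illustration (they stand in for example.csv and are not from the question). The snippet also shows why going through a set first yields a count of 1 for every value.

```python
import csv
import io
from collections import Counter


def count_first_column(lines):
    """Count how often each first-column value occurs in CSV input."""
    reader = csv.reader(lines, delimiter=',', quotechar='"')
    # Counter consumes the generator and tallies every value, duplicates included.
    return Counter(row[0] for row in reader if row)


# Hypothetical sample data standing in for example.csv.
sample = io.StringIO(
    "3178040678842,a\n"
    "4005808283804,b\n"
    "3178040678842,c\n"
    "\n"
    "3337872414527,d\n"
    "3178040678842,e\n"
)

counts = count_first_column(sample)

# Keep only the EANs that occur more than once.
duplicates = {ean: n for ean, n in counts.items() if n > 1}

# For comparison: deduplicating through a set first makes every count 1.
counts_via_set = Counter(set(counts.elements()))

print(counts.most_common())
print(duplicates)
print(counts_via_set)
```

If the goal is only to find EANs that occur more than once, the duplicates dictionary is probably the more useful result. counts.most_common() gives the full ranking, as in the original script.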