I have the following list:

```python
list = [(12.947999999999979, 5804), (100000.0, 1516), (12.948000000000008, 844), (12.948000000000036, 172), (18.252000000000066, 92)]
```
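The first element of each tuple is a value, and the second element is how often that value occurs in a document. How can I cluster similar elements of the list (for example, the first and third elements) and combine their frequencies?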
Answer 0 (score: 2)
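Use a `Counter` together with `round()`, passing the number of decimal places that should count as "the same value". By the way, don't name your variable `list`: it is not a reserved word, but it is the name of a built-in type, and assigning to it hides that built-in for the rest of the scope: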
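```python
from collections import Counter

l = [(12.947999999999979, 5804), (100000.0, 1516), (12.948000000000008, 844),
     (12.948000000000036, 172), (18.252000000000066, 92)]
precision = 3  # decimal places that must agree for two values to be merged

c = Counter()
for value, times in l:
    # Add the frequency directly instead of building a list of `times` copies
    c[round(value, precision)] += times
```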
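Rounding works for the values in the question, but it bins by fixed decimal boundaries, so two values that differ by far less than 0.001 can still end up in different bins if they straddle a boundary (for example, `round(12.9484999, 3)` is 12.948 while `round(12.9485001, 3)` is 12.949). If that matters, a different technique, not the one this answer uses, is to sort the values and merge neighbours whose gap is within a tolerance. The sketch below assumes an absolute tolerance `tol=1e-6` (an arbitrary choice) and uses a hypothetical helper name, `merge_close`:

```python
def merge_close(pairs, tol=1e-6):
    """Hypothetical helper: merge (value, frequency) pairs whose values are close.

    Values are sorted, and each value joins the current cluster when its gap to
    the previous value is <= tol (so a long chain of small gaps stays one cluster).
    Each cluster is reported as (smallest value, summed frequency).
    """
    clusters = []  # each entry: [representative, total_frequency, last_value_seen]
    for value, times in sorted(pairs):
        if clusters and value - clusters[-1][2] <= tol:
            clusters[-1][1] += times   # close enough: add to the current cluster
            clusters[-1][2] = value
        else:
            clusters.append([value, times, value])  # start a new cluster
    return [(rep, total) for rep, total, _ in clusters]
```

Applied to the question's list, this returns one cluster for the three 12.948… values with frequency 6820 (5804 + 844 + 172), plus 18.252000000000066 (92) and 100000.0 (1516). Note that, unlike rounding, the tolerance here is absolute; for values spanning very different magnitudes a relative tolerance may suit better.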
If your data is already in a `Counter`, you can merge it directly:
```python
from collections import Counter

# `data` is your existing Counter mapping each value to its frequency
precision = 3

joined = Counter()
for value, times in data.items():
    joined[round(value, precision)] += times
data = joined
```
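To try the `Counter` version end to end, the sketch below builds such a `Counter` from the question's pairs (an assumption about how the data was stored) and wraps the merge in a hypothetical helper, `merge_rounded`, that returns a new `Counter` instead of rebinding `data` inside the loop code:

```python
from collections import Counter


def merge_rounded(data, precision=3):
    """Hypothetical helper: return a new Counter keyed by values rounded to `precision` decimals."""
    joined = Counter()
    for value, times in data.items():
        joined[round(value, precision)] += times  # sum frequencies of values that round alike
    return joined


# Assumed input: the question's (value, frequency) pairs stored as a Counter
data = Counter(dict([(12.947999999999979, 5804), (100000.0, 1516),
                     (12.948000000000008, 844), (12.948000000000036, 172),
                     (18.252000000000066, 92)]))
data = merge_rounded(data)
print(data)  # 12.948 -> 6820, 100000.0 -> 1516, 18.252 -> 92
```

Returning a new object keeps the original `Counter` intact if you still need the unrounded values elsewhere.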