I have the following list, which contains 6 entries:
lol = [['a', 3, 1.01],
['x', 5, 1.00],
['k', 7, 2.02],
['p', 8, 3.00],
['b', 10, 1.09],
['f', 12, 2.03]]
Each sublist in lol contains 3 elements:
['a', 3, 1.01]
e1 e2 e3
The list above is already sorted by e2 (i.e. the second element).
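I'd like to 'cluster' the list above, roughly as follows: take the lowest entry in lol (wrt. e2) as the key of the first cluster. The final result should look like this, with a threshold of <= 0.1: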
dol = {'a':['a', 'x', 'b'],
'k':['k', 'f'],
'p':['p']}
I'm stuck on this. What is the right way to do it? My attempt so far:
import json
from collections import defaultdict

thres = 0.1
tmp_e3 = 0    # e3 of the previously processed entry
tmp_e1 = "-"  # current cluster key ("-" means no entry seen yet)

lol = [['a', 3, 1.01], ['x', 5, 1.00], ['k', 7, 2.02],
       ['p', 8, 3.00], ['b', 10, 1.09], ['f', 12, 2.03]]

dol = defaultdict(list)
for thelist in lol:
    e1, e2, e3 = thelist

    if tmp_e1 == "-":
        tmp_e1 = e1
    else:
        # compare against the previous entry only
        diff = abs(tmp_e3 - e3)
        if diff > thres:
            tmp_e1 = e1

    dol[tmp_e1].append(e1)
    tmp_e1 = e1
    tmp_e3 = e3

print(json.dumps(dol, indent=4))
Answer 0 (score: 2)
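I would first make sure lol is sorted on the second element, then iterate, each time keeping in the list only the entries that are not within the threshold of the first element: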
import json

thres = 0.1

lol = [['a', 3, 1.01], ['x', 5, 1.00], ['k', 7, 2.02],
       ['p', 8, 3.00], ['b', 10, 1.09], ['f', 12, 2.03]]

# ensure lol is sorted on the second element
lol.sort(key=lambda x: x[1])

dol = {}
while len(lol) > 0:
    # the first remaining entry starts a new cluster and becomes its key
    x = lol.pop(0)
    lol2 = []
    dol[x[0]] = [x[0]]
    for i in lol:
        if abs(i[2] - x[2]) < thres:
            # close enough to the cluster's first member: add it
            dol[x[0]].append(i[0])
        else:
            # keep it for a later cluster
            lol2.append(i)
    lol = lol2

print(json.dumps(dol, indent=4))
Result:
{
    "a": [
        "a",
        "x",
        "b"
    ],
    "p": [
        "p"
    ],
    "k": [
        "k",
        "f"
    ]
}
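The same idea can also be written as a single pass that remembers the e3 of each cluster's first member, instead of rebuilding the list for every cluster. This is only a minimal sketch under a few assumptions: the function name `cluster_by_first_member` and the `anchors` dict are hypothetical, the comparison uses `<=` to follow the "threshold <= 0.1" wording of the question (the code above uses `<`), and an entry joins the first cluster whose first member is close enough.

```python
lol = [['a', 3, 1.01], ['x', 5, 1.00], ['k', 7, 2.02],
       ['p', 8, 3.00], ['b', 10, 1.09], ['f', 12, 2.03]]


def cluster_by_first_member(rows, thres=0.1):
    """Group e1 labels by how close e3 is to each cluster's first member."""
    anchors = {}   # cluster key (e1 of the first member) -> e3 of that member
    clusters = {}  # cluster key -> list of member e1 labels
    # process entries in e2 order, so the lowest e2 starts the first cluster
    for e1, e2, e3 in sorted(rows, key=lambda r: r[1]):
        for key, anchor_e3 in anchors.items():
            if abs(e3 - anchor_e3) <= thres:
                clusters[key].append(e1)
                break
        else:
            # no existing cluster is close enough: start a new one
            anchors[e1] = e3
            clusters[e1] = [e1]
    return clusters


dol = cluster_by_first_member(lol)
print(dol)
```

On Python 3.7 or later a plain dict keeps insertion order, so the clusters come out in the order they were created. Note that float comparisons right at the threshold can be affected by rounding.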
Answer 1 (score: 0)
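Setting aside e2 / e3 for the moment, here is a draft.
First, a generator that groups the data by value; it does require the data to be sorted by value.
Then usage examples, first on the raw data, then on the data re-sorted by value.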
In [32]: def cluster(lol, threshold=0.1):
             cl, start = None, None
             for e1, e2, e3 in lol:
                 if cl and abs(start - e3) <= threshold:
                     cl.append(e1)
                 else:
                     if cl: yield cl
                     cl = [e1]
                     start = e3
             if cl: yield cl

In [33]: list(cluster(lol))
Out[33]: [['a', 'x'], ['k'], ['p'], ['b'], ['f']]

In [34]: list(cluster(sorted(lol, key = lambda ar:ar[-1])))
Out[34]: [['x', 'a', 'b'], ['k', 'f'], ['p']]
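If the dict shape from the question is wanted, the groups can be re-keyed afterwards. The following is a rough sketch, not a definitive implementation. It assumes the generator is changed to yield whole rows rather than only e1 (the name `cluster_rows` is hypothetical), and that each cluster should be keyed by its member with the lowest e2. Unlike the steps in the question, each group is anchored on its lowest e3; for this data both approaches give the same clusters.

```python
def cluster_rows(rows, threshold=0.1):
    """Yield groups of rows whose e3 lies within threshold of the group's
    first row. The rows are expected to be sorted by e3."""
    group, start = [], None
    for row in rows:
        if group and abs(row[2] - start) <= threshold:
            group.append(row)
        else:
            if group:
                yield group
            group, start = [row], row[2]
    if group:
        yield group


lol = [['a', 3, 1.01], ['x', 5, 1.00], ['k', 7, 2.02],
       ['p', 8, 3.00], ['b', 10, 1.09], ['f', 12, 2.03]]

groups = list(cluster_rows(sorted(lol, key=lambda r: r[2])))
# create the clusters in order of their lowest e2, as in the question
groups.sort(key=lambda g: min(r[1] for r in g))

dol = {}
for group in groups:
    group.sort(key=lambda r: r[1])            # restore e2 order inside the group
    dol[group[0][0]] = [r[0] for r in group]  # key: e1 of the lowest-e2 member

print(dol)
```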