I have the following list, which contains 5 entries:
my_lol = [['a', 1.01], ['x',1.00],['k',1.02],['p',3.00], ['b', 3.09]]
I want to "cluster" the list above, roughly following these steps:
1. Sort `my_lol` in ascending order by the numeric value in each entry.
2. Take the lowest entry in `my_lol` as the key of the first cluster.
3. Compute the difference between the value of the current entry and the value of the previous one.
4. If the difference is less than the threshold, add the current entry as a member of the current cluster; otherwise, make the current entry's key the key of the next cluster.
5. Repeat until the list is exhausted.
In the end, I want to get the following dictionary of lists:
dol = {'x':['x','a','k'], 'p':['p','b']}
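Basically, the dictionary of lists is a collection containing two clusters.
I tried the following but got stuck at step 3. Is this the right approach?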
import operator
import json
from collections import defaultdict

my_lol = [['a', 1.01], ['x',1.00],['k',1.02],['p',3.00], ['b', 3.09]]
my_lol_sorted = sorted(my_lol, key=operator.itemgetter(1))

thres = 0.1
tmp_val = 0
tmp_ids = "-"

dol = defaultdict(list)
for ids, val in my_lol_sorted:
    if tmp_ids != "-":
        diff = abs(tmp_val - val)

        if diff < thres:
            print(tmp_ids)
            dol[tmp_ids].append(tmp_ids)

    tmp_ids = ids
    tmp_val = val

print(json.dumps(dol, indent=4))
Answer 0 (score: 1)
Try this:
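# Uses the imports and my_lol_sorted from the question.
dol = defaultdict(list)
if len(my_lol) > 0:
    thres = 0.1
    # Start with the lowest entry as the key of the first cluster.
    tmp_ids, tmp_val = my_lol_sorted[0]

    for ids, val in my_lol_sorted:
        diff = abs(tmp_val - val)
        if diff > thres:
            # Gap to the previous value is too large: start a new cluster.
            tmp_ids = ids
        dol[tmp_ids].append(ids)
        tmp_val = val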
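The snippet above depends on the imports and `my_lol_sorted` defined in the question. As a minimal self-contained sketch of the same idea, the loop can be wrapped in a function; the name `cluster_by_gap` and its `thres` default are assumptions made for illustration, not part of the original answer.

```python
from collections import defaultdict
from operator import itemgetter


def cluster_by_gap(pairs, thres=0.1):
    """Group [key, value] pairs; a new cluster starts whenever the gap
    to the previous value (in sorted order) exceeds thres."""
    dol = defaultdict(list)
    cluster_key = None  # key of the cluster currently being filled
    prev_val = None     # value of the previously visited entry
    for ids, val in sorted(pairs, key=itemgetter(1)):
        if prev_val is None or abs(val - prev_val) > thres:
            cluster_key = ids
        dol[cluster_key].append(ids)
        prev_val = val
    return dict(dol)


my_lol = [['a', 1.01], ['x', 1.00], ['k', 1.02], ['p', 3.00], ['b', 3.09]]
result = cluster_by_gap(my_lol)
print(result)
```

With the list from the question this should give the two clusters keyed by 'x' and 'p', and an empty input simply returns an empty dictionary.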
Answer 1 (score: 1)
import operator
import json
from collections import defaultdict

my_lol = [['a', 1.01], ['x',1.00],['k',1.02],['p',3.00], ['b', 3.09]]
my_lol_sorted = sorted(my_lol, key=operator.itemgetter(1))

thres = 0.1
tmp_val = 0
tmp_ids = "-"

dol = defaultdict(list)
for ids, val in my_lol_sorted:
    if tmp_ids == "-":
        # First entry: it becomes the key of the first cluster.
        tmp_ids = ids
    else:
        diff = abs(tmp_val - val)
        if diff > thres:
            # Gap to the previous value is too large: start a new cluster.
            tmp_ids = ids
    dol[tmp_ids].append(ids)
    tmp_val = val

print(json.dumps(dol, indent=4))
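One thing to keep in mind: because each entry is compared with the previous entry (step 3 of the question), clusters can chain, so the first and last members of a cluster may be further apart than the threshold. The sketch below illustrates this on made-up data; the function name `cluster` and the `anchor_to_key` flag are hypothetical, and the anchored variant is an alternative reading of the requirement, not what the question asks for.

```python
from collections import defaultdict
from operator import itemgetter


def cluster(pairs, thres=0.1, anchor_to_key=False):
    """Cluster [key, value] pairs by value gaps.

    anchor_to_key=False: compare each value with the previous value.
    anchor_to_key=True:  compare each value with the cluster's first value.
    """
    dol = defaultdict(list)
    key = None
    ref = None  # value that the next entry is compared against
    for ids, val in sorted(pairs, key=itemgetter(1)):
        if ref is None or abs(val - ref) > thres:
            key = ids   # start a new cluster
            ref = val
        elif not anchor_to_key:
            ref = val   # chaining: move the reference to the latest value
        dol[key].append(ids)
    return dict(dol)


# Illustrative data (not from the question): steps of 0.08, threshold 0.1.
demo = [['a', 1.00], ['b', 1.08], ['c', 1.16]]
chained = cluster(demo)
anchored = cluster(demo, anchor_to_key=True)
print(chained)
print(anchored)
```

For the list in the question both variants should produce the same two clusters, so the distinction only matters when values drift gradually.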