Simple clustering from a Python list

Date: 2014-11-17 04:41:18

Tags: python algorithm

I have the following list, which contains 5 entries:

my_lol = [['a', 1.01], ['x',1.00],['k',1.02],['p',3.00], ['b', 3.09]]

I want to "cluster" the list above, roughly as follows:

1. Sort `my_lol` in ascending order by the value in each entry.
2. Pick the lowest entry in `my_lol` as the key of the first cluster.
3. Calculate the difference between the current entry's value and the previous entry's value.
4. If the difference is less than the threshold, add the current entry as a member of the current cluster; otherwise, use the current key as the key of the next cluster.
5. Repeat until all entries have been processed.

In the end, I want to obtain the following dictionary of lists:

dol = {'x':['x','a','k'], 'p':['p','b']}

Basically, the dictionary of lists holds two clusters, each keyed by its lowest entry.

I tried the following but got stuck at step 3. What is the right way to do this?

import operator
import json
from collections import defaultdict

my_lol = [['a', 1.01], ['x',1.00],['k',1.02],['p',3.00], ['b', 3.09]]
my_lol_sorted = sorted(my_lol, key=operator.itemgetter(1))

thres = 0.1
tmp_val = 0
tmp_ids = "-"

dol = defaultdict(list)
for ids, val in my_lol_sorted:
    if tmp_ids != "-":
        diff = abs(tmp_val - val)

        if diff < thres:
            print(tmp_ids)
            dol[tmp_ids].append(tmp_ids)

    tmp_ids = ids
    tmp_val = val

print(json.dumps(dol, indent=4))

2 answers:

Answer 0 (score: 1)

Try this:

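from collections import defaultdict

# Uses my_lol and my_lol_sorted as defined in the question.
dol = defaultdict(list)
if len(my_lol) > 0:
    thres = 0.1
    # Start with the lowest entry as the key of the first cluster.
    tmp_ids, tmp_val = my_lol_sorted[0]

    for ids, val in my_lol_sorted:
        diff = abs(tmp_val - val)
        # A gap larger than the threshold starts a new cluster.
        if diff > thres:
            tmp_ids = ids
        dol[tmp_ids].append(ids)
        tmp_val = val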
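
For reuse, the same loop can be wrapped in a function. The following is only a minimal sketch, assuming Python 3; the name `cluster_by_gap` and its parameters are hypothetical and not part of the original answer.

```python
from collections import defaultdict
from operator import itemgetter


def cluster_by_gap(pairs, thres=0.1):
    """Group [key, value] pairs by gaps between consecutive sorted values.

    Hypothetical helper: the name and signature are illustrative only.
    """
    clusters = defaultdict(list)
    cluster_key = None
    prev_val = None
    for key, val in sorted(pairs, key=itemgetter(1)):
        # Start a new cluster on the first entry or when the gap is too large.
        if prev_val is None or abs(val - prev_val) > thres:
            cluster_key = key
        clusters[cluster_key].append(key)
        prev_val = val
    return dict(clusters)


my_lol = [['a', 1.01], ['x', 1.00], ['k', 1.02], ['p', 3.00], ['b', 3.09]]
print(cluster_by_gap(my_lol))
# {'x': ['x', 'a', 'k'], 'p': ['p', 'b']}
```

Using `None` as the "no previous entry" marker avoids the separate emptiness check, so an empty input simply returns an empty dict.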

Answer 1 (score: 1)

import operator
import json
from collections import defaultdict

my_lol = [['a', 1.01], ['x',1.00],['k',1.02],['p',3.00], ['b', 3.09]]
my_lol_sorted = sorted(my_lol, key=operator.itemgetter(1))

thres = 0.1
tmp_val = 0
tmp_ids = "-"

dol = defaultdict(list)
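for ids, val in my_lol_sorted:
    if tmp_ids == "-":
        # First entry: it becomes the key of the first cluster.
        tmp_ids = ids
    else:
        diff = abs(tmp_val - val)
        # A gap larger than the threshold starts a new cluster.
        if diff > thres:
            tmp_ids = ids
    dol[tmp_ids].append(ids)
    tmp_val = val

print(json.dumps(dol, indent=4))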
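
Note that both answers compare each entry with the previous entry, as step 3 of the question asks, rather than with the cluster key. A cluster can therefore span more than the threshold when the values form a chain. A small sketch to illustrate this, assuming Python 3; `chain_clusters` is a hypothetical name, and it works on bare numbers instead of [key, value] pairs to keep the example short:

```python
def chain_clusters(values, thres=0.1):
    """Split sorted numbers into runs where each neighbouring gap is <= thres.

    Hypothetical helper used only to illustrate the chaining behaviour.
    """
    clusters = []
    for val in sorted(values):
        # Compare with the last value of the current run, not its first value.
        if clusters and abs(val - clusters[-1][-1]) <= thres:
            clusters[-1].append(val)
        else:
            clusters.append([val])
    return clusters


print(chain_clusters([1.00, 1.08, 1.16, 3.00]))
# [[1.0, 1.08, 1.16], [3.0]]
```

Here 1.16 lands in the same cluster as 1.00 even though they differ by more than 0.1, because each neighbouring gap stays under the threshold. If that is not wanted, the comparison would have to be made against the value of the cluster key instead.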