Extracting specific objects from a list of sequences in Python

Asked: 2018-02-27 14:06:07

Tags: algorithm python-2.7 sequences fpm

I implemented an FPM (frequent pattern mining) algorithm to find rules in activity data. The output is produced like this:

for itemset in find_frequent_itemsets(dataset, 0.1, include_support=True):
    print itemset

Here is the output of the code above:

([u'Global Connect Village'], 28)
([u'Terminal 2', u'Global Connect Village'], 1)
([u'VivoCity', u'Global Connect Village'], 1)
([u'Universal Studios Singapore', u'VivoCity', u'Global Connect Village'], 1)
([u'Universal Studios Singapore', u'Global Connect Village'], 2)
([u'Orchard Gateway', u'Global Connect Village'], 2)
([u'Chinatown', u'Global Connect Village'], 2)
([u'Singapore Changi Airport (SIN)', u'Chinatown', u'Global Connect Village'], 2)
([u'Fragrance Hotel', u'Global Connect Village'], 2)
([u'Singapore Changi Airport (SIN)', u'Fragrance Hotel', u'Global Connect Village'], 1)
([u'Singapore', u'Global Connect Village'], 3)
([u'Singapore Changi Airport (SIN)', u'Singapore', u'Global Connect Village'], 1)
([u"McDonald's", u'Global Connect Village'], 4)
([u'Singapore Changi Airport (SIN)', u"McDonald's", u'Global Connect Village'], 1)

I want to extract only the entries that have higher support and contain three or more objects.

1 answer:

Answer 0 (score: 1)

Just use filter and sorted:

MIN_LOCS = 3  # minimum number of locations per itemset
itemset = find_frequent_itemsets(dataset, 0.1, include_support=True)
# keep itemsets with at least MIN_LOCS locations, highest support first
itemset = sorted(filter(lambda it: len(it[0]) >= MIN_LOCS, itemset), key=lambda it: it[1], reverse=True)

Then you can pick the top elements you want:

itemset_top_5 = itemset[:5]

If you also want to enforce a minimum support value, just adjust the filter accordingly.
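
As a rough illustration of filtering on both itemset length and minimum support, here is a minimal sketch. It runs on a handful of (itemset, support) pairs copied from the question's output rather than calling find_frequent_itemsets. The MIN_SUPPORT threshold and the select_itemsets helper are hypothetical names introduced for this example; they are not part of the original answer or of any library.

```python
# Sample (itemset, support) pairs copied from the question's output.
itemsets = [
    (['Global Connect Village'], 28),
    (['Universal Studios Singapore', 'VivoCity', 'Global Connect Village'], 1),
    (['Singapore Changi Airport (SIN)', 'Chinatown', 'Global Connect Village'], 2),
    (['Singapore Changi Airport (SIN)', 'Fragrance Hotel', 'Global Connect Village'], 1),
    (["McDonald's", 'Global Connect Village'], 4),
]

MIN_LOCS = 3      # minimum number of locations in an itemset
MIN_SUPPORT = 2   # hypothetical threshold, chosen only for illustration


def select_itemsets(pairs, min_locs, min_support):
    """Keep pairs that are long enough and frequent enough, highest support first."""
    kept = [(items, support) for items, support in pairs
            if len(items) >= min_locs and support >= min_support]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


selected = select_itemsets(itemsets, MIN_LOCS, MIN_SUPPORT)
```

With real data, the same helper could presumably be fed the output of find_frequent_itemsets(dataset, 0.1, include_support=True) directly, since it only needs to iterate over the pairs once.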