Counting how often combinations occur in a DataFrame column - Apriori algorithm

Time: 2017-10-14 11:33:25

Tags: python dataframe count frequency apriori

I'm having trouble finding the right way to count the frequency of combinations.

Here is my code:

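import itertools
import pandas as pd

# Named `items` rather than `list` to avoid shadowing the built-in type.
items = [1, 20, 1, 50]

# All 2-element combinations, in order of appearance.
combinations = list(itertools.combinations(items, 2))

data = pd.DataFrame({'products': combinations})

# Count how many times each exact tuple occurs.
data['frequency'] = data.groupby('products')['products'].transform('count')

print(data)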

The output is:

    products  frequency
0   (1, 20)     1
1    (1, 1)     1
2   (1, 50)     2
3   (20, 1)     1
4  (20, 50)     1
5   (1, 50)     2

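The problem is that (1, 20) and (20, 1) each get a frequency of 1, but they are the same combination, so the frequency should be 2. Is there a correct way to do this?

1 Answer:

Answer 0: (score: 0)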

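You can use apply with a lambda to sort each tuple in the column, then group by the sorted tuples so that (1, 20) and (20, 1) fall into the same group:
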
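import itertools
import pandas as pd

# Named `items` rather than `list` to avoid shadowing the built-in type.
items = [1, 20, 1, 50]

# All 2-element combinations, in order of appearance.
combinations = list(itertools.combinations(items, 2))

data = pd.DataFrame({'products': combinations})

# Group by the sorted tuple so that element order is ignored when counting.
data['frequency'] = data.groupby(
    data['products'].apply(lambda pair: tuple(sorted(pair)))
)['products'].transform('count')

print(data)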

Output:

     products  frequency
0   (1, 20)          2
1    (1, 1)          1
2   (1, 50)          2
3   (20, 1)          2
4  (20, 50)          1
5   (1, 50)          2
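
The same normalization idea can also be sketched without groupby, using only the standard library. This is a minimal sketch, not part of the original answer: the helper name `pair_frequencies` is hypothetical, and it assumes the items within a pair are mutually comparable so that `sorted` works. A sorted tuple is used as the key rather than a `frozenset`, since a set would collapse (1, 1) to a single element.

```python
from collections import Counter
from itertools import combinations


def pair_frequencies(items):
    """Return (pair, frequency) rows where frequency ignores element order.

    Hypothetical helper for illustration only.
    """
    pairs = list(combinations(items, 2))
    # Sort each pair so that (20, 1) and (1, 20) share the same key.
    counts = Counter(tuple(sorted(pair)) for pair in pairs)
    # Look up each original pair's count through its sorted form.
    return [(pair, counts[tuple(sorted(pair))]) for pair in pairs]


rows = pair_frequencies([1, 20, 1, 50])
for pair, frequency in rows:
    print(pair, frequency)
```

If the pairs already live in a DataFrame column, the same `Counter` lookup could be applied per row with `Series.map` and a lambda that sorts each tuple before the lookup.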