Find the most common pairs across multiple columns

Time: 2019-10-04 04:02:46

Tags: python-3.x pandas

I have a pandas DataFrame that looks like this:

Client   1_act      2_act      3_act      4_act      5_act      6_act   ...
1        hiking     swimming   skating    jumping    climbing   eating
2        eating     hiking     climbing   exploring  
3        hiking     exercising 
4        hiking     screaming  yelling    hopping    swimming  
...

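Each row contains only unique activities, and there can be many columns, all named #_act (new columns may be added at any time as clients report new activities). Every row has at least one pair, that is, at least 2 activities. New activity values can also appear at any time.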
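For anyone who wants to reproduce the problem, the sample above could be rebuilt roughly as follows. This is only a sketch: it assumes the six visible columns are the whole frame and that the blank cells are NaN, neither of which the question states explicitly.

```python
import numpy as np
import pandas as pd

# Hypothetical reconstruction of the sample; only the six visible columns
# are included and blank cells are assumed to be NaN.
rows = [
    [1, "hiking", "swimming", "skating", "jumping", "climbing", "eating"],
    [2, "eating", "hiking", "climbing", "exploring", np.nan, np.nan],
    [3, "hiking", "exercising", np.nan, np.nan, np.nan, np.nan],
    [4, "hiking", "screaming", "yelling", "hopping", "swimming", np.nan],
]
columns = ["Client"] + [f"{i}_act" for i in range(1, 7)]
df = pd.DataFrame(rows, columns=columns)

# Every row should hold at least two activities, as described above.
activities_per_row = df.drop(columns="Client").notna().sum(axis=1)
print(activities_per_row.tolist())  # [6, 4, 2, 5]
```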

I am trying to find a way to return the most common pairs of activities. The desired output would be:

Pair                       Qty
(hiking, swimming)         2
(hiking, skating)          1
(hiking, jumping)          1
(hiking, climbing)         2
(hiking, eating)           2
(swimming, skating)        1
(swimming, jumping)        1
(swimming, climbing)       1
(swimming, eating)         1
(skating, jumping)         1
(skating, climbing)        1
(skating, eating)          1
(jumping, climbing)        1
(climbing, eating)         2
(eating, exploring)        1
(hiking, exercising)       1
(hiking, screaming)        1
(hiking, yelling)          1
(hiking, hopping)          1

...

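The output above is a sample of the possible pairs across all rows of this example dataset. If a pair appears again in a later row, its quantity should be incremented; if a new pair appears in a later row, it should be added as a new row under Pair.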

The goal is to see which pairs of activities are most common across all clients. Any help would be greatly appreciated. Thanks!

1 answer:

Answer 0 (score: 1)

Use combinations in a list comprehension to flatten the rows into pairs, count the tuples with Counter, and pass the result to the DataFrame constructor:

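from collections import Counter
from itertools import combinations

import pandas as pd

# Move Client into the index so that only the activity columns remain.
df = df.set_index('Client')

# Count every within-row pair; empty cells in shorter rows are not filtered out here.
c = Counter(y for x in df.values for y in combinations(x, 2))
pairs = pd.DataFrame({'Pair': list(c.keys()), 'Qty': list(c.values())})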
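To see what the comprehension produces, here is a small standard-library sketch on two made-up rows (the row contents are illustrative, not taken from the question). It shows that combinations keeps the input order, so the same two activities can be counted under two different tuples unless each row is sorted first.

```python
from collections import Counter
from itertools import combinations

# Two hypothetical rows that share "hiking" and "eating" in opposite order.
rows = [
    ["hiking", "swimming", "eating"],
    ["eating", "hiking"],
]

# combinations() preserves input order, so the same two activities can
# show up as two distinct tuples.
ordered = Counter(pair for row in rows for pair in combinations(row, 2))

# Sorting each row first gives every pair a single canonical form.
unordered = Counter(pair for row in rows for pair in combinations(sorted(row), 2))

print(ordered[("hiking", "eating")], ordered[("eating", "hiking")])  # 1 1
print(unordered[("eating", "hiking")])  # 2
```

Whether (hiking, eating) and (eating, hiking) should be merged depends on the use case; the desired output in the question appears to treat them as the same pair.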

For the top combinations:

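n = 10
# df is the activity frame indexed by Client, as above.
# most_common(n) returns the n highest-count (pair, count) tuples.
L = Counter(y for x in df.values for y in combinations(x, 2)).most_common(n)

top_pairs = pd.DataFrame(L, columns=['Pair', 'Qty'])
print(top_pairs)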
                   Pair  Qty
0    (hiking, swimming)    2
1    (hiking, climbing)    2
2      (hiking, eating)    2
3    (swimming, eating)    2
4     (hiking, hopping)    2
5     (hiking, skating)    1
6     (hiking, jumping)    1
7   (swimming, skating)    1
8   (swimming, jumping)    1
9  (swimming, climbing)    1
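If shorter rows are padded with NaN and pair order should not matter, a variant along these lines may be closer to the output requested in the question. It is a sketch under those assumptions: count_pairs and the three-client sample are hypothetical names and data introduced only for illustration.

```python
from collections import Counter
from itertools import combinations

import numpy as np
import pandas as pd


def count_pairs(frame, id_col="Client"):
    """Count unordered activity pairs per row, skipping missing cells."""
    counts = Counter()
    # name=None yields plain tuples, which avoids issues with column
    # names such as "1_act" that are not valid identifiers.
    for row in frame.drop(columns=id_col).itertuples(index=False, name=None):
        activities = sorted(v for v in row if pd.notna(v))
        counts.update(combinations(activities, 2))
    return counts


# Hypothetical three-client sample; blank cells are assumed to be NaN.
sample = pd.DataFrame({
    "Client": [1, 2, 3],
    "1_act": ["hiking", "eating", "hiking"],
    "2_act": ["eating", "hiking", "exercising"],
    "3_act": ["climbing", np.nan, np.nan],
})

top = pd.DataFrame(count_pairs(sample).most_common(3), columns=["Pair", "Qty"])
print(top)
```

Sorting the activities within each row means each pair is always stored alphabetically, so clients 1 and 2 both contribute to (eating, hiking).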