Question

I have this DataFrame:

   source target
0     ape    dog
1     ape   hous
2     dog   hous
3    hors    dog
4    hors    ape
5     dog    ape
6     ape   bird
7     ape   hous
8    bird   hous
9    bird   fist
10   bird    ape
11   fist    ape

I am trying to generate frequency counts with the following code:

df_count = df.groupby(['source', 'target']).size().reset_index().sort_values(0, ascending=False)
df_count.columns = ['source', 'target', 'weight']

I get the following result:

   source target  weight
2     ape   hous       2
0     ape   bird       1
1     ape    dog       1
3    bird    ape       1
4    bird   fist       1
5    bird   hous       1
6     dog    ape       1
7     dog   hous       1
8    fist    ape       1
9    hors    ape       1
10   hors    dog       1

How can I modify the code so that direction does not matter, i.e. instead of ape bird 1 and bird ape 1, I get ape bird 2?

Answer 1

First, sort the values within each row.

In [31]: df
Out[31]:
   source target
0     ape    dog
1     ape   hous
2     dog   hous
3    hors    dog
4    hors    ape
5     dog    ape
6     ape   bird
7     ape   hous
8    bird   hous
9    bird   fist
10   bird    ape
11   fist    ape

In [32]: df.values.sort()

In [33]: df
Out[33]:
   source target
0     ape    dog
1     ape   hous
2     dog   hous
3     dog   hors
4     ape   hors
5     ape    dog
6     ape   bird
7     ape   hous
8    bird   hous
9    bird   fist
10    ape   bird
11    ape   fist

Then group by source and target, aggregate with size, and sort the result.

In [34]: (df.groupby(['source', 'target']).size().sort_values(ascending=False)
    ...:    .reset_index(name='weight'))
Out[34]:
  source target  weight
0    ape   hous       2
1    ape    dog       2
2    ape   bird       2
3    dog   hous       1
4    dog   hors       1
5   bird   hous       1
6   bird   fist       1
7    ape   hors       1
8    ape   fist       1
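Note that the in-place df.values.sort() above relies on df.values handing back a writable view of the frame's data, which may not hold in every pandas version or when the columns have mixed dtypes. A minimal sketch that avoids mutating the original by building a new frame with np.sort (the names sorted_pairs, edges and counts are made up for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "source": ["ape", "ape", "dog", "hors", "hors", "dog",
               "ape", "ape", "bird", "bird", "bird", "fist"],
    "target": ["dog", "hous", "hous", "dog", "ape", "ape",
               "bird", "hous", "hous", "fist", "ape", "ape"],
})

# Sort the two labels within each row so that (bird, ape) becomes (ape, bird).
sorted_pairs = np.sort(df[["source", "target"]].to_numpy(dtype=object), axis=1)

# Build a new frame instead of modifying df in place.
edges = pd.DataFrame(sorted_pairs, columns=["source", "target"], index=df.index)

# Count each undirected pair and put the most frequent pairs first.
counts = (
    edges.groupby(["source", "target"])
    .size()
    .reset_index(name="weight")
    .sort_values("weight", ascending=False)
)
print(counts)
```

The original df is left untouched here, which is handy if the directed edges are still needed later.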

Answer 2

You can first sort each row with apply, then pass the name parameter to reset_index:

df_count = df.apply(sorted, axis=1) \
             .groupby(['source', 'target']) \
             .size() \
             .reset_index(name='weight') \
             .sort_values('weight', ascending=False)
print(df_count)
  source target  weight
0    ape   bird       2
1    ape    dog       2
4    ape   hous       2
2    ape   fist       1
3    ape   hors       1
5   bird   fist       1
6   bird   hous       1
7    dog   hors       1
8    dog   hous       1
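One caveat: as far as I know, in recent pandas versions apply(sorted, axis=1) returns a Series of lists rather than a DataFrame, so the groupby step above may fail there. A hedged sketch of the same idea that sidesteps apply by sorting each pair in plain Python; the names pairs, normalized and df_count are illustrative, and DataFrame.value_counts is assumed to be available (pandas 1.1 or later):

```python
import pandas as pd

df = pd.DataFrame({
    "source": ["ape", "ape", "dog", "hors", "hors", "dog",
               "ape", "ape", "bird", "bird", "bird", "fist"],
    "target": ["dog", "hous", "hous", "dog", "ape", "ape",
               "bird", "hous", "hous", "fist", "ape", "ape"],
})

# Sort each (source, target) pair so both directions map to the same tuple.
pairs = [tuple(sorted(pair)) for pair in zip(df["source"], df["target"])]
normalized = pd.DataFrame(pairs, columns=["source", "target"])

# value_counts counts unique rows and sorts by count in descending order.
df_count = normalized.value_counts().reset_index(name="weight")
print(df_count)
```

The order of rows with equal weight is not guaranteed, so add a secondary sort on source and target if a stable listing matters.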
