Question

I have a DataFrame like this:

   actual  prediction
0       1           0
1       0           0
2       0           0  
3       1           0 
4       1           1
5       0           0

Is there a Pythonic way to get a result like this:

number of (0, 0) = 3
number of (0, 1) = 0
number of (1, 0) = 2
number of (1, 1) = 1

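I don't need exactly this output format. I have a few versions of code that do this, but they all seem too verbose. What is the Pythonic way to get these counts?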

Answer 1

A Pandas solution (not as nice as @Divakar's compact NumPy solution):

import pandas as pd
from itertools import product

In [291]: cats = ['{0[0]}{0[1]}'.format(tup) for tup in product([0,1], [0,1])]

In [292]: pd.Categorical((df.actual.astype(str)+df.prediction.astype(str)),
                         categories=cats) \
            .value_counts()
Out[292]:
00    3
01    0
10    2
11    1
dtype: int64

If you don't need to list missing combinations such as (0, 1):

In [298]: df.groupby(df.columns.tolist()).size().reset_index()
Out[298]:
   actual  prediction  0
0       0           0  3
1       1           0  2
2       1           1  1
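
If the missing combinations should appear after all, the `groupby` result can be reindexed against every possible pair. The following is a minimal sketch, assuming the example data from the question and that only the values 0 and 1 occur; the names `all_pairs` and `counts` are illustrative and not part of the original answer.

```python
import pandas as pd

# Example data from the question.
df = pd.DataFrame({
    "actual":     [1, 0, 0, 1, 1, 0],
    "prediction": [0, 0, 0, 0, 1, 0],
})

# Every (actual, prediction) pair to report; assumes only 0 and 1 occur.
all_pairs = pd.MultiIndex.from_product(
    [[0, 1], [0, 1]], names=["actual", "prediction"]
)

# Count the observed pairs, then reindex so absent pairs get a count of 0.
counts = (
    df.groupby(["actual", "prediction"])
      .size()
      .reindex(all_pairs, fill_value=0)
)

for (a, p), n in counts.items():
    print(f"number of ({a}, {p}) = {n}")
```

This keeps the counts in a Series indexed by the pair itself, so a single count can be looked up with `counts.loc[(1, 0)]`.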

Answer 2

If we are dealing only with 0s and 1s, here is one way using a dot product:

import numpy as np

np.bincount(df.dot([2,1]))
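
To spell out why this works: the dot product with `[2, 1]` encodes each row as `2 * actual + prediction`, so (0, 0), (0, 1), (1, 0) and (1, 1) map to 0, 1, 2 and 3, and `np.bincount` counts how often each code occurs. The sketch below assumes the example data from the question. The `minlength=4` argument is an addition to the one-liner above, so that four bins are returned even when the pair (1, 1) never occurs.

```python
import numpy as np
import pandas as pd

# Example data from the question.
df = pd.DataFrame({
    "actual":     [1, 0, 0, 1, 1, 0],
    "prediction": [0, 0, 0, 0, 1, 0],
})

# Encode each row as a single integer: 2 * actual + prediction.
codes = df[["actual", "prediction"]].to_numpy().dot([2, 1])

# minlength=4 (an addition, see above) guarantees one bin per possible pair.
counts = np.bincount(codes, minlength=4)

# Decode each bin index back into its (actual, prediction) pair.
for code, n in enumerate(counts):
    print(f"number of ({code // 2}, {code % 2}) = {n}")
```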

Answer 3

Adding a custom category should work:

import pandas as pd
df = pd.DataFrame({"actual":[0,0,0,1,2,3],"prediction":[0,0,1,2,15,14]})
# Build a string key "actual,prediction" for each row
df['customCategory'] = df.actual.astype(str) + ',' + df.prediction.astype(str)
# Count the rows that share each key
df.groupby('customCategory').customCategory.count()
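
The same idea can be sketched without adding a column to the DataFrame. This variant swaps the `groupby(...).count()` call for `Series.value_counts()`, which is a substitution and not what the answer itself uses; the names `key` and `counts` are illustrative. Unlike the dot-product approach, it does not assume the values are limited to 0 and 1.

```python
import pandas as pd

# Data from the answer above: values are not limited to 0 and 1.
df = pd.DataFrame({"actual": [0, 0, 0, 1, 2, 3],
                   "prediction": [0, 0, 1, 2, 15, 14]})

# Build an "actual,prediction" string key for each row.
key = df["actual"].astype(str) + "," + df["prediction"].astype(str)

# Tally each distinct key; sort_index orders the keys as strings.
counts = key.value_counts().sort_index()
print(counts)
```

Only combinations that actually occur are reported here, as with the `groupby` version.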

Pandas: best way to count matches between two columns?
