I'm looking for a way to count the occurrences of each value in a column, and it is proving more complicated than I originally thought.
    Percentile Percentile1 Percentile2 Percentile3
0     mediocre   contender   contender    mediocre
69    mediocre         bad    mediocre    mediocre
117   mediocre    mediocre    mediocre    mediocre
144   mediocre        none    mediocre   contender
171   mediocre    mediocre   contender    mediocre
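I'm trying to produce something like the output below. It should take the four possible options and count them per column. It is essentially pd.value_counts applied to each column. Any help would be much appreciated.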
            Percentile  Percentile1  Percentile2  Percentile3
mediocre:   5           2            3            4
contender:  0           1            2            1
bad:        0           1            0            0
none:       0           1            0            0
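Answer 0 (score: 8)
It helps to make your data "tidy" first (PDF). This means that columns should represent variables and rows should represent observations.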
In [98]: df
Out[98]:
    Percentile Percentile1 Percentile2 Percentile3
0     mediocre   contender   contender    mediocre
69    mediocre         bad    mediocre    mediocre
117   mediocre    mediocre    mediocre    mediocre
144   mediocre        none    mediocre   contender
171   mediocre    mediocre   contender    mediocre
[5 rows x 4 columns]
In this case, melting the DataFrame makes it tidy:
In [125]: melted = pd.melt(df); melted
Out[125]:
variable value
0 Percentile mediocre
1 Percentile mediocre
2 Percentile mediocre
3 Percentile mediocre
4 Percentile mediocre
5 Percentile1 contender
6 Percentile1 bad
7 Percentile1 mediocre
8 Percentile1 none
9 Percentile1 mediocre
10 Percentile2 contender
11 Percentile2 mediocre
12 Percentile2 mediocre
13 Percentile2 mediocre
14 Percentile2 contender
15 Percentile3 mediocre
16 Percentile3 mediocre
17 Percentile3 mediocre
18 Percentile3 contender
19 Percentile3 mediocre
[20 rows x 2 columns]
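For readers who want to reproduce the melt step outside the IPython session, here is a minimal self-contained sketch. The DataFrame is rebuilt by hand from the values shown in the question; passing `var_name` and `value_name` explicitly is optional and is done here only to make the default column names visible.

```python
import pandas as pd

# Rebuild the example frame; values and index labels are copied from the question.
df = pd.DataFrame(
    {
        "Percentile": ["mediocre"] * 5,
        "Percentile1": ["contender", "bad", "mediocre", "none", "mediocre"],
        "Percentile2": ["contender", "mediocre", "mediocre", "mediocre", "contender"],
        "Percentile3": ["mediocre", "mediocre", "mediocre", "contender", "mediocre"],
    },
    index=[0, 69, 117, 144, 171],
)

# melt stacks every cell into one (variable, value) row:
# "variable" holds the original column name, "value" holds the cell content.
melted = pd.melt(df, var_name="variable", value_name="value")

print(melted.shape)
print(melted.head())
```

With 5 rows and 4 columns in the original frame, the melted frame has one row per cell, which is why the output above lists 20 rows.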
Then use crosstab to build the frequency table:
In [127]: pd.crosstab(index=[melted['value']], columns=[melted['variable']])
Out[127]:
variable   Percentile  Percentile1  Percentile2  Percentile3
value
bad                 0            1            0            0
contender           0            1            2            1
mediocre            5            2            3            4
none                0            1            0            0
[4 rows x 4 columns]
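Putting the two steps together, the whole approach can be sketched as a single script. The DataFrame is again rebuilt by hand from the question's data. The `apply(pd.Series.value_counts)` variant at the end is not part of the answer above; it is included only as an assumption-labeled cross-check, since the question describes the goal as value counts per column.

```python
import pandas as pd

# Example data copied from the question.
df = pd.DataFrame(
    {
        "Percentile": ["mediocre"] * 5,
        "Percentile1": ["contender", "bad", "mediocre", "none", "mediocre"],
        "Percentile2": ["contender", "mediocre", "mediocre", "mediocre", "contender"],
        "Percentile3": ["mediocre", "mediocre", "mediocre", "contender", "mediocre"],
    },
    index=[0, 69, 117, 144, 171],
)

# Step 1: tidy the data, one (variable, value) row per cell.
melted = pd.melt(df)

# Step 2: cross-tabulate values against the original column names.
counts = pd.crosstab(index=melted["value"], columns=melted["variable"])

# Optional: put the rows in the order requested in the question.
ordered = counts.reindex(["mediocre", "contender", "bad", "none"])
print(ordered)

# Cross-check (not from the answer): count values column by column,
# then fill the gaps left by values that never occur in a column.
per_column = df.apply(pd.Series.value_counts).fillna(0).astype(int)
same = (per_column.loc[counts.index, counts.columns].to_numpy() == counts.to_numpy()).all()
print(same)
```

The crosstab route returns integer counts directly, while the `apply` route produces NaN for values missing from a column and needs the `fillna(0).astype(int)` cleanup.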