This is my table in pandas below. I have almost 40k rows, with OpenTime, ClosedTime and ReopenTime as headers. I want grouped counts for all three columns:
OpenTime ClosedTime ReopenTime ID
0 Afternoon Afternoon Afternoon 484
1 Evening Afternoon Afternoon 44
2 Morning Morning Evening 23
3 Night Evening Evening
10,000 Morning Afternoon Night
12,000 Morning Evening Morning
40,000 Night Morning Night
This is the result I want:
OpenTime ClosedTime ReopenTime
Morning 5644 4555 4444
Night 444 333 333
Here is some code I tried:
df1 = df.groupby(['OpenTime']).size().reset_index()
df1
I got this result, which is wrong (I even tried a GroupBy on 2-3 columns, but I'm not sure how to do it). Please help, thanks.
OpenTime 0
0 Afternoon 16395
1 Evening 16813
2 Morning 9876
3 Night 546
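The attempt above groups by OpenTime only, so it counts just that one column. Grouping by two or three columns at once does not help either: it counts each combination of values across a row, not each column separately. A minimal sketch on the seven sample rows from the question, assuming the blank ID cells are missing values (NaN):

```python
import numpy as np
import pandas as pd

# The seven sample rows from the question (blank IDs assumed to be NaN).
df = pd.DataFrame(
    {
        "OpenTime": ["Afternoon", "Evening", "Morning", "Night", "Morning", "Morning", "Night"],
        "ClosedTime": ["Afternoon", "Afternoon", "Morning", "Evening", "Afternoon", "Evening", "Morning"],
        "ReopenTime": ["Afternoon", "Afternoon", "Evening", "Evening", "Night", "Morning", "Night"],
        "ID": [484, 44, 23, np.nan, np.nan, np.nan, np.nan],
    },
    index=[0, 1, 2, 3, 10000, 12000, 40000],
)

# One key: counts for OpenTime only, like the result in the question.
open_counts = df.groupby("OpenTime").size()

# Three keys: counts per (OpenTime, ClosedTime, ReopenTime) combination,
# i.e. how often a whole row pattern occurs, not per-column counts.
combo_counts = df.groupby(["OpenTime", "ClosedTime", "ReopenTime"]).size()

print(open_counts)
print(combo_counts)
```

In this sample every row pattern is unique, so the three-key version just returns seven groups of size 1.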
Answer (score: 1)
Use DataFrame.melt to unpivot, GroupBy.size to count, and Series.unstack to reshape:
df2 = df.melt('ID').groupby(['value','variable']).size().unstack(fill_value=0)
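The one-liner above can be split into its three steps. This sketch uses the question's seven sample rows (blank IDs assumed to be NaN) and also shows why fill_value=0 is passed: the pair (Night, ClosedTime) never occurs, so without it that cell would be NaN.

```python
import numpy as np
import pandas as pd

# The seven sample rows from the question (blank IDs assumed to be NaN).
df = pd.DataFrame(
    {
        "OpenTime": ["Afternoon", "Evening", "Morning", "Night", "Morning", "Morning", "Night"],
        "ClosedTime": ["Afternoon", "Afternoon", "Morning", "Evening", "Afternoon", "Evening", "Morning"],
        "ReopenTime": ["Afternoon", "Afternoon", "Evening", "Evening", "Night", "Morning", "Night"],
        "ID": [484, 44, 23, np.nan, np.nan, np.nan, np.nan],
    },
    index=[0, 1, 2, 3, 10000, 12000, 40000],
)

# Step 1: unpivot to long format (columns ID, variable, value).
long = df.melt("ID")

# Step 2: count rows per (value, variable) pair -> Series with a MultiIndex.
counts = long.groupby(["value", "variable"]).size()

# Step 3: move 'variable' into the columns; absent pairs become 0.
df2 = counts.unstack(fill_value=0)

# For comparison: without fill_value the absent pair (Night, ClosedTime) is NaN.
no_fill = counts.unstack()

print(df2)
```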
Or use crosstab:
import pandas as pd

df1 = df.melt('ID')  # long format: one row per (ID, column, value)
df2 = pd.crosstab(df1['value'], df1['variable'])  # count each value per column
print(df2)
variable ClosedTime OpenTime ReopenTime
value
Afternoon 3 1 2
Evening 2 1 2
Morning 2 3 1
Night 0 2 2
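Both approaches order the columns alphabetically (ClosedTime, OpenTime, ReopenTime), while the question lists them as OpenTime, ClosedTime, ReopenTime. Selecting the columns in the wanted order is enough. This sketch (same seven sample rows, blank IDs assumed NaN) also checks that crosstab and the groupby/unstack chain give the same counts.

```python
import numpy as np
import pandas as pd

# The seven sample rows from the question (blank IDs assumed to be NaN).
df = pd.DataFrame(
    {
        "OpenTime": ["Afternoon", "Evening", "Morning", "Night", "Morning", "Morning", "Night"],
        "ClosedTime": ["Afternoon", "Afternoon", "Morning", "Evening", "Afternoon", "Evening", "Morning"],
        "ReopenTime": ["Afternoon", "Afternoon", "Evening", "Evening", "Night", "Morning", "Night"],
        "ID": [484, 44, 23, np.nan, np.nan, np.nan, np.nan],
    },
    index=[0, 1, 2, 3, 10000, 12000, 40000],
)

df1 = df.melt("ID")
ct = pd.crosstab(df1["value"], df1["variable"])
gb = df1.groupby(["value", "variable"]).size().unstack(fill_value=0)

# Same numbers either way (compare the underlying values only).
same = bool((ct.to_numpy() == gb.to_numpy()).all())

# Reorder the columns to match the layout requested in the question.
result = ct[["OpenTime", "ClosedTime", "ReopenTime"]]
print(result)
```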
Edit: if you need to specify which columns to melt:
df2 = (df.melt(value_vars=['OpenTime','ClosedTime','ReopenTime'])
.groupby(['value','variable'])
.size()
.unstack(fill_value=0))
df1 = df.melt(value_vars=['OpenTime','ClosedTime','ReopenTime'])
df2 = pd.crosstab(df1['value'], df1['variable'])
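Passing value_vars matters when the real table has more columns than the three time columns: melt('ID') unpivots every non-ID column, so their values would leak into the counts. To show the difference, this sketch adds a hypothetical extra column, Priority (an invented name for illustration, not from the question), to the seven sample rows (blank IDs assumed NaN).

```python
import numpy as np
import pandas as pd

# The seven sample rows from the question (blank IDs assumed to be NaN).
df = pd.DataFrame(
    {
        "OpenTime": ["Afternoon", "Evening", "Morning", "Night", "Morning", "Morning", "Night"],
        "ClosedTime": ["Afternoon", "Afternoon", "Morning", "Evening", "Afternoon", "Evening", "Morning"],
        "ReopenTime": ["Afternoon", "Afternoon", "Evening", "Evening", "Night", "Morning", "Night"],
        "ID": [484, 44, 23, np.nan, np.nan, np.nan, np.nan],
    },
    index=[0, 1, 2, 3, 10000, 12000, 40000],
)
# Hypothetical extra column, only to illustrate the effect of value_vars.
df["Priority"] = ["High", "Low", "Low", "High", "Low", "Low", "High"]

cols = ["OpenTime", "ClosedTime", "ReopenTime"]

# melt('ID') also unpivots Priority, so 'High'/'Low' show up as values.
all_long = df.melt("ID")
leaky = pd.crosstab(all_long["value"], all_long["variable"])

# value_vars restricts the unpivot to the three time columns.
time_long = df.melt(value_vars=cols)
df2 = pd.crosstab(time_long["value"], time_long["variable"])
print(df2)
```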
Details:
First, melt unpivots the data: the column passed as id_vars is kept as an identifier, and all other columns are treated as measured columns (value_vars):
print(df.melt(id_vars='ID'))
ID variable value
0 484.0 OpenTime Afternoon
1 44.0 OpenTime Evening
2 23.0 OpenTime Morning
3 NaN OpenTime Night
4 NaN OpenTime Morning
5 NaN OpenTime Morning
6 NaN OpenTime Night
7 484.0 ClosedTime Afternoon
8 44.0 ClosedTime Afternoon
9 23.0 ClosedTime Morning
10 NaN ClosedTime Evening
11 NaN ClosedTime Afternoon
12 NaN ClosedTime Evening
13 NaN ClosedTime Morning
14 484.0 ReopenTime Afternoon
15 44.0 ReopenTime Afternoon
16 23.0 ReopenTime Evening
17 NaN ReopenTime Evening
18 NaN ReopenTime Night
19 NaN ReopenTime Morning
20 NaN ReopenTime Night
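The long table above is the three time columns stacked one after another: 7 rows times 3 columns gives 21 rows, with each ID repeated once per melted column. A quick check on the same seven sample rows, assuming the blank IDs are NaN:

```python
import numpy as np
import pandas as pd

# The seven sample rows from the question (blank IDs assumed to be NaN).
df = pd.DataFrame(
    {
        "OpenTime": ["Afternoon", "Evening", "Morning", "Night", "Morning", "Morning", "Night"],
        "ClosedTime": ["Afternoon", "Afternoon", "Morning", "Evening", "Afternoon", "Evening", "Morning"],
        "ReopenTime": ["Afternoon", "Afternoon", "Evening", "Evening", "Night", "Morning", "Night"],
        "ID": [484, 44, 23, np.nan, np.nan, np.nan, np.nan],
    },
    index=[0, 1, 2, 3, 10000, 12000, 40000],
)

long = df.melt(id_vars="ID")

# One row per (original row, melted column) pair: 7 x 3 = 21.
n_long = len(long)

# The first 7 rows come from OpenTime, in the original row order.
first_block = long["value"].iloc[:7].tolist()

# NaN in ID makes the column float64, which is why 484 prints as 484.0.
id_dtype = long["ID"].dtype
print(long.head(8))
```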
Or you can define only the value_vars columns:
print(df.melt(value_vars=['OpenTime','ClosedTime','ReopenTime']))
variable value
0 OpenTime Afternoon
1 OpenTime Evening
2 OpenTime Morning
3 OpenTime Night
4 OpenTime Morning
5 OpenTime Morning
6 OpenTime Night
7 ClosedTime Afternoon
8 ClosedTime Afternoon
9 ClosedTime Morning
10 ClosedTime Evening
11 ClosedTime Afternoon
12 ClosedTime Evening
13 ClosedTime Morning
14 ReopenTime Afternoon
15 ReopenTime Afternoon
16 ReopenTime Evening
17 ReopenTime Evening
18 ReopenTime Night
19 ReopenTime Morning
20 ReopenTime Night
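Without id_vars the ID column is simply dropped; the variable/value pairs are identical to the id_vars='ID' version, so the final counts do not change. A sketch comparing both on the seven sample rows (blank IDs assumed NaN):

```python
import numpy as np
import pandas as pd

# The seven sample rows from the question (blank IDs assumed to be NaN).
df = pd.DataFrame(
    {
        "OpenTime": ["Afternoon", "Evening", "Morning", "Night", "Morning", "Morning", "Night"],
        "ClosedTime": ["Afternoon", "Afternoon", "Morning", "Evening", "Afternoon", "Evening", "Morning"],
        "ReopenTime": ["Afternoon", "Afternoon", "Evening", "Evening", "Night", "Morning", "Night"],
        "ID": [484, 44, 23, np.nan, np.nan, np.nan, np.nan],
    },
    index=[0, 1, 2, 3, 10000, 12000, 40000],
)

with_id = df.melt(id_vars="ID")
without_id = df.melt(value_vars=["OpenTime", "ClosedTime", "ReopenTime"])

# Dropping ID from the first result leaves exactly the second result.
same_pairs = with_id[["variable", "value"]].equals(without_id)
print(same_pairs)
```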
Finally, crosstab between the value and variable columns (value as rows, variable as columns) gives a simple cross-tabulated count.
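As an alternative that skips the reshape (not part of the answer above), each column's values can be counted directly by applying Series.value_counts to every column. Values missing from a column come back as NaN, so they are filled with 0, and the columns keep the question's order (OpenTime, ClosedTime, ReopenTime). A sketch on the same seven sample rows, blank IDs assumed NaN:

```python
import numpy as np
import pandas as pd

# The seven sample rows from the question (blank IDs assumed to be NaN).
df = pd.DataFrame(
    {
        "OpenTime": ["Afternoon", "Evening", "Morning", "Night", "Morning", "Morning", "Night"],
        "ClosedTime": ["Afternoon", "Afternoon", "Morning", "Evening", "Afternoon", "Evening", "Morning"],
        "ReopenTime": ["Afternoon", "Afternoon", "Evening", "Evening", "Night", "Morning", "Night"],
        "ID": [484, 44, 23, np.nan, np.nan, np.nan, np.nan],
    },
    index=[0, 1, 2, 3, 10000, 12000, 40000],
)

cols = ["OpenTime", "ClosedTime", "ReopenTime"]

# value_counts per column; the union of values becomes the index,
# and a value absent from a column (Night in ClosedTime) is filled with 0.
counts = df[cols].apply(pd.Series.value_counts).fillna(0).astype(int)
print(counts)
```

On 40k rows either approach should be fine; the melt/crosstab version is easier to extend if you later need more columns or normalized counts.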