How do I group this data in pandas (Python)?

Time: 2019-08-08 10:51:47

Tags: pandas pandas-groupby

Below is my table in pandas. I have almost 40k rows, with OpenTime, ClosedTime and ReopenTime as column headers. I want to group by all three columns.

       OpenTime     ClosedTime  ReopenTime   ID

0       Afternoon   Afternoon   Afternoon    484
1       Evening     Afternoon   Afternoon    44
2       Morning     Morning     Evening      23
3       Night       Evening     Evening
10,000  Morning     Afternoon   Night
12,000  Morning     Evening     Morning
40,000  Night       Morning     Night

This is the result I want:

        OpenTime  ClosedTime ReopenTime
Morning 5644         4555     4444
Night   444           333     333

This is the code I have tried:

df1 = df.groupby(['OpenTime']).size().reset_index()
df1

I got the result below, which is wrong. I even tried a GroupBy on 2-3 columns, but I am not sure how to do it. Please help, thanks.

    OpenTime    0
0   Afternoon   16395
1   Evening 16813
2   Morning 9876
3   Night   546

1 answer:

Answer 0: (score: 1)

Use DataFrame.melt to unpivot the data, GroupBy.size to count each (value, variable) pair, and Series.unstack to reshape the counts into a table:

df2 = df.melt('ID').groupby(['value','variable']).size().unstack(fill_value=0)

Or use crosstab:

df1 = df.melt('ID')
df2 = pd.crosstab(df1['value'], df1['variable'])
print (df2)
variable   ClosedTime  OpenTime  ReopenTime
value                                      
Afternoon           3         1           2
Evening             2         1           2
Morning             2         3           1
Night               0         2           2
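
To make the two approaches above easy to try, here is a minimal self-contained sketch. The seven-row frame is an assumption: it is rebuilt from the melted output shown in this answer, not from the asker's full 40k-row table, and the names long_df, by_groupby and by_crosstab are only illustrative.

```python
import pandas as pd

# Seven sample rows reconstructed from the melted output in this answer;
# rows without an ID are stored as NaN.
nan = float("nan")
df = pd.DataFrame({
    "OpenTime":   ["Afternoon", "Evening", "Morning", "Night", "Morning", "Morning", "Night"],
    "ClosedTime": ["Afternoon", "Afternoon", "Morning", "Evening", "Afternoon", "Evening", "Morning"],
    "ReopenTime": ["Afternoon", "Afternoon", "Evening", "Evening", "Night", "Morning", "Night"],
    "ID":         [484, 44, 23, nan, nan, nan, nan],
})

# Long format: one row per (original row, time column) pair.
long_df = df.melt("ID")

# Approach 1: count each (value, variable) pair, then pivot 'variable' into columns.
by_groupby = long_df.groupby(["value", "variable"]).size().unstack(fill_value=0)

# Approach 2: cross-tabulate the same two columns directly.
by_crosstab = pd.crosstab(long_df["value"], long_df["variable"])

print(by_crosstab)
```

On this sample both approaches should give the same table of counts, so the choice between them is mostly a matter of taste.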

EDIT: If you need to specify the columns for melting:

df2 = (df.melt(value_vars=['OpenTime','ClosedTime','ReopenTime'])
         .groupby(['value','variable'])
         .size()
         .unstack(fill_value=0))

df1 = df.melt(value_vars=['OpenTime','ClosedTime','ReopenTime'])
df2 = pd.crosstab(df1['value'], df1['variable'])

Details:

First, melt unpivots the data. If id_vars is given as the identifier, all other columns are treated as measured columns (value_vars):

print (df.melt(id_vars='ID'))
       ID    variable      value
0   484.0    OpenTime  Afternoon
1    44.0    OpenTime    Evening
2    23.0    OpenTime    Morning
3     NaN    OpenTime      Night
4     NaN    OpenTime    Morning
5     NaN    OpenTime    Morning
6     NaN    OpenTime      Night
7   484.0  ClosedTime  Afternoon
8    44.0  ClosedTime  Afternoon
9    23.0  ClosedTime    Morning
10    NaN  ClosedTime    Evening
11    NaN  ClosedTime  Afternoon
12    NaN  ClosedTime    Evening
13    NaN  ClosedTime    Morning
14  484.0  ReopenTime  Afternoon
15   44.0  ReopenTime  Afternoon
16   23.0  ReopenTime    Evening
17    NaN  ReopenTime    Evening
18    NaN  ReopenTime      Night
19    NaN  ReopenTime    Morning
20    NaN  ReopenTime      Night
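
As a smaller illustration of what melt does to the shape, the sketch below uses a hypothetical two-row frame (small and long_small are made-up names, not part of the question's data). Each measured column contributes one block of rows, so 2 rows and 2 measured columns become 4 rows.

```python
import pandas as pd

# Hypothetical two-row frame, used only to show the reshaping.
small = pd.DataFrame({
    "ID": [1, 2],
    "OpenTime": ["Morning", "Night"],
    "ClosedTime": ["Evening", "Morning"],
})

# ID is kept as the identifier; the remaining columns are stacked into
# a 'variable' column (the old column name) and a 'value' column (its content).
long_small = small.melt(id_vars="ID")
print(long_small)
```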

Or you can specify only the value_vars columns:

print (df.melt(value_vars=['OpenTime','ClosedTime','ReopenTime']))
      variable      value
0     OpenTime  Afternoon
1     OpenTime    Evening
2     OpenTime    Morning
3     OpenTime      Night
4     OpenTime    Morning
5     OpenTime    Morning
6     OpenTime      Night
7   ClosedTime  Afternoon
8   ClosedTime  Afternoon
9   ClosedTime    Morning
10  ClosedTime    Evening
11  ClosedTime  Afternoon
12  ClosedTime    Evening
13  ClosedTime    Morning
14  ReopenTime  Afternoon
15  ReopenTime  Afternoon
16  ReopenTime    Evening
17  ReopenTime    Evening
18  ReopenTime      Night
19  ReopenTime    Morning
20  ReopenTime      Night

Finally, crosstab is computed between the value and variable columns, which gives a simple cross tabulation of counts.
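
The cross tabulation sorts its columns alphabetically, while the layout requested in the question lists OpenTime first. One possible way to match that layout, sketched here on the same seven sample rows (an assumption, as is the helper name cols), is to reindex the columns and clear the axis names:

```python
import pandas as pd

# Seven sample rows taken from the melted output in this answer (ID omitted).
df = pd.DataFrame({
    "OpenTime":   ["Afternoon", "Evening", "Morning", "Night", "Morning", "Morning", "Night"],
    "ClosedTime": ["Afternoon", "Afternoon", "Morning", "Evening", "Afternoon", "Evening", "Morning"],
    "ReopenTime": ["Afternoon", "Afternoon", "Evening", "Evening", "Night", "Morning", "Night"],
})

cols = ["OpenTime", "ClosedTime", "ReopenTime"]  # desired column order
long_df = df.melt(value_vars=cols)

# Count values per column, then put the columns back in the original order.
counts = pd.crosstab(long_df["value"], long_df["variable"]).reindex(columns=cols)

# Drop the 'value' / 'variable' axis labels so the printout resembles the requested table.
counts.index.name = None
counts.columns.name = None
print(counts)
```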