Combining similar rows in a Python DataFrame

Time: 2016-01-02 16:01:55

Tags: python sql pandas dataframe pandasql

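I have a DataFrame like the one shown below, and I want to combine the rows that share the same "yyyymmdd" and "hr" into a single row. (Several rows have the same "yyyymmdd" and "hr".)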

       yyyymmdd  hr ariel cat kiki mmax vicky gaolie shiu nick ck
10   2015-12-27   9     0   0    0    0     0      0    0   23  0
181  2015-12-27  10     0   0    0    0     0      0    0    2  0
65   2015-12-27  11     0   0    0    0     0      0    0   20  0
4    2015-12-27  12     0   0    0    0     0      0    0    4  0
0    2015-12-27  17     0   0    0    0     0      0    0    2  0
141  2015-12-27  19     1   0    0    0     0      0    0    0  0
160  2015-12-28   8     0   8    0    0     0      0    0    0  0
82   2015-12-28   9     0   0    0    0     0      0   19    0  0
113  2015-12-28   9    11   0    0    0     0      0    0    0  0
180  2015-12-28   9     0  11    0    0     0      0    0    0  0
9    2015-12-28  10     0  13    0    0     0      0    0    0  0
76   2015-12-28  10    85   0    0    0     0      0    0    0  0
107  2015-12-28  10     0   0    0    0     0      0   15    0  0
188  2015-12-28  10     0   0    0    0     2      0    0    0  0
34   2015-12-28  11     0   0    0    0     0      0   14    0  0
69   2015-12-28  11     0   0    0    0     2      0    0    0  0
134  2015-12-28  11     0  11    0    0     0      0    0    0  0
158  2015-12-28  11     2   0    0    0     0      0    0    0  0

Part of the output I want should look like this:

  yyyymmdd  hr  ariel  cat  kiki  mmax  vicky  gaolie  shiu  nick  ck
2015-12-28  10     85   13     0     0      2       0    15     0   0

Please share any ideas I could use in Python pandas or SQL. Thanks!

=============================================== ==========================

Now I have two more questions:

  1. How can I fill in the missing "hr" values in the DataFrame? It should look like this:

          yyyymmdd  hr  ariel  cat  kiki  mmax  vicky  gaolie  shiu  nick   ck
     0  2015-12-27   8    NaN  NaN   NaN   NaN    NaN     NaN   NaN   NaN  NaN
     1  2015-12-27   9      0    0     0     0      0       0     0    23    0
     2  2015-12-27  10      0    0     0     0      0       0     0     2    0
     3  2015-12-27  11      0    0     0     0      0       0     0    20    0
     4  2015-12-27  12      0    0     0     0      0       0     0     4    0
     5  2015-12-27  13    NaN  NaN   NaN   NaN    NaN     NaN   NaN   NaN  NaN
     6  2015-12-27  14    NaN  NaN   NaN   NaN    NaN     NaN   NaN   NaN  NaN
     7  2015-12-27  15    NaN  NaN   NaN   NaN    NaN     NaN   NaN   NaN  NaN
     8  2015-12-27  16    NaN  NaN   NaN   NaN    NaN     NaN   NaN   NaN  NaN
     9  2015-12-27  17      0    0     0     0      0       0     0     2    0
    10  2015-12-27  18    NaN  NaN   NaN   NaN    NaN     NaN   NaN   NaN  NaN
    11  2015-12-27  19      1    0     0     0      0       0     0     0    0
    12  2015-12-27  20    NaN  NaN   NaN   NaN    NaN     NaN   NaN   NaN  NaN
    13  2015-12-28   8      0    8     0     0      0       0     0     0    0
    14  2015-12-28   9     11   11     0     0      0       0    19     0    0
    15  2015-12-28  10     85   13     0     0      2       0    15     0    0
    16  2015-12-28  11      2   11     0     0      2       0    14     0    0
    17  2015-12-28  12      2   20     0     4      0       0    10     0    0
    18  2015-12-28  13      8    9     0     9      3       0     9     0    0
    19  2015-12-28  14      4   10     0     8      0       0    22     0    0
    20  2015-12-28  15      3    3     0     2      0       0    16     0    0
    21  2015-12-28  16     14    5     1     1      0       0    19     0    0
    22  2015-12-28  17     15    1     2     0      0       0    19     0    0
    23  2015-12-28  18      0    0     0     6      0       0     0     0    0
    24  2015-12-28  19      0    0     0     5      0       0     0     0    0
    25  2015-12-28  20      0    0     0     1      0       0     0     0    0

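  2. How can I draw line charts based on the columns and the hour? (x-axis = columns, i.e. ariel, cat, kiki ...) (y-axis = hr, i.e. 8, 9, 10 ... 20) Each chart should represent one date (i.e. 2015-12-27, 2015-12-28 ...)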

Thanks!

1 answer:

Answer 0: (score: 1)

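Put your data into a pandas DataFrame, then group by "yyyymmdd" and "hr" and take the maximum of each group. With your example copy-pasted into a CSV file, it looks like this: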

import pandas as pd
df = pd.read_csv('df.csv',index_col=0)
df_combined = df.groupby(['yyyymmdd','hr']).max()
df_combined

Output:

[Image: output of df_combined]
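
The grouping step can also be tried without a CSV file. The following is a minimal sketch, assuming a handful of rows copied from the question and only five of the name columns kept for brevity; the column selection is an illustration, not part of the original answer.

```python
import pandas as pd

# A few rows copied from the question; only five name columns are kept for brevity.
df = pd.DataFrame(
    {
        "yyyymmdd": ["2015-12-28", "2015-12-28", "2015-12-28", "2015-12-28", "2015-12-27"],
        "hr": [10, 10, 10, 10, 9],
        "ariel": [0, 85, 0, 0, 0],
        "cat": [13, 0, 0, 0, 0],
        "vicky": [0, 0, 0, 2, 0],
        "shiu": [0, 0, 15, 0, 0],
        "nick": [0, 0, 0, 0, 23],
    },
    index=[9, 76, 107, 188, 10],  # original row labels from the question
)

# One row per (date, hour); each column keeps the largest value seen in the group.
df_combined = df.groupby(["yyyymmdd", "hr"]).max()
print(df_combined)
```

In the sample data each column has at most one non-zero value per group, so `max()` and `sum()` give the same result here. If several rows in a group could carry counts in the same column, `sum()` may be closer to what is wanted; that depends on the data and is not something the question states.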

If you don't want a MultiIndex, use reset_index().
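
For the follow-up question about filling in the missing hours, one possible approach (not part of the original answer) is to reindex the grouped frame against every (date, hour) combination and only then call `reset_index()`. The sketch below assumes the hour range 8 to 20 seen in the question's desired table and uses a small, already-combined sample with two name columns.

```python
import pandas as pd

# Already-combined sample rows (values taken from the question's tables).
df_combined = pd.DataFrame(
    {
        "yyyymmdd": ["2015-12-27", "2015-12-27", "2015-12-28"],
        "hr": [9, 12, 10],
        "ariel": [0, 0, 85],
        "nick": [23, 4, 0],
    }
).set_index(["yyyymmdd", "hr"])

# Assumption: hours run from 8 to 20 inclusive, as in the desired output.
hours = range(8, 21)
dates = df_combined.index.get_level_values("yyyymmdd").unique()

# Every (date, hour) pair, whether or not it occurs in the data.
full_index = pd.MultiIndex.from_product([dates, hours], names=["yyyymmdd", "hr"])

# Pairs missing from the data become rows of NaN; reset_index() then
# turns the two index levels back into ordinary columns.
df_filled = df_combined.reindex(full_index).reset_index()
print(df_filled.head())
```

Because NaN is a floating-point value, the count columns come back as floats after reindexing. Passing `fill_value=0` to `reindex` would keep them as integers, at the cost of no longer distinguishing "no data" from a zero count.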