pandas:在时间戳上合并行

时间:2013-01-25 00:56:37

标签: python merge pandas rows

我的数据如下:

 date, cola, colb, colc
 1,10,,
 2,11,,
 3,12,,
 4,13,,
 1,,14,
 2,,15,
 3,,16,
 4,,17,
 1,,,17
 2,,,18
 3,,,19
 4,13,,20

我想根据第一列合并行,输出如下:

 date, cola, colb, colc
 1,10,14,17
 2,11,15,18
 3,12,16,19
 4,13,17,20

我不能保证不会有任何冲突,所以我希望能够选择最大值或平均值。

1 个答案:

答案 0 :(得分:1)

您可以使用groupby。从具有重复项的csv开始:

>>> !cat tomerge.csv
date, cola, colb, colc
1,10,,
2,11,,
1,,14,
2,,15,
1,,24,
2,,40,
1,,,17
2,,,18

阅读:

>>> df = pd.read_csv("tomerge.csv")
>>> df
   date   cola   colb   colc
0     1     10    NaN    NaN
1     2     11    NaN    NaN
2     1    NaN     14    NaN
3     2    NaN     15    NaN
4     1    NaN     24    NaN
5     2    NaN     40    NaN
6     1    NaN    NaN     17
7     2    NaN    NaN     18

然后神奇的事情发生了:

>>> df.groupby("date").mean()
       cola   colb   colc
date                     
1        10   19.0     17
2        11   27.5     18
>>> df.groupby("date").max()
       cola   colb   colc
date                     
1        10     24     17
2        11     40     18