Question

I have a CSV file with several variables.
The date and the time are stored in separate columns.
My data looks like this:

  Date         Time       Axis1     Axis2    Axis3
   .             .         .          .       .
   .             .         .          .       .
2017-10-15    13:40:00     20         0       40
2017-10-15    13:40:10     40         10      100
2017-10-15    13:40:20     50         0       0
2017-10-15    13:40:30     10         10      60
2017-10-15    13:40:40     0          0       20
2017-10-15    13:40:50     0          0       10
2017-10-16    06:20:30     10         0       10
2017-10-16    06:20:40     70         0       10
2017-10-16    06:20:50     20         100     80
   .             .         .          .       .
   .             .         .          .       .

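There are many more rows (over ten thousand).
You may notice that there is a time gap between 10/15 and 10/16.
I want to sum all three Axis values per minute.
This is the structure I expect: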

  Date         Time       Axis1     Axis2    Axis3
   .             .         .          .       .
   .             .         .          .       .
2017-10-15    13:40:00     120        20      230
2017-10-16    06:20:00     100        100     100
2017-10-16    06:21:00     ?          ?       ?
   .             .         .          .       .
   .             .         .          .       .

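I tried groupby, resample, and pd.Grouper, but none of them worked for me.
The main problem is that after I set the time as the index and ran groupby('Date').resample('1Min').sum(), the time index starts at 00:00:00 instead of 13:40:00.

Thanks for your help!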

Answer 1

Let's try this:

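# Combine Date and Time into a single DatetimeIndex
df = df.set_index(pd.to_datetime(df['Date'] + ' ' + df['Time'], format='%Y-%m-%d %H:%M:%S'))

# Floor to the minute ('min'; recent pandas deprecates the 'T' alias) and sum only the numeric Axis columns
df.groupby(df.index.floor('min'))[['Axis1', 'Axis2', 'Axis3']].sum()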

Output:

                     Axis1  Axis2  Axis3
2017-10-15 13:40:00    120     20    230
2017-10-16 06:20:00    100    100    100
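
For a self-contained check, the sketch below rebuilds the sample rows from the question in memory and applies the same floor-and-sum approach. The inline CSV text, the comma separators, and the names `raw`, `stamp`, `per_minute`, and `result` are assumptions made for illustration; the real file layout may differ.

```python
import io

import pandas as pd

# Sample rows copied from the question (assumed to be comma-separated here).
raw = """Date,Time,Axis1,Axis2,Axis3
2017-10-15,13:40:00,20,0,40
2017-10-15,13:40:10,40,10,100
2017-10-15,13:40:20,50,0,0
2017-10-15,13:40:30,10,10,60
2017-10-15,13:40:40,0,0,20
2017-10-15,13:40:50,0,0,10
2017-10-16,06:20:30,10,0,10
2017-10-16,06:20:40,70,0,10
2017-10-16,06:20:50,20,100,80
"""
df = pd.read_csv(io.StringIO(raw))

# Build one timestamp per row from the two text columns.
stamp = pd.to_datetime(df["Date"] + " " + df["Time"], format="%Y-%m-%d %H:%M:%S")

# Group by the timestamp floored to the minute and sum the numeric columns only.
per_minute = df[["Axis1", "Axis2", "Axis3"]].groupby(stamp.dt.floor("min")).sum()

# Optionally split the index back into Date and Time columns,
# matching the layout the question asks for.
result = per_minute.copy()
result.insert(0, "Time", per_minute.index.strftime("%H:%M:%S"))
result.insert(0, "Date", per_minute.index.strftime("%Y-%m-%d"))
result = result.reset_index(drop=True)

print(result)
```

Only minutes that actually occur in the data appear in `result`, so the overnight gap between 10/15 and 10/16 produces no extra rows.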

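Note: passing the format argument to pd.to_datetime makes parsing more efficient. Using floor avoids resample, which would also create groups for the missing minutes.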
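
To illustrate the difference mentioned in the note, here is a minimal sketch that compares `resample` with grouping on the floored index. It uses just two timestamps taken from the question; the frame and the names `resampled` and `floored` are hypothetical.

```python
import pandas as pd

# Two readings on either side of the overnight gap in the question's data.
idx = pd.to_datetime(["2017-10-15 13:40:00", "2017-10-16 06:20:30"])
df = pd.DataFrame({"Axis1": [20, 10]}, index=idx)

# resample creates one bin for every minute between the first and last
# timestamp, so the empty minutes in the gap show up as rows with sum 0.
resampled = df.resample("1min").sum()

# Grouping on the floored index only produces minutes that contain data.
floored = df.groupby(df.index.floor("min")).sum()

print(len(resampled), len(floored))
```

If the empty minutes are wanted in the result, `resample` is the natural choice; if they are not, flooring keeps the output compact without a separate filtering step.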
