Question

Problem

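I loaded a CSV into a DataFrame, and there are some gaps in the datetimes. The sampling frequency is 15 minutes, and for each timestamp there is always a block of three values. In this example, the block for 2017-12-11 23:15:00 is missing.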

         ID           Datetime   Value
0        a 2017-12-11 23:00:00   20.0
1        b 2017-12-11 23:00:00   20.9
2        c 2017-12-11 23:00:00   21.0
3        a 2017-12-11 23:30:00   19.8
4        b 2017-12-11 23:30:00   20.8
5        c 2017-12-11 23:30:00   20.8

Desired result

What I want to do is resample the datetimes and fill the gaps in Value with zeros:

         ID           Datetime   Value
0        a 2017-12-11 23:00:00   20.0
1        b 2017-12-11 23:00:00   20.9
2        c 2017-12-11 23:00:00   21.0
3        a 2017-12-11 23:15:00   0.0
4        b 2017-12-11 23:15:00   0.0
5        c 2017-12-11 23:15:00   0.0
6        a 2017-12-11 23:30:00   19.8
7        b 2017-12-11 23:30:00   20.8
8        c 2017-12-11 23:30:00   20.8

My question

Can this be done with resample(), or would combining it with groupby() solve it?

import pandas as pd

df = pd.concat((pd.read_csv(file, parse_dates=[1], dayfirst=True, 
                    names=headers)for file in all_files))
df.set_index("Datetime").resample('15min').fillna(0).reset_index()

Answer 1

You can use resample together with last() (or mean(), if a single timestamp has multiple values for the same ID).

df.groupby('ID').resample('15min').last().fillna(0)

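This resamples the DataFrame per ID and takes the last value in each sampling period (each period should mostly contain one value or none). For periods that have an index entry (a time) but no value, fillna(0) inserts 0 instead of NaN.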

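Note that this only works if the index has an appropriate type. I see you are parsing the dates; calling df.dtypes will let you confirm that the Datetime column has a valid datetime type. I recommend setting the index to 'Datetime' and leaving it there if you plan to do many (or any) time-based operations. (That is, do this before running the command above!)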

df.set_index('Datetime', inplace=True)
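
To illustrate the dtype check described above, here is a minimal sketch. The sample strings and the day-first format are assumptions made for illustration; they are not taken from the asker's CSV files.

```python
import pandas as pd

# Hypothetical sample: dates stored as day-first strings (an assumption).
df = pd.DataFrame({
    "ID": ["a", "a"],
    "Datetime": ["11/12/2017 23:00", "11/12/2017 23:30"],
    "Value": [20.0, 19.8],
})

# Inspect the column types; a string-typed Datetime column cannot be resampled.
print(df.dtypes)

# Convert only if the column is not already a datetime type.
if not pd.api.types.is_datetime64_any_dtype(df["Datetime"]):
    df["Datetime"] = pd.to_datetime(df["Datetime"], format="%d/%m/%Y %H:%M")

# Keep the timestamps in the index so time-based operations work.
df = df.set_index("Datetime")
```

After this, df.index is a DatetimeIndex, which is what resample() needs.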

The groupby/resample command above then returns a new MultiIndex DataFrame:

Out[76]: 
                       ID  Value
ID Datetime                     
a  2018-02-26 23:00:00  a   20.0
   2018-02-26 23:15:00  0    0.0
   2018-02-26 23:30:00  a   19.8
b  2018-02-26 23:00:00  b   20.9
   2018-02-26 23:15:00  0    0.0
   2018-02-26 23:30:00  b   20.8
c  2018-02-26 23:00:00  c   21.0
   2018-02-26 23:15:00  0    0.0
   2018-02-26 23:30:00  c   20.8

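If you are only after the Value series, then with a bit of moving and shaking you end up with a slightly different DataFrame that has a single index. This has the advantage that there are no odd values in the ID column (see the 0 entries above).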

(df.groupby('ID')['Value']
 .resample('15min')
 .last()
 .fillna(0)
 .reset_index()
 .set_index('Datetime')
 .sort_index())

Out[107]: 
                    ID  Value
Datetime                     
2018-02-26 23:00:00  a   20.0
2018-02-26 23:00:00  b   20.9
2018-02-26 23:00:00  c   21.0
2018-02-26 23:15:00  a    0.0
2018-02-26 23:15:00  b    0.0
2018-02-26 23:15:00  c    0.0
2018-02-26 23:30:00  a   19.8
2018-02-26 23:30:00  b   20.8
2018-02-26 23:30:00  c   20.8
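
For anyone who wants to try this end to end, here is a self-contained sketch of the same groupby/resample chain. The DataFrame is rebuilt by hand from the question's sample rows (so the dates are the question's, not the ones in the output above), and the final sort is added here only to match the row order of the desired result.

```python
import pandas as pd

# Sample rows copied from the question; the 23:15 block is missing.
df = pd.DataFrame({
    "ID": ["a", "b", "c", "a", "b", "c"],
    "Datetime": pd.to_datetime(
        ["2017-12-11 23:00:00"] * 3 + ["2017-12-11 23:30:00"] * 3
    ),
    "Value": [20.0, 20.9, 21.0, 19.8, 20.8, 20.8],
})

result = (
    df.set_index("Datetime")
      .groupby("ID")["Value"]    # resample each ID separately
      .resample("15min")
      .last()                    # empty 15-minute bins become NaN
      .fillna(0)                 # replace those NaN values with 0
      .reset_index()
      .sort_values(["Datetime", "ID"])
      .reset_index(drop=True)
)
print(result)
```

Note that each ID is only filled between its own first and last timestamp, so a gap at the very start or end of one ID's data would not be filled by this approach.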

Answer 2

Let's use some DataFrame reshaping, then resample and fillna, and finally convert back to the original DataFrame structure:

df_out = (df.set_index(['Datetime','ID'])
            .unstack()
            .resample('15min')
            .asfreq()
            .fillna(0)
            .stack()
            .reset_index())

Output:

             Datetime ID  Value
0 2017-12-11 23:00:00  a   20.0
1 2017-12-11 23:00:00  b   20.9
2 2017-12-11 23:00:00  c   21.0
3 2017-12-11 23:15:00  a    0.0
4 2017-12-11 23:15:00  b    0.0
5 2017-12-11 23:15:00  c    0.0
6 2017-12-11 23:30:00  a   19.8
7 2017-12-11 23:30:00  b   20.8
8 2017-12-11 23:30:00  c   20.8
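
The reshaping pipeline can be run on its own with the sketch below. The DataFrame is rebuilt by hand from the question's sample rows, and the frequency is spelled '15min', which is the non-deprecated equivalent of the 'T' alias in recent pandas versions. Depending on the pandas version, stack() may emit a FutureWarning about its new implementation; the result here should be the same either way because no NaN values remain after fillna(0).

```python
import pandas as pd

# Sample rows copied from the question; the 23:15 block is missing.
df = pd.DataFrame({
    "ID": ["a", "b", "c", "a", "b", "c"],
    "Datetime": pd.to_datetime(
        ["2017-12-11 23:00:00"] * 3 + ["2017-12-11 23:30:00"] * 3
    ),
    "Value": [20.0, 20.9, 21.0, 19.8, 20.8, 20.8],
})

df_out = (
    df.set_index(["Datetime", "ID"])
      .unstack()           # wide layout: one column per ID
      .resample("15min")
      .asfreq()            # insert the missing 23:15 row as NaN
      .fillna(0)
      .stack()             # back to long layout: one row per (Datetime, ID)
      .reset_index()
)
print(df_out)
```

One caveat: unstack() requires each (Datetime, ID) pair to be unique, so duplicated rows would need to be aggregated or dropped first.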

Resampling / filling gaps in datetime stamps
