Pandas resample: fill NaN

Time: 2020-08-06 14:38:44

Tags: python-3.x pandas pandas-resample

I have this df:

            Timestamp        List     Power    Energy     Status
0 2020-01-01 01:05:50   [5, 5, 5]      7000     15000     online
1 2020-01-01 01:06:20   [6, 6, 6]      7500     16000     online
2 2020-01-01 01:08:30   [0, 0, 0]         5         0    offline
...

Now I want to resample it, using .resample as follows:

df2 = df.set_index('Timestamp').resample('min').?

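I want the df in 1-minute intervals. For each interval, the row should be built as follows:

- List: if Status = online, the last item of the interval, otherwise 0 ([0, 0, 0]);
- Power: if Status = online, the mean of the interval, otherwise 0;
- Energy: if Status = online, the last item of the interval, otherwise 0;
- Status: the last status of the interval.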

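How can I fill the NaN values in the resample output where the df has no data? For example, if there is no data for some period, the row should be filled like this: Power = 0; Energy = 0; Status = offline; ...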

I tried something like this:

df2 = df.set_index('Timestamp').resample('T').agg({'List':'last',
                                                   'Power':'mean',
                                                   'Energy':'last',
                                                   'Status':'last'})

and got:

         Timestamp        List                      Power    Energy     Status
0 2020-01-01 01:05   [5, 5, 5]  (average of the interval)     15000     online
1 2020-01-01 01:06   [6, 6, 6]  (average of the interval)     16000     online
2 2020-01-01 01:07         NaN                        NaN       NaN        NaN
3 2020-01-01 01:08   [0, 0, 0]                          5         0    offline

Expected result:

         Timestamp        List                      Power    Energy     Status
0 2020-01-01 01:05   [5, 5, 5]  (average of the interval)     15000     online
1 2020-01-01 01:06   [6, 6, 6]  (average of the interval)     16000     online
2 2020-01-01 01:07   [0, 0, 0]                          0         0    offline
3 2020-01-01 01:08   [0, 0, 0]                          5         0    offline

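1 answer:

Answer 0 (score: 0)

As described in the documentation (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.agg.html), .resample().agg() offers no way to pass a fillna rule that handles the NA values of each column separately.

In your case even interpolation won't help, so handle the NA values of each column manually.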
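Part of the reason the List column needs its own handling is that fillna() only accepts a scalar, dict or Series as the fill value; a list such as [0, 0, 0] is rejected with a TypeError. A minimal standalone check (the small Series s is made up for illustration):

```python
import pandas as pd

# Hypothetical example column: lists with one missing entry, as after resampling
s = pd.Series([[5, 5, 5], None, [0, 0, 0]], dtype=object)

raised = False
try:
    s.fillna([0, 0, 0])  # fillna() rejects list-like fill values
except TypeError as exc:
    raised = True
    print("TypeError:", exc)

# Element-wise replacement: keep existing lists, turn missing entries into [0, 0, 0]
filled = s.apply(lambda v: v if isinstance(v, list) else [0, 0, 0])
print(filled.tolist())
```

Replacing element-wise with apply() keeps a real list in each cell, rather than the string "[0, 0, 0]".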

First, let's build the example frame.

import pandas as pd

data = {"Timestamp":{"0": "2020-01-01 01:05:50",
                     "1": "2020-01-01 01:06:20",
                     "2": "2020-01-01 01:08:30"},
        "List": {"0": [5, 5, 5],
                 "1": [6, 6, 6],
                 "2": [0, 0, 0]},
        "Power": {"0": 7000,
                 "1": 7500,
                 "2": 5},
        "Energy": {"0": 15000,
                   "1": 16000,
                   "2": 0},
        "Status": {"0": "online",
                   "1": "online",
                   "2": "offline"},
       }

df = pd.DataFrame(data)

df['Timestamp'] = pd.to_datetime(df['Timestamp'])

# 'min' is the one-minute frequency alias ('T' is deprecated in recent pandas)
df = df.set_index('Timestamp').resample('min').agg({'List': 'last',
                                                    'Power': 'mean',
                                                    'Energy': 'last',
                                                    'Status': 'last'})

Now we can replace the NA values in each column manually:

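# fillna() cannot take a list as the fill value, so restore real lists element-wise
df["List"] = df["List"].apply(lambda v: v if isinstance(v, list) else [0, 0, 0])
df["Status"] = df["Status"].fillna('offline')
df = df.fillna(0)  # the remaining NaNs (Power, Energy) become 0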

Or, more conveniently, with a dictionary:

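values = {
    'Status': 'offline',
    'Power': 0,
    'Energy': 0,
}

df = df.fillna(value=values)
# List still needs element-wise handling, because fillna() rejects list values
df["List"] = df["List"].apply(lambda v: v if isinstance(v, list) else [0, 0, 0])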

Result:

            Timestamp       List   Power   Energy   Status
0 2020-01-01 01:05:00  [5, 5, 5]  7000.0  15000.0   online
1 2020-01-01 01:06:00  [6, 6, 6]  7500.0  16000.0   online
2 2020-01-01 01:07:00  [0, 0, 0]     0.0      0.0  offline
3 2020-01-01 01:08:00  [0, 0, 0]     5.0      0.0  offline
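The result above matches the expected table, which keeps Power = 5 for the offline 01:08 row. If the rules in the question are applied literally (List, Power and Energy are only kept when the interval is online, otherwise zero), a sketch could look like the following. It assumes that "online" is judged by the last Status of each interval and that empty intervals count as offline; the variable name out is illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "Timestamp": pd.to_datetime(["2020-01-01 01:05:50",
                                 "2020-01-01 01:06:20",
                                 "2020-01-01 01:08:30"]),
    "List": [[5, 5, 5], [6, 6, 6], [0, 0, 0]],
    "Power": [7000, 7500, 5],
    "Energy": [15000, 16000, 0],
    "Status": ["online", "online", "offline"],
})

out = (df.set_index("Timestamp")
         .resample("min")
         .agg({"List": "last", "Power": "mean",
               "Energy": "last", "Status": "last"}))

# Assumption: empty intervals have no status, so treat them as offline
out["Status"] = out["Status"].fillna("offline")
online = out["Status"].eq("online")

# Numeric columns keep their value only for online intervals, otherwise 0
out["Power"] = out["Power"].where(online, 0)
out["Energy"] = out["Energy"].where(online, 0)

# Lists cannot be used as a fill value, so blank non-online rows, then rebuild
out["List"] = out["List"].where(online).apply(
    lambda v: v if isinstance(v, list) else [0, 0, 0])

print(out.reset_index())
```

With this stricter reading the 01:08 row gets Power = 0, which differs from the expected table in the question; drop the Power line if the mean should be kept for offline intervals.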