I have this df:
Timestamp List Power Energy Status
0 2020-01-01 01:05:50 [5, 5, 5] 7000 15000 online
1 2020-01-01 01:06:20 [6, 6, 6] 7500 16000 online
2 2020-01-01 01:08:30 [0, 0, 0] 5 0 offline
...
Now I want to resample it, using .resample like this:
df2 = df.set_index('Timestamp').resample('min').?
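I want the df in 1-minute intervals. For each interval, the rows should be combined as follows:
List: if status = online, the last entry of the interval, otherwise '0';
Power: if status = online, the mean of the interval, otherwise '0';
Energy: if status = online, the last entry of the interval, otherwise '0';
Status: the last status of the interval.
How can I fill the NaN values in the resample output when the df has no data for an interval? For example, if there is no data for some time, the df should be filled like this: Power = 0; Energy = 0; Status = offline; ...
I tried something like this: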
df2 = df.set_index('Timestamp').resample('T').agg({'List':'last',
'Power':'mean',
'Energy':'last',
'Status':'last'})
and got:
Timestamp List Power Energy Status
0 2020-01-01 01:05 [5, 5, 5] (average of the interval) 15000 online
1 2020-01-01 01:06 [6, 6, 6] (average of the interval) 16000 online
2 2020-01-01 01:07 NaN NaN NaN NaN
3 2020-01-01 01:08 [0, 0, 0] 5 0 offline
Expected result:
Timestamp List Power Energy Status
0 2020-01-01 01:05 [5, 5, 5] (average of the interval) 15000 online
1 2020-01-01 01:06 [6, 6, 6] (average of the interval) 16000 online
2 2020-01-01 01:07 [0, 0, 0] 0 0 offline
3 2020-01-01 01:08 [0, 0, 0] 5 0 offline
Answer 0 (score: 0)
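As far as the documentation at https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.agg.html describes it, .resample().agg() gives you no way to attach a separate fillna rule to each column for handling NA values during the aggregation.
In your case even interpolation won't work, so try handling the NA values of each column manually.
First, let's initialize the sample frame.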
import pandas as pd
data = {"Timestamp":{"0": "2020-01-01 01:05:50",
"1": "2020-01-01 01:06:20",
"2": "2020-01-01 01:08:30"},
"List": {"0": [5, 5, 5],
"1": [6, 6, 6],
"2": [0, 0, 0]},
"Power": {"0": 7000,
"1": 7500,
"2": 5},
"Energy": {"0": 15000,
"1": 16000,
"2": 0},
"Status": {"0": "online",
"1": "online",
"2": "offline"},
}
df = pd.DataFrame(data)
df['Timestamp'] = pd.to_datetime(df['Timestamp'])
# 'min' is the one-minute alias; 'T' is deprecated in pandas 2.2 and later
df = df.set_index('Timestamp').resample('min').agg({'List':'last',
'Power':'mean',
'Energy':'last',
'Status':'last'})
Now we can manually replace the NA values in each column separately:
df["List"] = df["List"].fillna("[0, 0, 0]")
df["Status"] = df["Status"].fillna('offline')
df = df.fillna(0)
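One caveat about the List column: fillna() does not accept a list as the fill value, which is why the placeholder is the string "[0, 0, 0]" and not an actual list. If the filled cells need to be real lists, something along these lines might work. This is only a sketch on a hypothetical stand-in for the resampled column (the names idx, lists and filled are mine, not from the question):

```python
import pandas as pd

# Hypothetical stand-in for the resampled "List" column; the 01:07 bin is empty.
idx = pd.date_range("2020-01-01 01:05", periods=4, freq="min")
lists = pd.Series([[5, 5, 5], [6, 6, 6], None, [0, 0, 0]], index=idx, dtype=object)

# Replace every cell that is not a list (None/NaN) with a fresh [0, 0, 0] list,
# so the column holds lists only and no string placeholder.
filled = lists.apply(lambda v: v if isinstance(v, list) else [0, 0, 0])
```

The remaining columns can still be filled with plain fillna() calls.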
Alternatively, the fillna calls can be written more conveniently with a dictionary:
values = {
'List': '[0, 0, 0]',
'Status': 'offline',
'Power': 0,
'Energy': 0
}
df = df.fillna(value=values)
Timestamp List Power Energy Status
0 2020-01-01 01:05:00 [5, 5, 5] 7000.0 15000.0 online
1 2020-01-01 01:06:00 [6, 6, 6] 7500.0 16000.0 online
2 2020-01-01 01:07:00 [0, 0, 0] 0.0 0.0 offline
3 2020-01-01 01:08:00 [0, 0, 0] 5.0 0.0 offline
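Note that this only fills the empty intervals. The question also asks for Power and Energy to be 0 whenever the status is not online. A rough sketch of that rule follows, assuming the condition is checked against the last status of each interval (the question does not say whether it applies per row or per interval). The names raw, out and offline are mine, and the List column is left out to keep it short:

```python
import pandas as pd

raw = pd.DataFrame({
    "Timestamp": pd.to_datetime(["2020-01-01 01:05:50",
                                 "2020-01-01 01:06:20",
                                 "2020-01-01 01:08:30"]),
    "Power": [7000, 7500, 5],
    "Energy": [15000, 16000, 0],
    "Status": ["online", "online", "offline"],
})

# Aggregate into 1-minute intervals as in the answer above.
out = raw.set_index("Timestamp").resample("min").agg(
    {"Power": "mean", "Energy": "last", "Status": "last"})

# Assumption: an interval without any data counts as offline.
out["Status"] = out["Status"].fillna("offline")

# Zero the numeric columns wherever the interval did not end online.
offline = out["Status"] != "online"
out.loc[offline, ["Power", "Energy"]] = 0
```

With this rule the 01:08 interval gets Power 0, whereas the expected table in the question keeps 5 there, so skip the masking step if the table is what you actually want.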