Fill NaN values of a df based on conditions and add rows

Asked: 2020-08-18 07:25:04

Tags: python pandas dataframe nan resampling

I have this df:

import pandas as pd

# Sample data: the strings 'NaT' / 'NaN' stand in for missing values.
df = pd.DataFrame({"Time": ["2020-04-09 06:40:40.559719","2020-04-09 06:40:40.559719", 'NaT', "2020-04-09 06:40:40.559719", 'NaT', 'NaT', 'NaT', '2020-04-09 16:50:38.559871', 'NaT', '2020-04-29 16:50:38.559871'],
          "Power": [7500, 6000, 'NaN', 6000, 'NaN', 'NaN', 'NaN', 3600, 'NaN', 4200],
          "Total Energy": [5000, 5100, 'NaN', 5300, 'NaN', 'NaN', 'NaN', 5360, 'NaN', 5500],
          "ID": [1, 1, 'NaN', 1, 'NaN', 'NaN', 'NaN', 2, 'NaN', 2],
          "Energy": [500, 600, 'NaN', 800, 'NaN', 'NaN', 'NaN', 60, 'NaN', 200]},
          # 'min' is the one-minute frequency alias ('T' is deprecated in recent pandas)
          index=pd.date_range(start = "2020-04-09 6:45", periods = 10, freq = 'min'))

# Convert the placeholder strings to real NaT / NaN values
df['Time'] = pd.to_datetime(df['Time'])
df['Power'] = pd.to_numeric(df['Power'], errors = 'coerce')
df['Total Energy'] = pd.to_numeric(df['Total Energy'], errors = 'coerce')
df['ID'] = pd.to_numeric(df['ID'], errors = 'coerce')
df['Energy'] = pd.to_numeric(df['Energy'], errors = 'coerce')

df

Output:

                                          Time   Power  Total Energy     ID Energy
2020-04-09 06:45:00 2020-04-09 06:40:40.559719  7500.0        5000.0    1.0  500.0
2020-04-09 06:46:00 2020-04-09 06:40:40.559719  6000.0        5100.0    1.0  600.0
2020-04-09 06:47:00                        NaT     NaN           NaN    NaN    NaN
2020-04-09 06:48:00 2020-04-09 06:40:40.559719  6000.0        5300.0    1.0  800.0
2020-04-09 06:49:00                        NaT     NaN           NaN    NaN    NaN
2020-04-09 06:50:00                        NaT     NaN           NaN    NaN    NaN
2020-04-09 06:51:00                        NaT     NaN           NaN    NaN    NaN
2020-04-09 06:52:00 2020-04-09 16:50:38.559871  3600.0        5360.0    2.0   60.0
2020-04-09 06:53:00                        NaT     NaN           NaN    NaN    NaN
2020-04-09 06:54:00 2020-04-29 16:50:38.559871  4200.0        5500.0    2.0  200.0

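Now I want to fill the NaN / NaT values based on different conditions, and add some rows where the df is missing them:

  1. df['Time']: create new rows until df['Timestamp'] = df['Time']
  2. Fill the new rows: df['Energy'] = 0 for the first row, then linear fill; df['Power'] = 0 for the first row, then df['Power'] = df['Energy'] / (1/60); df['Time'] and df['ID'] filled with bfill(); df['Total Energy'] = sum of df['Energy']
  3. Boundary between two different times: fill as shown in the expected output
  4. NaN values within a time series (e.g. at 2020-04-09 06:47:00): df['Time'] and df['ID'] with ffill(); df['Energy'] = difference between the existing rows (if there are several NaN rows -> linear interpolation); df['Total Energy'] = old value + df['Energy']; df['Power'] = df['Energy'] / (1/60)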
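
To make rule 4 concrete, here is a minimal sketch on a three-row stand-in for the rows 06:46 to 06:48. The frame name `gap` and the helper variables are hypothetical. It also assumes that "df['Energy']" in the Power and Total Energy formulas means the per-minute increase of the Energy column, since that is the reading that matches the expected output (100 * 60 = 6000).

```python
import pandas as pd

nan = float("nan")

# Stand-in for rows 06:46-06:48 of the question's df (same column names).
gap = pd.DataFrame(
    {
        "Time": pd.to_datetime(
            ["2020-04-09 06:40:40.559719", None, "2020-04-09 06:40:40.559719"]
        ),
        "Power": [6000.0, nan, 6000.0],
        "Total Energy": [5100.0, nan, 5300.0],
        "ID": [1.0, nan, 1.0],
        "Energy": [600.0, nan, 800.0],
    },
    index=pd.date_range("2020-04-09 06:46", periods=3, freq="min"),
)

# Remember which rows were measured before anything is filled.
known = gap["Total Energy"].notna()

# Time and ID are carried forward into the missing rows.
gap["Time"] = gap["Time"].ffill()
gap["ID"] = gap["ID"].ffill()

# Energy is interpolated linearly between the surrounding known rows.
gap["Energy"] = gap["Energy"].interpolate(method="linear")

# Assumption: Power = per-minute energy increase / (1/60), only where Power is missing.
step = gap["Energy"].diff()
gap["Power"] = gap["Power"].fillna(step * 60)

# Total Energy = last known total + energy gained since that known row.
base_total = gap["Total Energy"].ffill()
base_energy = gap["Energy"].where(known).ffill()
gap["Total Energy"] = base_total + (gap["Energy"] - base_energy)

print(gap)
```

Anchoring Total Energy on the last known row, rather than on the previous row, keeps the sketch valid when several NaN rows follow each other. It does not handle the boundary between two IDs (rule 3).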

Expected output:

                                         Time     Power    Total Energy    ID  Energy
2020-04-09 06:41:00 2020-04-09 06:40:40.559719        0          4500.0   1.0       0
2020-04-09 06:42:00 2020-04-09 06:40:40.559719   7500.0          4625.0   1.0   125.0
2020-04-09 06:43:00 2020-04-09 06:40:40.559719   7500.0          4750.0   1.0   250.0
2020-04-09 06:44:00 2020-04-09 06:40:40.559719   7500.0          4875.0   1.0   375.0
2020-04-09 06:45:00 2020-04-09 06:40:40.559719   7500.0          5000.0   1.0   500.0
2020-04-09 06:46:00 2020-04-09 06:40:40.559719   6000.0          5100.0   1.0   600.0
2020-04-09 06:47:00 2020-04-09 06:40:40.559719   6000.0          5200.0   1.0   700.0
2020-04-09 06:48:00 2020-04-09 06:40:40.559719   6000.0          5300.0   1.0   800.0
2020-04-09 06:49:00 -                                 0          5300.0     -       0
2020-04-09 06:50:00 -                                 0          5300.0     -       0
2020-04-09 06:51:00 2020-04-09 16:50:38.559871        0          5300.0   2.0       0
2020-04-09 06:52:00 2020-04-09 16:50:38.559871   3600.0          5360.0   2.0    60.0
2020-04-09 06:53:00 2020-04-09 16:50:38.559871   4200.0          5430.0   2.0   130.0
2020-04-09 06:54:00 2020-04-29 16:50:38.559871   4200.0          5500.0   2.0   200.0
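
The first five rows of this expected output (rules 1 and 2) could be produced roughly as follows. This is only a sketch under assumptions: the names `first`, `lead`, `start` and `offset` are hypothetical, the new rows are assumed to start at the first full minute after df['Time'], and Power is again assumed to be the per-minute energy increase times 60.

```python
import pandas as pd

# First measured row of the question's df.
first = pd.DataFrame(
    {
        "Time": pd.to_datetime(["2020-04-09 06:40:40.559719"]),
        "Power": [7500.0],
        "Total Energy": [5000.0],
        "ID": [1.0],
        "Energy": [500.0],
    },
    index=pd.to_datetime(["2020-04-09 06:45"]),
)

# Assumption: new rows start at the first full minute after "Time" (06:41).
start = first["Time"].iloc[0].ceil("min")
full_index = pd.date_range(start, first.index[0], freq="min")

# Reindexing adds the missing minutes as all-NaN rows.
lead = first.reindex(full_index)

# Time and ID are back-filled from the first measured row.
lead["Time"] = lead["Time"].bfill()
lead["ID"] = lead["ID"].bfill()

# Energy starts at 0 in the first new row, then rises linearly.
lead.loc[start, "Energy"] = 0.0
lead["Energy"] = lead["Energy"].interpolate(method="linear")

# Power is 0 in the first row, then per-minute energy increase * 60.
step = lead["Energy"].diff().fillna(0.0)
lead["Power"] = lead["Power"].fillna(step * 60)

# Total Energy is shifted so that it ends at the measured total.
offset = first["Total Energy"].iloc[0] - first["Energy"].iloc[0]
lead["Total Energy"] = lead["Total Energy"].fillna(offset + lead["Energy"])

print(lead)
```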

Thanks for your help! :)

0 answers:

No answers