Fill DataFrame rows

Time: 2020-08-19 14:25:31

Tags: python pandas dataframe row fill

I have this DataFrame:

import pandas as pd

# Sample data; the 'NaN' strings in Time show up as NaT in the output.
df = pd.DataFrame({"Time": ["2020-04-09 04:40:40.559719", "2020-04-09 04:40:40.559719", "2020-04-09 04:40:40.559719", 'NaN', 'NaN', 'NaN', 'NaN', 'NaN', '2020-04-29 16:50:38.559871'],
                   "Power": [7500, 6000, 6000, 0, 0, 0, 0, 0, 4200],
                   "Total Energy": [5000, 5100, 5300, 5300, 5300, 5300, 5300, 5300, 5500],
                   "ID": [1, 1, 1, '-', '-', '-', '-', '-', 2],
                   "Energy": [500, 600, 800, 0, 0, 0, 0, 0, 200]},
                  index=pd.date_range(start="2020-04-09 6:45", periods=9, freq="min"))

# Parse the timestamps and attach the Europe/Berlin time zone.
df['Time'] = pd.to_datetime(df['Time'])
df['Time'] = df['Time'].dt.tz_localize('Europe/Berlin')
# Convert the remaining columns to numeric; non-numeric entries become NaN.
df['Power'] = pd.to_numeric(df['Power'], errors='coerce')
df['Total Energy'] = pd.to_numeric(df['Total Energy'], errors='coerce')
df['ID'] = pd.to_numeric(df['ID'], errors='coerce')
df['Energy'] = pd.to_numeric(df['Energy'], errors='coerce')

df

Output:

                                                Time     Power  Total Energy      ID    Energy
2020-04-09 06:45:00 2020-04-09 04:40:40.559719+02:00    7500.0        5000.0     1.0     500.0
2020-04-09 06:46:00 2020-04-09 04:40:40.559719+02:00    6000.0        5100.0     1.0     600.0
2020-04-09 06:47:00 2020-04-09 04:40:40.559719+02:00    6000.0        5300.0     1.0     800.0
2020-04-09 06:48:00                              NaT         0        5300.0        -        0
2020-04-09 06:49:00                              NaT         0        5300.0        -        0
2020-04-09 06:50:00                              NaT         0        5300.0        -        0
2020-04-09 06:51:00                              NaT         0        5300.0        -        0
2020-04-09 06:52:00                              NaT         0        5300.0        -        0
2020-04-09 06:53:00 2020-04-29 04:50:38.559871+02:00    4200.0        5500.0     2.0     200.0

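I need to do two things:

  1. df['Time'] should have the same format as the index, and the time zone offset should be folded into the datetime value (2020-04-09 04:40:40.559719+02:00 -> 2020-04-09 06:40:40)
  2. df['Time']
  3. The rows in between should be filled as follows: Time = Time; Energy('Time')-1 = Power('Time') * 1/60; Energy('Time')-n: linear fill (initial value = 0, final value = Energy('Time')-1); Total_Energy = + Energy; Power = Energy / (1/60); ID = ID('Time')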
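
For point 1, one possible sketch follows. It assumes the goal is exactly what the example suggests: add the UTC offset to the wall-clock time, drop the time zone, and truncate to whole seconds. The names `time`, `naive`, `offset` and `converted` are illustrative only and do not come from the question.

```python
import pandas as pd

# Two sample values: one timestamp from the question and one missing entry.
time = pd.Series(pd.to_datetime(["2020-04-09 04:40:40.559719", None]))
time = time.dt.tz_localize("Europe/Berlin")

# Wall-clock time without the time zone (04:40:40.559719).
naive = time.dt.tz_localize(None)
# UTC offset of each row as a Timedelta (+02:00 for Europe/Berlin in April).
offset = naive - time.dt.tz_convert("UTC").dt.tz_localize(None)
# Add the offset and truncate to whole seconds, matching the index format.
converted = (naive + offset).dt.floor("s")

print(converted)
```

Missing entries stay NaT at every step, so the same expression could be applied to the whole `df['Time']` column.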

Expected result:

                                                Time     Power  Total Energy      ID    Energy
2020-04-09 06:45:00              2020-04-09 06:40:40    7500.0        5000.0     1.0     500.0
2020-04-09 06:46:00              2020-04-09 06:40:40    6000.0        5100.0     1.0     600.0
2020-04-09 06:47:00              2020-04-09 06:40:40    6000.0        5300.0     1.0     800.0
2020-04-09 06:48:00                              NaT         0        5300.0       -         0
2020-04-09 06:49:00                              NaT         0        5300.0       -         0
2020-04-09 06:50:00                              NaT         0        5300.0       -         0
2020-04-09 06:51:00              2020-04-29 06:50:38         0        5300.0     2.0         0
2020-04-09 06:52:00              2020-04-29 06:50:38    7800.0        5400.0     2.0     130.0
2020-04-09 06:53:00              2020-04-29 06:50:38    4200.0        5500.0     2.0     200.0
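
The fill rule from point 3 could be sketched roughly as below. This is only one reading of the rule and rests on several assumptions: Time is already a naive timestamp, it falls on the same day as the index (the sample uses 2020-04-09 for that reason), and only empty rows at or after the event time are filled, as the expected table suggests. The helper names (`gap`, `next_time`, `next_id`, `next_power`, `fill`, `out`) are hypothetical.

```python
import numpy as np
import pandas as pd

# Tail of the question's data, with Time already naive and (as an assumption)
# on the same day as the index so that it can be compared with the index.
idx = pd.date_range("2020-04-09 06:47", periods=7, freq="min")
df = pd.DataFrame(
    {
        "Time": pd.to_datetime(["2020-04-09 06:40:40"] + [None] * 5 + ["2020-04-09 06:50:38"]),
        "Power": [6000.0, 0, 0, 0, 0, 0, 4200.0],
        "Total Energy": [5300.0] * 6 + [5500.0],
        "ID": [1, np.nan, np.nan, np.nan, np.nan, np.nan, 2],
        "Energy": [800.0, 0, 0, 0, 0, 0, 200.0],
    },
    index=idx,
)

gap = df["Time"].isna()
# Values of the next valid row (the "'Time' row") as seen from every gap row.
next_time = df["Time"].bfill()
next_id = df["ID"].bfill()
next_power = df["Power"].where(~gap).bfill()
# Assumption: only gap rows at or after the event time are filled.
fill = gap & (next_time <= df.index.to_series())

out = df.copy()
for _, rows in out[fill].groupby(next_time[fill]):
    final = next_power.loc[rows.index[0]] / 60    # Energy('Time')-1 = Power('Time') * 1/60
    energy = np.linspace(0.0, final, len(rows))   # linear ramp from 0 to the final value
    out.loc[rows.index, "Time"] = next_time.loc[rows.index]
    out.loc[rows.index, "ID"] = next_id.loc[rows.index]
    out.loc[rows.index, "Energy"] = energy
    out.loc[rows.index, "Power"] = energy * 60    # Power = Energy / (1/60)
    # Running total on top of the value carried in the gap rows.
    out.loc[rows.index, "Total Energy"] = rows["Total Energy"].to_numpy() + energy.cumsum()

print(out)
```

With this sample, rows 06:48 to 06:50 stay empty, while rows 06:51 and 06:52 get Energy 0 and 70, Power 0 and 4200, and Total Energy 5300 and 5370. These numbers follow the written rule and differ from the 130 / 7800 / 5400 shown in the expected table, so the intended rule may need adjusting.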

Thanks for your help :)

0 answers:

No answers yet