How to update a column from a list, ignoring values that already exist

Time: 2019-08-28 14:04:39

Tags: python python-3.x pandas append

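I have a pandas DataFrame and a list. I want to update the date_time column using the list: if a value already exists in the column, leave that row as it is; otherwise add a new row with an empty value. For example: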

My old DataFrame:
  date_time           value
2018-11-01 00:00:02    100
2018-11-01 00:00:12    150
2018-11-01 00:00:22    56
2018-11-01 00:00:32    95
2018-11-01 00:00:42    700


My list:
   ["2018-11-01 00:00:02", "2018-11-01 00:00:07", "2018-11-01 00:00:12", "2018-11-01 00:00:17", "2018-11-01 00:00:22", "2018-11-01 00:00:27", "2018-11-01 00:00:32", "2018-11-01 00:00:37", "2018-11-01 00:00:42", "2018-11-01 00:00:47"]

My expected output:
   date_time           value
2018-11-01 00:00:02    100
2018-11-01 00:00:07    nan
2018-11-01 00:00:12    150
2018-11-01 00:00:17    nan
2018-11-01 00:00:22    56
2018-11-01 00:00:27    nan
2018-11-01 00:00:32    95
2018-11-01 00:00:37    nan
2018-11-01 00:00:42    700
2018-11-01 00:00:47    nan

Code:

import pandas as pd
my_list = ["2018-11-01 00:00:02", "2018-11-01 00:00:07", "2018-11-01 00:00:12", "2018-11-01 00:00:17", "2018-11-01 00:00:22", "2018-11-01 00:00:27", "2018-11-01 00:00:32", "2018-11-01 00:00:37", "2018-11-01 00:00:42", "2018-11-01 00:00:47"]
# Assignment aligns on the row index, not on matching dates, so only the labels change
df["date_time"] = pd.Series(my_list).astype(str)

When I run the code above, it produces the following output:

   date_time           value
2018-11-01 00:00:02    100
2018-11-01 00:00:07    150
2018-11-01 00:00:12    56
2018-11-01 00:00:17    95
2018-11-01 00:00:22    700
2018-11-01 00:00:27    nan
2018-11-01 00:00:32    nan
2018-11-01 00:00:37    nan
2018-11-01 00:00:42    nan
2018-11-01 00:00:47    nan

1 answer:

Answer 0 (score: 0)

If date_time is a column, convert it to datetimes, set it as the index, and use DataFrame.reindex with a DatetimeIndex created from the list:

df['date_time'] = pd.to_datetime(df['date_time'])
df = (df.set_index('date_time')
        .reindex(pd.to_datetime(my_list)
        .rename('date_time'))
        .reset_index())
print (df)
            date_time  value
0 2018-11-01 00:00:02  100.0
1 2018-11-01 00:00:07    NaN
2 2018-11-01 00:00:12  150.0
3 2018-11-01 00:00:17    NaN
4 2018-11-01 00:00:22   56.0
5 2018-11-01 00:00:27    NaN
6 2018-11-01 00:00:32   95.0
7 2018-11-01 00:00:37    NaN
8 2018-11-01 00:00:42  700.0
9 2018-11-01 00:00:47    NaN
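A self-contained sketch of the reindex approach above, rebuilding the question's sample data and assuming date_time is stored as strings. The helper name fill_missing_times is hypothetical, not part of pandas:

```python
import pandas as pd

# Sample data from the question (assumption: date_time holds strings)
df = pd.DataFrame({
    "date_time": ["2018-11-01 00:00:02", "2018-11-01 00:00:12",
                  "2018-11-01 00:00:22", "2018-11-01 00:00:32",
                  "2018-11-01 00:00:42"],
    "value": [100, 150, 56, 95, 700],
})
my_list = ["2018-11-01 00:00:02", "2018-11-01 00:00:07",
           "2018-11-01 00:00:12", "2018-11-01 00:00:17",
           "2018-11-01 00:00:22", "2018-11-01 00:00:27",
           "2018-11-01 00:00:32", "2018-11-01 00:00:37",
           "2018-11-01 00:00:42", "2018-11-01 00:00:47"]


def fill_missing_times(frame, times):
    """Hypothetical helper: one row per timestamp in `times`, NaN where `frame` has no row."""
    out = frame.copy()
    out["date_time"] = pd.to_datetime(out["date_time"])
    # Target index in list order, named so reset_index() restores the column name
    target = pd.to_datetime(times).rename("date_time")
    return out.set_index("date_time").reindex(target).reset_index()


result = fill_missing_times(df, my_list)
print(result)
```

Note that reindex raises a ValueError when date_time contains duplicate timestamps, so this variant assumes they are unique.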

Or create a helper DataFrame from the list and use DataFrame.merge with a left join:

df['date_time'] = pd.to_datetime(df['date_time'])

df = pd.DataFrame({'date_time': pd.to_datetime(my_list)}).merge(df, how='left')
print (df)
            date_time  value
0 2018-11-01 00:00:02  100.0
1 2018-11-01 00:00:07    NaN
2 2018-11-01 00:00:12  150.0
3 2018-11-01 00:00:17    NaN
4 2018-11-01 00:00:22   56.0
5 2018-11-01 00:00:27    NaN
6 2018-11-01 00:00:32   95.0
7 2018-11-01 00:00:37    NaN
8 2018-11-01 00:00:42  700.0
9 2018-11-01 00:00:47    NaN
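A runnable version of the merge approach with the question's data rebuilt. The indicator=True argument is an addition not used in the answer: it adds a _merge column that marks rows found in df ("both") and rows that exist only in the list ("left_only"):

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "date_time": ["2018-11-01 00:00:02", "2018-11-01 00:00:12",
                  "2018-11-01 00:00:22", "2018-11-01 00:00:32",
                  "2018-11-01 00:00:42"],
    "value": [100, 150, 56, 95, 700],
})
my_list = ["2018-11-01 00:00:02", "2018-11-01 00:00:07",
           "2018-11-01 00:00:12", "2018-11-01 00:00:17",
           "2018-11-01 00:00:22", "2018-11-01 00:00:27",
           "2018-11-01 00:00:32", "2018-11-01 00:00:37",
           "2018-11-01 00:00:42", "2018-11-01 00:00:47"]

df["date_time"] = pd.to_datetime(df["date_time"])
# Helper frame: one row per wanted timestamp, in list order
helper = pd.DataFrame({"date_time": pd.to_datetime(my_list)})
# Left join keeps every helper row in order; unmatched rows get NaN in `value`
result = helper.merge(df, on="date_time", how="left", indicator=True)
print(result)
```

Unlike reindex, merge does not fail on duplicate timestamps in df; each duplicate simply produces an extra output row.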

If the timestamps are already in a DatetimeIndex:

df.index = pd.to_datetime(df.index)
df = df.reindex(pd.to_datetime(my_list).rename('date_time'))
print (df)
                     value
date_time                 
2018-11-01 00:00:02  100.0
2018-11-01 00:00:07    NaN
2018-11-01 00:00:12  150.0
2018-11-01 00:00:17    NaN
2018-11-01 00:00:22   56.0
2018-11-01 00:00:27    NaN
2018-11-01 00:00:32   95.0
2018-11-01 00:00:37    NaN
2018-11-01 00:00:42  700.0
2018-11-01 00:00:47    NaN
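A self-contained sketch of the index-based variant, assuming the timestamps start out as string index labels rather than a column:

```python
import pandas as pd

# Assumption: the timestamps are the index (as strings) instead of a column
df = pd.DataFrame(
    {"value": [100, 150, 56, 95, 700]},
    index=["2018-11-01 00:00:02", "2018-11-01 00:00:12",
           "2018-11-01 00:00:22", "2018-11-01 00:00:32",
           "2018-11-01 00:00:42"],
)
my_list = ["2018-11-01 00:00:02", "2018-11-01 00:00:07",
           "2018-11-01 00:00:12", "2018-11-01 00:00:17",
           "2018-11-01 00:00:22", "2018-11-01 00:00:27",
           "2018-11-01 00:00:32", "2018-11-01 00:00:37",
           "2018-11-01 00:00:42", "2018-11-01 00:00:47"]

df.index = pd.to_datetime(df.index)
# Labels missing from df become new rows filled with NaN
result = df.reindex(pd.to_datetime(my_list).rename("date_time"))
print(result)
```

If a value other than NaN is wanted for the new rows, reindex also accepts a fill_value argument, e.g. fill_value=0.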

Or use DataFrame.join, matching the helper's date_time column against the index of df:

df.index = pd.to_datetime(df.index)
df = pd.DataFrame({'date_time': pd.to_datetime(my_list)}).join(df, on='date_time')
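A runnable version of the join variant, again assuming the timestamps start as string index labels. DataFrame.join defaults to how="left", so every timestamp from the list is kept:

```python
import pandas as pd

# Assumption: the timestamps are the index (as strings) instead of a column
df = pd.DataFrame(
    {"value": [100, 150, 56, 95, 700]},
    index=["2018-11-01 00:00:02", "2018-11-01 00:00:12",
           "2018-11-01 00:00:22", "2018-11-01 00:00:32",
           "2018-11-01 00:00:42"],
)
my_list = ["2018-11-01 00:00:02", "2018-11-01 00:00:07",
           "2018-11-01 00:00:12", "2018-11-01 00:00:17",
           "2018-11-01 00:00:22", "2018-11-01 00:00:27",
           "2018-11-01 00:00:32", "2018-11-01 00:00:37",
           "2018-11-01 00:00:42", "2018-11-01 00:00:47"]

df.index = pd.to_datetime(df.index)
helper = pd.DataFrame({"date_time": pd.to_datetime(my_list)})
# on="date_time" matches the helper's column against df's index; the result keeps
# the helper's 0..9 index, so date_time stays a regular column
result = helper.join(df, on="date_time")
print(result)
```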