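I have a pandas DataFrame and a list, and I want to update the date_time column with the values from that list.
If a value already exists in the column, that row should be left as it is; values that are missing should be added as new rows, with NaN in the value column.
For example: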
My old DataFrame:

date_time            value
2018-11-01 00:00:02  100
2018-11-01 00:00:12  150
2018-11-01 00:00:22  56
2018-11-01 00:00:32  95
2018-11-01 00:00:42  700

My list:

["2018-11-01 00:00:02", "2018-11-01 00:00:07", "2018-11-01 00:00:12", "2018-11-01 00:00:17", "2018-11-01 00:00:22", "2018-11-01 00:00:27", "2018-11-01 00:00:32", "2018-11-01 00:00:37", "2018-11-01 00:00:42", "2018-11-01 00:00:47"]

My expected output:

date_time            value
2018-11-01 00:00:02  100
2018-11-01 00:00:07  nan
2018-11-01 00:00:12  150
2018-11-01 00:00:17  nan
2018-11-01 00:00:22  56
2018-11-01 00:00:27  nan
2018-11-01 00:00:32  95
2018-11-01 00:00:37  nan
2018-11-01 00:00:42  700
2018-11-01 00:00:47  nan
Code:
my_list = ["2018-11-01 00:00:02", "2018-11-01 00:00:07",
           "2018-11-01 00:00:12", "2018-11-01 00:00:17",
           "2018-11-01 00:00:22", "2018-11-01 00:00:27",
           "2018-11-01 00:00:32", "2018-11-01 00:00:37",
           "2018-11-01 00:00:42", "2018-11-01 00:00:47"]
df["date_time"] = pd.Series(my_list).astype(str)
When I run the code above, it produces the following output:
date_time            value
2018-11-01 00:00:02  100
2018-11-01 00:00:07  150
2018-11-01 00:00:12  56
2018-11-01 00:00:17  95
2018-11-01 00:00:22  700
2018-11-01 00:00:27  nan
2018-11-01 00:00:32  nan
2018-11-01 00:00:37  nan
2018-11-01 00:00:42  nan
2018-11-01 00:00:47  nan
Answer 0 (score: 0)
If date_time is a column, create a DatetimeIndex from the list of datetimes and use DataFrame.reindex:
df['date_time'] = pd.to_datetime(df['date_time'])
df = (df.set_index('date_time')
        .reindex(pd.to_datetime(my_list).rename('date_time'))
        .reset_index())
print(df)
            date_time  value
0 2018-11-01 00:00:02  100.0
1 2018-11-01 00:00:07    NaN
2 2018-11-01 00:00:12  150.0
3 2018-11-01 00:00:17    NaN
4 2018-11-01 00:00:22   56.0
5 2018-11-01 00:00:27    NaN
6 2018-11-01 00:00:32   95.0
7 2018-11-01 00:00:37    NaN
8 2018-11-01 00:00:42  700.0
9 2018-11-01 00:00:47    NaN
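
For anyone who wants to run the reindex approach end to end, here is a minimal self-contained sketch. The sample frame is rebuilt from the question's data, and the helper name `reindex_to_list` is made up for illustration. One caveat worth knowing: `reindex` raises an error if the existing `date_time` values contain duplicates, so this assumes they are unique.

```python
import pandas as pd

# Sample data rebuilt from the question (timestamps start out as strings).
df = pd.DataFrame({
    "date_time": ["2018-11-01 00:00:02", "2018-11-01 00:00:12",
                  "2018-11-01 00:00:22", "2018-11-01 00:00:32",
                  "2018-11-01 00:00:42"],
    "value": [100, 150, 56, 95, 700],
})
# Seconds 02, 07, 12, ..., 47: the same ten timestamps as in the question.
my_list = [f"2018-11-01 00:00:{s:02d}" for s in range(2, 48, 5)]


def reindex_to_list(frame, stamps):
    """Hypothetical helper: align `frame` to `stamps`, filling gaps with NaN."""
    out = frame.copy()
    out["date_time"] = pd.to_datetime(out["date_time"])
    # The target index carries the name that reset_index() turns into a column.
    target = pd.to_datetime(stamps).rename("date_time")
    return out.set_index("date_time").reindex(target).reset_index()


result = reindex_to_list(df, my_list)
print(result)
```

The value column becomes float because NaN cannot be stored in an integer column.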
Or create a helper DataFrame and use DataFrame.merge with a left join:
df['date_time'] = pd.to_datetime(df['date_time'])
df = pd.DataFrame({'date_time': pd.to_datetime(my_list)}).merge(df, how='left')
print(df)
            date_time  value
0 2018-11-01 00:00:02  100.0
1 2018-11-01 00:00:07    NaN
2 2018-11-01 00:00:12  150.0
3 2018-11-01 00:00:17    NaN
4 2018-11-01 00:00:22   56.0
5 2018-11-01 00:00:27    NaN
6 2018-11-01 00:00:32   95.0
7 2018-11-01 00:00:37    NaN
8 2018-11-01 00:00:42  700.0
9 2018-11-01 00:00:47    NaN
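
A variation on the merge approach, as a sketch rather than a required change: passing `on='date_time'` makes the join key explicit instead of relying on the shared column name, and `indicator=True` adds a `_merge` column that shows which timestamps came only from the list. The variable names `helper` and `merged` are illustrative. Both key columns need to be datetime for the match to work, which is why the conversion comes first.

```python
import pandas as pd

df = pd.DataFrame({
    "date_time": ["2018-11-01 00:00:02", "2018-11-01 00:00:12",
                  "2018-11-01 00:00:22", "2018-11-01 00:00:32",
                  "2018-11-01 00:00:42"],
    "value": [100, 150, 56, 95, 700],
})
my_list = [f"2018-11-01 00:00:{s:02d}" for s in range(2, 48, 5)]

# Convert both sides to datetime so the keys compare equal.
df["date_time"] = pd.to_datetime(df["date_time"])
helper = pd.DataFrame({"date_time": pd.to_datetime(my_list)})

# Left join keeps every timestamp from the list, in list order.
merged = helper.merge(df, on="date_time", how="left", indicator=True)

# Rows marked 'left_only' are the timestamps that were not in df.
added = merged.loc[merged["_merge"] == "left_only", "date_time"]
print(merged)
print(added.dt.strftime("%H:%M:%S").tolist())
```

Unlike `reindex`, a merge does not fail when `df` contains duplicate timestamps; it simply returns one output row per match.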
If the datetimes are in the index (a DatetimeIndex) rather than in a column:
df.index = pd.to_datetime(df.index)
df = df.reindex(pd.to_datetime(my_list).rename('date_time'))
print(df)
                     value
date_time
2018-11-01 00:00:02  100.0
2018-11-01 00:00:07    NaN
2018-11-01 00:00:12  150.0
2018-11-01 00:00:17    NaN
2018-11-01 00:00:22   56.0
2018-11-01 00:00:27    NaN
2018-11-01 00:00:32   95.0
2018-11-01 00:00:37    NaN
2018-11-01 00:00:42  700.0
2018-11-01 00:00:47    NaN
Or:
df.index = pd.to_datetime(df.index)
df = pd.DataFrame({'date_time': pd.to_datetime(my_list)}).join(df, on='date_time')
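
To make the difference between the two index-based variants concrete, here is a small sketch that runs both on the same data. It assumes the sample frame stores the timestamps in its index as strings; the names `by_reindex` and `by_join` are illustrative. The values line up identically in both results, and what differs is where the timestamps end up.

```python
import pandas as pd

my_list = [f"2018-11-01 00:00:{s:02d}" for s in range(2, 48, 5)]

# Assumed layout: timestamps live in the index, not in a column.
df = pd.DataFrame(
    {"value": [100, 150, 56, 95, 700]},
    index=["2018-11-01 00:00:02", "2018-11-01 00:00:12",
           "2018-11-01 00:00:22", "2018-11-01 00:00:32",
           "2018-11-01 00:00:42"],
)
df.index = pd.to_datetime(df.index)

target = pd.to_datetime(my_list).rename("date_time")

# Variant 1: reindex keeps the timestamps in the index.
by_reindex = df.reindex(target)

# Variant 2: join matches the helper's date_time column against df's index,
# so the timestamps end up as a regular column with a default integer index.
helper = pd.DataFrame({"date_time": target})
by_join = helper.join(df, on="date_time")

print(by_reindex)
print(by_join)
```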