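My df looks like the one below, only larger. Some of the dates in the lastDate column are incorrect, but only in rows where the fixedDate column contains a value. In those rows I want lastDate to take the value from fixedDate.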
df = pd.DataFrame({
    "lastDate": ['2016-3-27', '2016-4-11', '2016-3-27', '2016-3-27', '2016-5-25', '2016-5-31'],
    "fixedDate": ['2016-1-3', '', '2016-1-18', '2016-4-5', '2016-2-27', ''],
    "analyst": ['John Doe', 'Brad', 'John', 'Frank', 'Claud', 'John Doe'],
})
Answer 0 (score: 1)
First, convert both date columns to the datetime dtype, which turns the empty strings in fixedDate into NaT:
for col in ['fixedDate', 'lastDate']:
    df[col] = pd.to_datetime(df[col])
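The conversion matters because an empty string is not treated as missing, while NaT is. The short check below is only an illustration on a made-up three-element Series (the name `s` is not from the question), showing how `pd.notnull` behaves before and after `pd.to_datetime`:

```python
import pandas as pd

# Hypothetical sample mirroring the first few fixedDate values
s = pd.Series(['2016-1-3', '', '2016-1-18'])

# As plain strings, the empty entry still counts as "not null"
before = pd.notnull(s).tolist()

# After conversion, the empty string becomes NaT, which counts as null
converted = pd.to_datetime(s)
after = pd.notnull(converted).tolist()

print(before)
print(after)
```

Without the conversion, a notnull mask would select every row, including the ones whose fixedDate is empty.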
Then build a mask of the rows that have a fixedDate and assign those values to lastDate:
mask = pd.notnull(df['fixedDate'])
df.loc[mask, 'lastDate'] = df['fixedDate']
For example,
import pandas as pd

df = pd.DataFrame({
    "lastDate": ['2016-3-27', '2016-4-11', '2016-3-27', '2016-3-27', '2016-5-25', '2016-5-31'],
    "fixedDate": ['2016-1-3', '', '2016-1-18', '2016-4-5', '2016-2-27', ''],
    "analyst": ['John Doe', 'Brad', 'John', 'Frank', 'Claud', 'John Doe'],
})
# Convert both columns to datetime; empty strings become NaT
for col in ['fixedDate', 'lastDate']:
    df[col] = pd.to_datetime(df[col])
# Overwrite lastDate only in rows where fixedDate is present
mask = pd.notnull(df['fixedDate'])
df.loc[mask, 'lastDate'] = df['fixedDate']
print(df)
which prints
    analyst  fixedDate   lastDate
0  John Doe 2016-01-03 2016-01-03
1      Brad        NaT 2016-04-11
2      John 2016-01-18 2016-01-18
3     Frank 2016-04-05 2016-04-05
4     Claud 2016-02-27 2016-02-27
5  John Doe        NaT 2016-05-31
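As an alternative to the boolean mask, the same replacement can be written with `Series.fillna`, which takes fixedDate where it exists and falls back to lastDate otherwise. This is only a sketch: the helper name `apply_fixed_dates` and its parameters are made up for illustration, and the sample frame is a shortened copy of the one in the question.

```python
import pandas as pd

def apply_fixed_dates(frame, fixed_col='fixedDate', last_col='lastDate'):
    """Return a copy in which last_col takes fixed_col's value wherever one exists."""
    out = frame.copy()
    # Convert both columns to datetime; empty strings become NaT
    for col in (fixed_col, last_col):
        out[col] = pd.to_datetime(out[col])
    # Keep fixed_col where present, otherwise fall back to last_col
    out[last_col] = out[fixed_col].fillna(out[last_col])
    return out

df = pd.DataFrame({
    "lastDate": ['2016-3-27', '2016-4-11', '2016-3-27'],
    "fixedDate": ['2016-1-3', '', '2016-1-18'],
    "analyst": ['John Doe', 'Brad', 'John'],
})
result = apply_fixed_dates(df)
print(result)
```

Because the helper works on a copy, the original df keeps its string columns, which can be convenient when the raw values are still needed for comparison.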