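I'm using pandas to add a column to a DataFrame based on conditions and on comparisons between values in existing columns.
This is the original DataFrame: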
   start       end         Sold
0                          NA
1  2017-05-08  2017-09-08  Yes
2  2018-09-01  2017-09-01  Yes
This is the DataFrame I want:
   start       end         Sold  valid
0                          NA    Unknown
1  2017-05-08  2017-09-08  Yes   True
2  2018-09-01  2017-09-01  Yes   False
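Basically, the new valid column is determined by all 3 existing columns.
Condition 1: if Sold is NA, valid is Unknown.
Condition 2: if Sold is not NA, valid is True when the start date is earlier than the end date, and False when the start date is later than the end date.
Can anyone suggest some code for this?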
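As a plain reference for the two conditions, here is a minimal row-wise sketch (a Python function applied with DataFrame.apply, which is a swapped-in technique, not what the answers use). It assumes the missing Sold value is a real NaN and that start/end are parsed with pd.to_datetime; the function name classify is hypothetical. Rows where start equals end are not covered by the question, so the sketch returns None for them.

```python
import numpy as np
import pandas as pd

# Sample data from the question; assumption: the missing Sold value is a real NaN.
df = pd.DataFrame({
    'start': [None, '2017-05-08', '2018-09-01'],
    'end': [None, '2017-09-08', '2017-09-01'],
    'Sold': [np.nan, 'Yes', 'Yes'],
})
df['start'] = pd.to_datetime(df['start'])
df['end'] = pd.to_datetime(df['end'])


def classify(row):
    """Hypothetical helper: apply the question's conditions to one row."""
    if pd.isna(row['Sold']):        # Condition 1: Sold missing
        return 'Unknown'
    if row['start'] < row['end']:   # Condition 2: start before end
        return True
    if row['start'] > row['end']:   # Condition 2: start after end
        return False
    return None                     # equal dates: not specified in the question


df['valid'] = df.apply(classify, axis=1)
print(df)
```

Row-wise apply mirrors the conditions one-to-one and keeps real booleans, but it is slower than vectorized np.select or np.where on large frames.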
Answer 0 (score: 0)
Use numpy.select:
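import numpy as np
import pandas as pd

# convert both columns to datetimes; missing or unparseable values become NaT
df['start'] = pd.to_datetime(df['start'], errors='coerce')
df['end'] = pd.to_datetime(df['end'], errors='coerce')
# the first matching condition wins; all choices and the default are strings so their dtypes agree
df['valid'] = np.select([df['Sold'] == 'NA',
                         df['start'] < df['end'],
                         df['start'] > df['end']],
                        ['Unknown', 'True', 'False'], default='')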
If NA is actually a missing value (NaN), test for it with Series.isna:
df['valid'] = np.select([df['Sold'].isna(),
                         df['start'] < df['end'],
                         df['start'] > df['end']],
                        ['Unknown', 'True', 'False'], default='')
print(df)
       start        end Sold    valid
0        NaT        NaT   NA  Unknown
1 2017-05-08 2017-09-08  Yes     True
2 2018-09-01 2017-09-01  Yes    False
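A self-contained sketch of the NaN variant, rebuilding the sample frame from the question (assumption: the missing Sold value is a real NaN). The choices and the default are all strings because recent NumPy releases can raise a TypeError when np.select has to find a common dtype for strings mixed with bools, or for string choices combined with the default value 0.

```python
import numpy as np
import pandas as pd

# Rebuild the sample frame; assumes the missing Sold value is a real NaN.
df = pd.DataFrame({
    'start': [None, '2017-05-08', '2018-09-01'],
    'end': [None, '2017-09-08', '2017-09-01'],
    'Sold': [np.nan, 'Yes', 'Yes'],
})
df['start'] = pd.to_datetime(df['start'], errors='coerce')
df['end'] = pd.to_datetime(df['end'], errors='coerce')

conditions = [
    df['Sold'].isna(),          # Condition 1: Sold missing
    df['start'] < df['end'],    # Condition 2: start before end
    df['start'] > df['end'],    # Condition 2: start after end
]
# Every choice and the default are strings, so no str/bool promotion is needed.
choices = ['Unknown', 'True', 'False']
df['valid'] = np.select(conditions, choices, default='')
print(df)
```

Note that valid then holds the strings 'True' and 'False', not booleans; rows with equal dates get the empty-string default.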
Answer 1 (score: 0)
Alternatively, use np.where:
import numpy as np
# the outer call already handles Sold == 'NA', so the inner calls only compare dates
df['valid'] = np.where(df['Sold'] == 'NA', 'Unknown',
                       np.where(df['start'] < df['end'], 'True',
                                np.where(df['start'] > df['end'], 'False', '')))
If the missing values are real NaN:
# isna() already returns a boolean mask, so no "== True" is needed
df['valid'] = np.where(df['Sold'].isna(), 'Unknown',
                       np.where(df['start'] < df['end'], 'True',
                                np.where(df['start'] > df['end'], 'False', '')))
Output:
   start       end         Sold  valid
0                          NA    Unknown
1  2017-05-08  2017-09-08  Yes   True
2  2018-09-01  2017-09-01  Yes   False
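If valid should hold real booleans rather than the strings 'True'/'False', one option (a sketch not taken from either answer, assuming the missing Sold value is a real NaN) is to use None as the innermost np.where fallback. NumPy then promotes the result to object dtype, so 'Unknown', True and False can coexist unchanged.

```python
import numpy as np
import pandas as pd

# Sample frame from the question; assumes the missing Sold value is a real NaN.
df = pd.DataFrame({
    'start': pd.to_datetime([None, '2017-05-08', '2018-09-01']),
    'end': pd.to_datetime([None, '2017-09-08', '2017-09-01']),
    'Sold': [np.nan, 'Yes', 'Yes'],
})

# None (object dtype) as the innermost fallback makes every level object dtype,
# so True/False stay booleans instead of being converted to strings.
df['valid'] = np.where(df['Sold'].isna(), 'Unknown',
                       np.where(df['start'] < df['end'], True,
                                np.where(df['start'] > df['end'], False, None)))
print(df)
```

The trade-off is an object column, which is slower to work with than a plain bool or string column; rows with equal dates get None.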