I have a dataframe that looks like this:
boat_type boat_type_2
Not Known Not Known
Not Known kayak
ship Not Known
Not Known Not Known
ship Not Known
I want to create a third column, boat_type_final, whose contents should look like this:
boat_type boat_type_2 boat_type_final
Not Known Not Known cruise
Not Known kayak kayak
ship Not Known ship
Not Known Not Known cruise
ship Not Known ship
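So basically, if both boat_type and boat_type_2 contain 'Not Known', the value should be 'cruise'. However, if one of the first two columns contains a string other than 'Not Known', boat_type_final should be filled with that string, i.e. 'kayak' or 'ship'.
What is the most elegant way to do this? I have seen several options, such as where, writing a function, and/or boolean logic, and I would like to know what a true pythonista would do.
Here is my code so far: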
import pandas as pd
import numpy as np
data = [{'boat_type': 'Not Known', 'boat_type_2': 'Not Known'},
{'boat_type': 'Not Known', 'boat_type_2': 'kayak'},
{'boat_type': 'ship', 'boat_type_2': 'Not Known'},
{'boat_type': 'Not Known', 'boat_type_2': 'Not Known'},
{'boat_type': 'ship', 'boat_type_2': 'Not Known'}]
df = pd.DataFrame(data)
df['phone_type_final'] = np.where(df.phone_type.str.contains('Not'))...
Answer 0 (score: 4)
Use:
df['boat_type_final'] = (df.replace('Not Known', np.nan)
                           .ffill(axis=1)
                           .iloc[:, -1]
                           .fillna('cruise'))
print(df)
boat_type boat_type_2 boat_type_final
0 Not Known Not Known cruise
1 Not Known kayak kayak
2 ship Not Known ship
3 Not Known Not Known cruise
4 ship Not Known ship
Explanation:
First, replace Not Known with missing values:
print (df.replace('Not Known',np.nan))
boat_type boat_type_2
0 NaN NaN
1 NaN kayak
2 ship NaN
3 NaN NaN
4 ship NaN
Then replace the NaNs by forward filling along each row:
print (df.replace('Not Known',np.nan).ffill(axis=1))
boat_type boat_type_2
0 NaN NaN
1 NaN kayak
2 ship ship
3 NaN NaN
4 ship ship
Select the last column by position with iloc:
print (df.replace('Not Known',np.nan).ffill(axis=1).iloc[:, -1])
0 NaN
1 kayak
2 ship
3 NaN
4 ship
Name: boat_type_2, dtype: object
Because some NaNs may remain (rows where both columns were Not Known), add fillna:
print (df.replace('Not Known',np.nan).ffill(axis=1).iloc[:, -1].fillna('cruise'))
0 cruise
1 kayak
2 ship
3 cruise
4 ship
Name: boat_type_2, dtype: object
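The four steps above can be collected into one small helper. This is only a sketch: the name coalesce_columns and its parameters are hypothetical and not part of pandas. It applies the chain to the named source columns only, so re-running it after boat_type_final has been added does not read from the result column.

```python
import numpy as np
import pandas as pd

def coalesce_columns(frame, cols, placeholder='Not Known', default='cruise'):
    """Hypothetical helper: per row, return the last non-placeholder value
    found in `cols`, or `default` when every column holds the placeholder."""
    return (frame[cols]
            .replace(placeholder, np.nan)  # placeholder -> missing value
            .ffill(axis=1)                 # carry known values to the right
            .iloc[:, -1]                   # last column now holds the answer
            .fillna(default))              # rows with nothing known

df = pd.DataFrame({
    'boat_type':   ['Not Known', 'Not Known', 'ship', 'Not Known', 'ship'],
    'boat_type_2': ['Not Known', 'kayak', 'Not Known', 'Not Known', 'Not Known'],
})
df['boat_type_final'] = coalesce_columns(df, ['boat_type', 'boat_type_2'])
print(df)
```

Passing the column list explicitly also makes it easy to extend the same logic to a third or fourth source column.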
If you are working with only a few columns, another solution is numpy.select:
m1 = df['boat_type'] == 'ship'
m2 = df['boat_type_2'] == 'kayak'
df['boat_type_final'] = np.select([m1, m2], ['ship','kayak'], default='cruise')
print (df)
boat_type boat_type_2 boat_type_final
0 Not Known Not Known cruise
1 Not Known kayak kayak
2 ship Not Known ship
3 Not Known Not Known cruise
4 ship Not Known ship
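The masks above hard-code the values 'ship' and 'kayak'. As a variation (an assumption about what is wanted, not something stated in the question), the conditions can instead test for "anything other than 'Not Known'" and take the value from the column itself, so other boat types work without changes. The names known_1 and known_2 are made up for this sketch.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'boat_type':   ['Not Known', 'Not Known', 'ship', 'Not Known', 'ship'],
    'boat_type_2': ['Not Known', 'kayak', 'Not Known', 'Not Known', 'Not Known'],
})

# True where the column holds a real value rather than the placeholder.
known_1 = df['boat_type'] != 'Not Known'
known_2 = df['boat_type_2'] != 'Not Known'

# Conditions are checked in order, so boat_type wins when both are known.
df['boat_type_final'] = np.select(
    [known_1, known_2],
    [df['boat_type'], df['boat_type_2']],
    default='cruise',
)
print(df)
```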
Answer 1 (score: 2)
Another solution is to define a function that contains the mapping:
def my_func(row):
    if row['boat_type'] != 'Not Known':
        return row['boat_type']
    elif row['boat_type_2'] != 'Not Known':
        return row['boat_type_2']
    else:
        return 'cruise'
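[Note: you did not mention what should happen when neither column is 'Not Known'. In that case this function returns boat_type.]
Then simply apply the function: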
df.loc[:,'boat_type_final'] = df.apply(my_func, axis=1)
print(df)
Output:
boat_type boat_type_2 boat_type_final
0 Not Known Not Known cruise
1 Not Known kayak kayak
2 ship Not Known ship
3 Not Known Not Known cruise
4 ship Not Known ship
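The note about rows where neither column is 'Not Known' matters because the approaches in this thread do not agree on such rows. The sketch below uses a made-up row (it is not in the question's data) to show that forward filling and taking the last column keeps the rightmost value, while backward filling and taking the first column keeps the leftmost one, which matches the if/elif order of my_func.

```python
import numpy as np
import pandas as pd

# Hypothetical row where both columns hold a known value.
both = pd.DataFrame({'boat_type': ['ship'], 'boat_type_2': ['kayak']})
cleaned = both.replace('Not Known', np.nan)

# Forward fill + last column: the rightmost known value wins.
last_wins = cleaned.ffill(axis=1).iloc[:, -1].fillna('cruise')

# Backward fill + first column: the leftmost known value wins,
# the same priority as the row-wise if/elif function.
first_wins = cleaned.bfill(axis=1).iloc[:, 0].fillna('cruise')

print(last_wins.tolist(), first_wins.tolist())
```

Which priority is right depends on the data, so it is worth deciding explicitly before picking one of the solutions.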