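I need to intelligently combine the values of three columns in a DataFrame, as shown below. The code needs to select the first type prediction whose check is True, and only that one, even if a later prediction is also True. If none of the predictions is True, the returned value should be NaN.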
index  name    t1       t1_check  t2      t2_check  t3      t3_check
--------------------------------------------------------------------
0      cow     animal   True      phone   False     fruit   False
1      apple   animal   False     fruit   True      food    True
2      carrot  vehicle  False     veg     True      animal  False
3      dog     pet      True      animal  True      object  False
4      horse   window   False     object  False     animal  True
5      car     pet      False     food    False     fruit   False
This is what I have tried:
First, I merged each pair of related columns and dropped the old columns.
In:
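# Use .astype(str) for element-wise conversion; str(df['t1']) would stringify the whole Series.
df['t1_comb'] = df['t1'].astype(str) + df['t1_check'].astype(str)
df['t2_comb'] = df['t2'].astype(str) + df['t2_check'].astype(str)
df['t3_comb'] = df['t3'].astype(str) + df['t3_check'].astype(str)
df.drop(columns=['t1', 't1_check', 't2', 't2_check', 't3', 't3_check'], inplace=True)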
Out:
index  name    t1_comb       t2_comb      t3_comb
-----------------------------------------------------
0      cow     animalTrue    phoneFalse   fruitFalse
1      apple   animalFalse   fruitTrue    foodTrue
2      carrot  vehicleFalse  vegTrue      animalFalse
3      dog     petTrue       animalTrue   objectFalse
4      horse   windowFalse   objectFalse  animalTrue
5      car     petFalse      foodFalse    fruitFalse
Then I replaced every entry containing False with NaN and stripped the string True from each remaining entry.
In:
df.loc[df['t1_comb'].str.contains('False'), 't1_comb'] = np.nan
df.loc[df['t2_comb'].str.contains('False'), 't2_comb'] = np.nan
df.loc[df['t3_comb'].str.contains('False'), 't3_comb'] = np.nan
df.t1_comb = df.t1_comb.str.replace('True', '')
df.t2_comb = df.t2_comb.str.replace('True', '')
df.t3_comb = df.t3_comb.str.replace('True', '')
Out:
index  name    t1_comb  t2_comb  t3_comb
----------------------------------------
0      cow     animal   NaN      NaN
1      apple   NaN      fruit    food
2      carrot  NaN      veg      NaN
3      dog     pet      animal   NaN
4      horse   NaN      NaN      animal
5      car     NaN      NaN      NaN
The next step is where I am stuck: taking only the first non-NaN value in each row.
The result I need should look like this:
index  name    type
---------------------
0      cow     animal
1      apple   fruit
2      carrot  veg
3      dog     pet
4      horse   animal
5      car     NaN
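Answer 0 (score: 2)
I'm sure there is a better solution, but you can use apply on each row of the original DataFrame (before the columns are merged and dropped):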
def myfunc(row):
    # Return the first prediction whose check is True, in t1, t2, t3 order.
    if row['t1_check']:
        return row['t1']
    elif row['t2_check']:
        return row['t2']
    elif row['t3_check']:
        return row['t3']
    return np.nan

df['type'] = df.apply(myfunc, axis=1)
df[['name', 'type']]
Output:
index  name    type
---------------------
0      cow     animal
1      apple   fruit
2      carrot  veg
3      dog     pet
4      horse   animal
5      car     NaN
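
As one possible alternative to the row-wise apply, the same "first True wins" rule can be sketched with column-wise operations. This is only a minimal sketch, not part of the original answer: it rebuilds the sample data from the question, and the names `prefixes` and `masked` are made up for illustration. Each prediction column is blanked to NaN where its check is False, then a backward fill along the row pulls the first surviving value into the leftmost column.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "name": ["cow", "apple", "carrot", "dog", "horse", "car"],
    "t1": ["animal", "animal", "vehicle", "pet", "window", "pet"],
    "t1_check": [True, False, False, True, False, False],
    "t2": ["phone", "fruit", "veg", "animal", "object", "food"],
    "t2_check": [False, True, True, True, False, False],
    "t3": ["fruit", "food", "animal", "object", "animal", "fruit"],
    "t3_check": [False, True, False, False, True, False],
})

# Hypothetical helper name: the prediction columns in priority order.
prefixes = ["t1", "t2", "t3"]

# Keep each prediction only where its check is True; everything else becomes NaN.
masked = pd.DataFrame({p: df[p].where(df[f"{p}_check"]) for p in prefixes})

# Back-fill along each row so the first non-NaN value lands in the first column,
# then take that column. Rows with no True check stay NaN.
df["type"] = masked.bfill(axis=1).iloc[:, 0]

print(df[["name", "type"]])
```

The `masked` frame corresponds to the intermediate NaN table in the question, so the final line could also be applied to the t1_comb, t2_comb and t3_comb columns directly. On a frame with many rows this is likely to be faster than apply with axis=1, though that has not been measured here.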