Iterate over a DataFrame and replace values based on a condition

Asked: 2019-02-14 06:36:35

Tags: python pandas loops for-loop if-statement

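I'm new to Python (coming from R) and can't figure out how to iterate over a DataFrame in Python. Below I've provided a DataFrame along with a list of possible "interventions". What I'm trying to do is search the "Intervention" column of the DataFrame: if the intervention is in "intervention_list", replace the value with "Yes Intervention", but if it is "NaN", replace it with "No Intervention".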

Any guidance or help would be greatly appreciated.

import pandas as pd
intervention_list = ['Intervention 1', 'Intervention 2']
df = pd.DataFrame({'ID':[100,200,300,400,500,600,700],
                  'Intervention':['Intervention 1', 'NaN','NaN','NaN','Intervention 2','Intervention 1','NaN']})
print(df)

I would like the finished DataFrame to look like this:

df_new = pd.DataFrame({'ID':[100,200,300,400,500,600,700],
                  'Intervention':['Yes Intervention', 'No Intervention','No Intervention','No Intervention','Yes Intervention','Yes Intervention','No Intervention']})
print(df_new)

Thanks!

1 Answer:

Answer 0: (score: 1)

It is best to avoid loops in pandas because they are slow. For a vectorized solution, use numpy.where together with Series.isna (or Series.notna) to test for missing values:

import numpy as np
df['Intervention'] = np.where(df['Intervention'].isna(), 'No Intervention', 'Yes Intervention')

Or:

df['Intervention'] = np.where(df['Intervention'].notna(),'Yes Intervention','No Intervention')

If NaN is a string (as in the sample data, where the values are the literal text 'NaN'), test with == or Series.eq instead:

df['Intervention'] = np.where(df['Intervention'].eq('NaN'), 'No Intervention', 'Yes Intervention')
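
To make the difference between the two tests concrete, here is a minimal sketch run against the DataFrame from the question. It assumes the missing entries really are the string 'NaN', as in the sample data; the names string_nan_flagged and cleaned are only for illustration. Converting the placeholder with Series.mask first is one possible approach, not the only one.

```python
import numpy as np
import pandas as pd

# Data from the question: missing entries are the literal string 'NaN'.
df = pd.DataFrame({
    'ID': [100, 200, 300, 400, 500, 600, 700],
    'Intervention': ['Intervention 1', 'NaN', 'NaN', 'NaN',
                     'Intervention 2', 'Intervention 1', 'NaN'],
})

# isna() only detects real missing values, so the string 'NaN' is not flagged.
string_nan_flagged = df['Intervention'].isna().any()

# Assumption: turn the placeholder string into a real missing value first,
# then apply the isna()-based test from above.
cleaned = df['Intervention'].mask(df['Intervention'].eq('NaN'))
df['Intervention'] = np.where(cleaned.isna(), 'No Intervention', 'Yes Intervention')

print(df)
```

If the column is cleaned this way once, right after loading the data, the isna/notna variants can be used everywhere else without special-casing the string.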

But if you also need to test membership in the list, use numpy.select:

m1 = df['Intervention'].isin(intervention_list)
m2 = df['Intervention'].isna()

# Option 1: rows that match neither m1 nor m2 get the default value None
df['Intervention'] = np.select([m1, m2],
                              ['Yes Intervention','No Intervention'],
                              default=None)

# Option 2 (alternative to option 1): rows that match neither m1 nor m2 keep their original value
df['Intervention'] = np.select([m1, m2],
                              ['Yes Intervention','No Intervention'],
                              default=df['Intervention'])
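
As a self-contained sketch of the numpy.select approach, the example below uses a small made-up DataFrame (not the one from the question) that contains a listed intervention, the string 'NaN', a real missing value, and an unlisted value. Treating both the string 'NaN' and real missing values as "No Intervention" is an assumption made here for illustration.

```python
import numpy as np
import pandas as pd

intervention_list = ['Intervention 1', 'Intervention 2']

# Hypothetical sample covering every case: listed value, string 'NaN',
# real missing value, and a value that is neither.
df = pd.DataFrame({
    'ID': [100, 200, 300, 400, 500],
    'Intervention': ['Intervention 1', 'NaN', np.nan,
                     'Something else', 'Intervention 2'],
})

col = df['Intervention']
m1 = col.isin(intervention_list)      # value is a known intervention
m2 = col.isna() | col.eq('NaN')       # real missing value or the string 'NaN'

# Conditions are checked in order; rows matching neither keep their original value.
df['Intervention'] = np.select([m1, m2],
                               ['Yes Intervention', 'No Intervention'],
                               default=col)

print(df)
```

Because the masks are built before the column is overwritten, the order of the two conditions only matters if a row could satisfy both, which cannot happen here.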