How to replace strings in a list if they contain a substring, in a Pandas DataFrame column

Time: 2020-06-28 22:58:19

Tags: python pandas

I have a df:

df = pd.DataFrame({'age': [13,62,53, 33],
                   'gender': ['male','female','male', 'male'],
                   'symptoms': [['acute respiratory distress', 'fever'],
                                ['acute respiratory disease', 'cough'],
                                ['fever'],
                                ['respiratory distress']]})


df

Output:

       age    gender    symptoms
0       13      male    [acute respiratory distress, fever]
1       62      female  [acute respiratory disease, cough]
2       53      male    [fever]
3       33      male    [respiratory distress]

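I am trying to replace every value in the 'symptoms' column (lists, in this case) that contains the substring 'respiratory', changing that whole value in the list to 'acute respiratory distress', so that it is consistent across the whole dataframe. This is the desired result: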

Output:

       age    gender    symptoms
0       13      male    [acute respiratory distress, fever]
1       62      female  [acute respiratory distress, cough]
2       53      male    [fever]
3       33      male    [acute respiratory distress]

I tried:

df.loc[df['symptoms'].str.contains('respiratory', na=False), 'symptoms'] = 'acute respiratory distress'

print(df)

The dataframe stays unchanged.

2 answers:

Answer 0: (score: 2)

Like this:

import pandas as pd

df = pd.DataFrame({'age': [13,62,53, 33],
                   'gender': ['male','female','male', 'male'],
                   'symptoms': [['acute respiratory distress', 'fever'],
                                ['acute respiratory disease', 'cough'],
                                ['fever'],
                                ['respiratory distress']]})

# Replace any symptom containing 'respiratory' with the canonical label
df['symptoms'] = [['acute respiratory distress' if 'respiratory' in s else s for s in lst] for lst in df['symptoms']]

print(df)

Output:

   age  gender                             symptoms
0   13    male  [acute respiratory distress, fever]
1   62  female  [acute respiratory distress, cough]
2   53    male                              [fever]
3   33    male         [acute respiratory distress]
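
As for why the attempt in the question leaves the dataframe unchanged: each cell of the symptoms column is a list, not a string, so .str.contains has nothing to search in and, as far as I can tell, falls back to the na value (False here) for every row, which means .loc selects no rows. A small sketch of that, next to an element-wise check that does find the rows (has_resp is just an illustrative name):

```python
import pandas as pd

df = pd.DataFrame({'symptoms': [['acute respiratory distress', 'fever'],
                                ['acute respiratory disease', 'cough'],
                                ['fever'],
                                ['respiratory distress']]})

# Each cell is a list, not a string, so the pattern search cannot run on it.
# With na=False, those cells are reported as False (no match).
mask = df['symptoms'].str.contains('respiratory', na=False)
print(mask.any())

# Checking the strings inside each list gives the row mask that was intended.
has_resp = df['symptoms'].apply(lambda lst: any('respiratory' in s for s in lst))
print(has_resp.tolist())
```

Even with a correct row mask, assigning a plain string through .loc would replace the whole list rather than one element of it, which is why the replacement has to happen inside each list.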

Answer 1: (score: 0)

Use explode, then assign using contains:

>>> s = df.symptoms.explode()
>>> df['symptoms'] = s.mask(s.str.contains('respiratory'),'acute respiratory distress').groupby(level=0).agg(list)
>>> df
   age  gender                             symptoms
0   13    male  [acute respiratory distress, fever]
1   62  female  [acute respiratory distress, cough]
2   53    male                              [fever]
3   33    male         [acute respiratory distress]
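
To check that this route gives the same lists as a plain nested list comprehension, here is a small self-contained sketch. The na=False argument and the names target, via_explode and via_comprehension are additions for illustration and not part of the code above.

```python
import pandas as pd

df = pd.DataFrame({'age': [13, 62, 53, 33],
                   'gender': ['male', 'female', 'male', 'male'],
                   'symptoms': [['acute respiratory distress', 'fever'],
                                ['acute respiratory disease', 'cough'],
                                ['fever'],
                                ['respiratory distress']]})

target = 'acute respiratory distress'

# One row per symptom; the original row label is repeated in the index.
s = df['symptoms'].explode()

# na=False is an added safeguard: missing values count as "no match".
replaced = s.mask(s.str.contains('respiratory', na=False), target)

# Collect the symptoms back into one list per original row label.
via_explode = replaced.groupby(level=0).agg(list)

# The same replacement written as a nested list comprehension, for comparison.
via_comprehension = [[target if 'respiratory' in sym else sym for sym in lst]
                     for lst in df['symptoms']]

print(via_explode.tolist() == via_comprehension)
```

Note that groupby(level=0) regroups by row label, so this assumes the index has no duplicate labels; rows sharing a label would be merged into one list.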