Question

I have a df:

df = pd.DataFrame({'age': [13,62,53, 33],
                   'gender': ['male','female','male', 'male'],
                   'symptoms': [['acute respiratory distress', 'fever'],
                                ['acute respiratory disease', 'cough'],
                                ['fever'],
                                ['respiratory distress']]})


df

Output:

       age    gender    symptoms
0       13      male    [acute respiratory distress, fever]
1       62      female  [acute respiratory disease, cough]
2       53      male    [fever]
3       33      male    [respiratory distress]

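I am trying to replace every value inside the 'symptoms' column (which holds lists in this case) that contains the substring 'respiratory', changing that whole list element to 'acute respiratory distress' so the wording is consistent across the DataFrame. This is the desired result: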

Output:

       age    gender    symptoms
0       13      male    [acute respiratory distress, fever]
1       62      female  [acute respiratory distress, cough]
2       53      male    [fever]
3       33      male    [acute respiratory distress]

I tried:

df.loc[df['symptoms'].str.contains('respiratory', na=False), 'symptoms'] = 'acute respiratory distress'

print(df)

The DataFrame stays unchanged.

Answer 1

Try a nested list comprehension:

import pandas as pd

df = pd.DataFrame({'age': [13,62,53, 33],
                   'gender': ['male','female','male', 'male'],
                   'symptoms': [['acute respiratory distress', 'fever'],
                                ['acute respiratory disease', 'cough'],
                                ['fever'],
                                ['respiratory distress']]})

# Replace any symptom containing 'respiratory'; keep every other symptom as-is.
df['symptoms'] = [['acute respiratory distress' if 'respiratory' in s else s for s in lst] for lst in df['symptoms']]

print(df)

Output:

   age  gender                             symptoms
0   13    male  [acute respiratory distress, fever]
1   62  female  [acute respiratory distress, cough]
2   53    male                              [fever]
3   33    male         [acute respiratory distress]
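
For context on why the `.loc` attempt in the question has no effect, here is a minimal sketch. It assumes that `.str.contains` treats a list cell as a non-string value and so reports no match for it; the names `mask` and `has_match` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'symptoms': [['acute respiratory distress', 'fever'],
                                ['acute respiratory disease', 'cough'],
                                ['fever'],
                                ['respiratory distress']]})

# .str.contains tests each *cell*. The cells here are lists, not strings,
# so no cell yields a match and na=False turns every result into False.
mask = df['symptoms'].str.contains('respiratory', na=False)
print(mask.any())

# Checking the strings inside each list gives a usable row-level flag instead.
has_match = df['symptoms'].apply(lambda lst: any('respiratory' in s for s in lst))
print(has_match.tolist())  # [True, True, False, True]
```

Because the mask selects no rows, the assignment in the question touches nothing. Even with a working mask, assigning a plain string through `.loc` would overwrite the whole list rather than one element, which is why the replacement has to happen inside each list.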

Answer 2

Try explode, then use str.contains with mask to assign the replacement, and group the rows back into lists:

>>> s = df.symptoms.explode()
>>> df['symptoms'] = s.mask(s.str.contains('respiratory'),'acute respiratory distress').groupby(level=0).agg(list)
>>> df
   age  gender                             symptoms
0   13    male  [acute respiratory distress, fever]
1   62  female  [acute respiratory distress, cough]
2   53    male                              [fever]
3   33    male         [acute respiratory distress]
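
The one-liner above can be unpacked into separate steps, which may make it easier to inspect each intermediate result. This is only a sketch of the same approach; the names `hit` and `replaced` are illustrative, and `na=False` is an added assumption so that missing values count as non-matches.

```python
import pandas as pd

df = pd.DataFrame({
    'age': [13, 62, 53, 33],
    'gender': ['male', 'female', 'male', 'male'],
    'symptoms': [['acute respiratory distress', 'fever'],
                 ['acute respiratory disease', 'cough'],
                 ['fever'],
                 ['respiratory distress']],
})

# Step 1: one row per symptom; the original row label is repeated in the index.
s = df['symptoms'].explode()

# Step 2: flag every individual symptom that contains the substring.
hit = s.str.contains('respiratory', na=False)

# Step 3: overwrite the flagged symptoms and leave the rest untouched.
replaced = s.mask(hit, 'acute respiratory distress')

# Step 4: collect the symptoms back into one list per original row label.
df['symptoms'] = replaced.groupby(level=0).agg(list)
print(df)
```

Grouping on `level=0` relies on the repeated index labels that `explode` produces, so this sketch assumes the DataFrame index is unique before exploding.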

How to replace strings in a list in a Pandas DataFrame column if they contain a substring