I have a DataFrame:
df = pd.DataFrame({'age': [13,62,53, 33],
'gender': ['male','female','male', 'male'],
'symptoms': [['acute respiratory distress', 'fever'],
['acute respiratory disease', 'cough'],
['fever'],
['respiratory distress']]})
df
Output:
age gender symptoms
0 13 male [acute respiratory distress, fever]
1 62 female [acute respiratory disease, cough]
2 53 male [fever]
3 33 male [respiratory distress]
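I am trying to replace every value in the 'symptoms' column (each cell is a list in this case) that contains the substring 'respiratory', changing that whole value inside the list to 'acute respiratory distress' so that it is consistent across the DataFrame. This is the desired result: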
Output:
age gender symptoms
0 13 male [acute respiratory distress, fever]
1 62 female [acute respiratory distress, cough]
2 53 male [fever]
3 33 male [acute respiratory distress]
I tried:
df.loc[df['symptoms'].str.contains('respiratory', na=False), 'symptoms'] = 'acute respiratory distress'
print(df)
The DataFrame remains unchanged.
Answer 0 (score: 2)
Like this:
import pandas as pd
df = pd.DataFrame({'age': [13,62,53, 33],
'gender': ['male','female','male', 'male'],
'symptoms': [['acute respiratory distress', 'fever'],
['acute respiratory disease', 'cough'],
['fever'],
['respiratory distress']]})
# Replace any symptom string containing 'respiratory'; leave the others as they are
df['symptoms'] = [['acute respiratory distress' if 'respiratory' in s else s for s in lst] for lst in df['symptoms']]
print(df)
Output:
age gender symptoms
0 13 male [acute respiratory distress, fever]
1 62 female [acute respiratory distress, cough]
2 53 male [fever]
3 33 male [acute respiratory distress]
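As for why the attempt in the question leaves the DataFrame unchanged, the likely cause is that each cell in 'symptoms' is a list rather than a string, so .str.contains has nothing it can match and the resulting mask selects no rows. The check below is a minimal sketch of that, assuming the same data as in the question; has_match is just an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({'symptoms': [['acute respiratory distress', 'fever'],
                                ['acute respiratory disease', 'cough'],
                                ['fever'],
                                ['respiratory distress']]})

# Each cell is a list, not a string, so the pattern search behind
# .str.contains cannot be applied to it; with na=False those rows become False.
mask = df['symptoms'].str.contains('respiratory', na=False)
print(mask.any())

# Checking the individual strings inside each list does find the matches.
has_match = df['symptoms'].apply(lambda lst: any('respiratory' in s for s in lst))
print(has_match.tolist())
```

Even with a correct row mask, assigning a single string through df.loc would replace the whole list in the cell, which is why the replacement has to happen element by element.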
Answer 1 (score: 0)
Use explode, then assign using contains:
>>> s = df.symptoms.explode()
>>> df['symptoms'] = s.mask(s.str.contains('respiratory'),'acute respiratory distress').groupby(level=0).agg(list)
>>> df
age gender symptoms
0 13 male [acute respiratory distress, fever]
1 62 female [acute respiratory distress, cough]
2 53 male [fever]
3 33 male [acute respiratory distress]
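For anyone who wants to run this outside the interpreter session, here is a self-contained sketch of the same explode approach, checked against a plain nested list comprehension. It assumes pandas 0.25 or later (where Series.explode is available); the names fixed and expected are only for illustration, and na=False is added as a precaution against missing values.

```python
import pandas as pd

df = pd.DataFrame({'age': [13, 62, 53, 33],
                   'gender': ['male', 'female', 'male', 'male'],
                   'symptoms': [['acute respiratory distress', 'fever'],
                                ['acute respiratory disease', 'cough'],
                                ['fever'],
                                ['respiratory distress']]})

# One row per symptom; the original row label is repeated in the index.
s = df['symptoms'].explode()

# Replace the matching strings, then rebuild one list per original row.
fixed = (s.mask(s.str.contains('respiratory', na=False), 'acute respiratory distress')
          .groupby(level=0)
          .agg(list))

# Compare with a nested list comprehension over the original lists.
expected = [['acute respiratory distress' if 'respiratory' in x else x for x in lst]
            for lst in df['symptoms']]
print(fixed.tolist() == expected)

df['symptoms'] = fixed
print(df)
```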