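I am trying to subset a DataFrame based on the presence of one or more words in a diagnosis description field.
For example, running the following code: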
import pandas as pd
df = pd.DataFrame()
member_id = ['A0001','A0001','A0001','B0002','B0002','C0003','C0003','C0003','C0003']
icd_desc = ['Knee pain','Obesity','Osteoarthritis right knee','Lung cancer','Bipolar disorder','','Cardiovascular','Epidermal','Severe trauma']
df['member_id'] = member_id
df['icd_desc'] = icd_desc
df_kneeobesity = df[('Obesity' in df.icd_desc.split()) | ('Knee' in df.icd_desc.split()) | ('knee' in df.icd_desc.split())]
df_kneeobesity
I get the error:
AttributeError: 'Series' object has no attribute 'split'
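As far as I can tell, the problem is that the icd_desc column is a pandas.core.series.Series rather than a string, but I have not been able to convert icd_desc to a string. My first thought was to run df['icd_desc'] = df['icd_desc'].astype(str), but that did not work.
What am I doing wrong?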
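
For reference, here is a minimal sketch of one possible approach. It assumes the goal is a case-insensitive, whole-word match on "knee" or "obesity". Note that astype(str) converts each element to a string, but df['icd_desc'] is still a Series, so .split() is not available on it directly. Element-wise string methods live under the .str accessor instead. The regular expression and the name mask are illustrative choices, not part of the original code.

```python
import pandas as pd

df = pd.DataFrame({
    'member_id': ['A0001', 'A0001', 'A0001', 'B0002', 'B0002',
                  'C0003', 'C0003', 'C0003', 'C0003'],
    'icd_desc': ['Knee pain', 'Obesity', 'Osteoarthritis right knee',
                 'Lung cancer', 'Bipolar disorder', '', 'Cardiovascular',
                 'Epidermal', 'Severe trauma'],
})

# Build a boolean mask, one value per row.
# \b marks word boundaries so only whole words match;
# (?:...) is a non-capturing group; case=False ignores case;
# na=False treats missing values as "no match".
mask = df['icd_desc'].str.contains(r'\b(?:knee|obesity)\b',
                                   case=False, regex=True, na=False)

# Index the DataFrame with the mask to keep only the matching rows.
df_kneeobesity = df[mask]
print(df_kneeobesity)
```

If substring matches are acceptable (so that a word like "kneecap" would also count), the \b boundaries could be dropped.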