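I'm trying to organize data from a survey I ran. For example, I asked respondents whether they had been diagnosed with any of the following conditions (ADHD, anxiety, depression, etc.). Right now, one column holds multiple values for people who selected more than one condition. I'd like to create one column per condition, holding True or False depending on whether the respondent selected it.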
{'Mental_health':
['ADHD, Anxiety, Depression',
'ADHD, Depression, PTSD',
'Anxiety, Borderline Personality Disorder, Depression',
'OCD',
'Anxiety',
'Anxiety',
'ADHD, Anxiety, Bipolar, Borderline Personality Disorder, Depression, PTSD, Schizophrenia',
'ADHD, Anxiety, Autism, Depression, PTSD',
'Anxiety, Depression',
'Depression',
'Depression',
'None of the above',
'Autism, Depression, PTSD',
'None of the above',
'ADHD, PTSD']
}
Answer 0 (score: 1)
import pandas as pd
from io import StringIO

# sample data
s = """Mental_Health
ADHD, Anxiety, Depression
ADHD, Depression, PTSD
Anxiety, Borderline Personality Disorder, Depression
OCD
Anxiety
Anxiety
ADHD, Anxiety, Bipolar, Borderline Personality Disorder, Depression, PTSD, Schizophrenia
ADHD, Anxiety, Autism, Depression, PTSD
Anxiety, Depression
Depression
Depression
None of the above
Autism, Depression, PTSD
None of the above
ADHD, PTSD"""
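# '|' never occurs in the data, so each line is read as a single field
df = pd.read_csv(StringIO(s), sep='|')
# str.split then expand list into columns and stack
new = df['Mental_Health'].str.split(', ', expand=True).stack()
# get_dummies, then sum per original row
# (groupby(level=0).sum() replaces sum(level=0), which was removed in pandas 2.0)
final_df = pd.get_dummies(new).groupby(level=0).sum().astype(bool)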
     ADHD  Anxiety  Autism  Bipolar  Borderline Personality Disorder  \
0    True     True   False    False                            False
1    True    False   False    False                            False
2   False     True   False    False                             True
3   False    False   False    False                            False
4   False     True   False    False                            False
5   False     True   False    False                            False
6    True     True   False     True                             True
7    True     True    True    False                            False
8   False     True   False    False                            False
9   False    False   False    False                            False
10  False    False   False    False                            False
11  False    False   False    False                            False
12  False    False    True    False                            False
13  False    False   False    False                            False
14   True    False   False    False                            False

    Depression  None of the above    OCD   PTSD  Schizophrenia
0         True              False  False  False          False
1         True              False  False   True          False
2         True              False  False  False          False
3        False              False   True  False          False
4        False              False  False  False          False
5        False              False  False  False          False
6         True              False  False   True           True
7         True              False  False   True          False
8         True              False  False  False          False
9         True              False  False  False          False
10        True              False  False  False          False
11       False               True  False  False          False
12        True              False  False   True          False
13       False               True  False  False          False
14       False              False  False   True          False
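A possibly shorter route to the same kind of result is `Series.str.get_dummies`, which splits on a separator and builds the indicator columns in one step. The sketch below is not part of the original answer; it uses only a few rows from the question, and the names `flags` and `result` are made up for illustration.

```python
import pandas as pd

# A few rows in the same shape as the survey column (values taken from the question).
df = pd.DataFrame({
    "Mental_Health": [
        "ADHD, Anxiety, Depression",
        "OCD",
        "None of the above",
        "ADHD, PTSD",
    ]
})

# str.get_dummies splits each string on the separator and returns
# one indicator column per distinct value; astype(bool) gives True/False.
flags = df["Mental_Health"].str.get_dummies(sep=", ").astype(bool)

# Keep the original column next to the new indicator columns.
result = pd.concat([df, flags], axis=1)
print(result)
```

Note that the separator here is `", "` (comma plus space), matching how the survey tool joined the selections; if the spacing is inconsistent in the real data, the strings may need to be cleaned first.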
Answer 1 (score: 0)
Use str.split with expand=True to split the column:
results = df.Mental_health.str.split(', ', expand=True)
You can then append these results to the original df:
df_f = pd.concat([df, results], axis=1)
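On its own, the split above yields positional columns (0, 1, 2, ...) rather than one column per condition. One way to get from there to True/False columns is sketched below. This is an assumption about how the answer could be extended, not something the answer states, and `conditions` and `flags` are hypothetical names.

```python
import pandas as pd

# A few rows from the question; the column name follows the question's dict.
df = pd.DataFrame({
    "Mental_health": [
        "ADHD, Anxiety, Depression",
        "Anxiety",
        "ADHD, PTSD",
    ]
})

# As in the answer: one positional column per comma-separated item.
results = df.Mental_health.str.split(", ", expand=True)

# Distinct condition names, dropping the missing values that pad shorter rows.
conditions = sorted(results.stack().dropna().unique())

# A row is True for a condition if any of its positional columns equals it.
flags = pd.DataFrame({c: results.eq(c).any(axis=1) for c in conditions})

df_f = pd.concat([df, flags], axis=1)
print(df_f)
```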