Expanding one column into multiple columns with pandas?

Time: 2020-04-24 20:22:06

Tags: python pandas dataframe

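I'm trying to organize data from a survey I ran. For example, I asked respondents whether they had been diagnosed with any of the following conditions (ADHD, anxiety, depression, etc.). Right now, one of my columns holds multiple values for people who selected more than one condition. I'd like to create one column per condition, holding True or False depending on whether the user selected it.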

{'Mental_health':
    ['ADHD, Anxiety, Depression',
     'ADHD, Depression, PTSD',
     'Anxiety, Borderline Personality Disorder, Depression',
     'OCD',
     'Anxiety',
     'Anxiety',
     'ADHD, Anxiety, Bipolar, Borderline Personality Disorder, Depression, PTSD, Schizophrenia',
     'ADHD, Anxiety, Autism, Depression, PTSD',
     'Anxiety, Depression',
     'Depression',
     'Depression',
     'None of the above',
     'Autism, Depression, PTSD',
     'None of the above',
     'ADHD, PTSD']
}

2 answers:

Answer 0 (score: 1)

import pandas as pd
from io import StringIO

# sample data
s = """Mental_Health
ADHD, Anxiety, Depression
ADHD, Depression, PTSD
Anxiety, Borderline Personality Disorder, Depression
OCD
Anxiety
Anxiety
ADHD, Anxiety, Bipolar, Borderline Personality Disorder, Depression, PTSD, Schizophrenia
ADHD, Anxiety, Autism, Depression, PTSD
Anxiety, Depression
Depression
Depression
None of the above
Autism, Depression, PTSD
None of the above
ADHD, PTSD"""
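# sep='|' never occurs in the data, so each line is read as a single column
df = pd.read_csv(StringIO(s), sep='|')
# str.split, expand the lists into columns, then stack into one long Series
new = df['Mental_Health'].str.split(', ', expand=True).stack()
# get_dummies, then aggregate the indicators per original row
# (sum(level=0) was removed in pandas 2.0; groupby(level=0).sum() is the equivalent)
final_df = pd.get_dummies(new).groupby(level=0).sum().astype(bool)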

     ADHD  Anxiety  Autism  Bipolar  Borderline Personality Disorder  \
0    True     True   False    False                            False   
1    True    False   False    False                            False   
2   False     True   False    False                             True   
3   False    False   False    False                            False   
4   False     True   False    False                            False   
5   False     True   False    False                            False   
6    True     True   False     True                             True   
7    True     True    True    False                            False   
8   False     True   False    False                            False   
9   False    False   False    False                            False   
10  False    False   False    False                            False   
11  False    False   False    False                            False   
12  False    False    True    False                            False   
13  False    False   False    False                            False   
14   True    False   False    False                            False   

    Depression  None of the above    OCD   PTSD  Schizophrenia  
0         True              False  False  False          False  
1         True              False  False   True          False  
2         True              False  False  False          False  
3        False              False   True  False          False  
4        False              False  False  False          False  
5        False              False  False  False          False  
6         True              False  False   True           True  
7         True              False  False   True          False  
8         True              False  False  False          False  
9         True              False  False  False          False  
10        True              False  False  False          False  
11       False               True  False  False          False  
12        True              False  False   True          False  
13       False               True  False  False          False  
14       False              False  False   True          False  
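
For comparison, the same boolean table can probably be built in one step with Series.str.get_dummies, which splits on a separator and creates one indicator column per distinct value. This is a different technique from the split/stack/get_dummies chain above, not the answerer's method. A minimal sketch, assuming the separator is always exactly ', ' and using a shortened, made-up sample of the survey column:

```python
import pandas as pd

# Shortened illustrative sample; the column name follows the answer above.
df = pd.DataFrame({
    'Mental_Health': [
        'ADHD, Anxiety, Depression',
        'OCD',
        'None of the above',
        'ADHD, PTSD',
    ]
})

# str.get_dummies splits each string on the separator and returns
# one 0/1 column per distinct label; astype(bool) turns that into True/False.
flags = df['Mental_Health'].str.get_dummies(sep=', ').astype(bool)

print(flags)
```

If the raw data has inconsistent spacing after the commas, the labels would need to be normalized first, otherwise 'ADHD' and ' ADHD' would end up as separate columns.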

Answer 1 (score: 0)

Use str.split with expand=True:

results = df.Mental_health.str.split(', ', expand=True)

You can then append these results to the original df:

df_f = pd.concat([df, results], axis=1)
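
Note that the split produces positional columns (0, 1, 2, ...), so a given condition can land in a different column on each row. To get the one-column-per-condition True/False layout the question asks for, the split result still needs a further step. A rough sketch of one way to do that, assuming the same ', ' separator; the names labels and flags and the shortened sample data are illustrative, not part of the original answer:

```python
import pandas as pd

# Shortened illustrative sample; the column name follows the question.
df = pd.DataFrame({
    'Mental_health': [
        'ADHD, Anxiety, Depression',
        'OCD',
        'None of the above',
        'ADHD, PTSD',
    ]
})

# Positional split, as in the answer: shorter rows are padded with missing values.
results = df.Mental_health.str.split(', ', expand=True)

# Collect the distinct labels, explicitly dropping the padding.
labels = sorted(results.stack().dropna().unique())

# For each label, a row is True if any of its positional columns equals that label.
flags = pd.DataFrame({label: results.eq(label).any(axis=1) for label in labels})

# Attach the boolean columns to the original frame.
df_f = pd.concat([df, flags], axis=1)
print(df_f)
```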