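Combine multiple values into one row of a new column in Pandas (Python)

Time: 2018-02-23 11:31:33

Tags: python pandas csv data-visualization

Greetings, beautiful people!

I am wrangling some customer survey data to prepare it for visualization. Unfortunately, there was no data modelling or end-to-end process in place.

I have multiple columns like this: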

    What Role : Teacher, What Role: Engineer, What Role : Doctor
1   Yes,                 Yes,                 No, 
2   No,                  No,                  Yes,
3   Yes,                 No,                  Yes,

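So what I want to do is create a new column and convert each 'Yes' into a value that matches the column header. For example, if Doctor is 'Yes', then 'Doctor' goes into the new column: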

    What Role?
1   Teacher, Engineer,
2   Doctor,
3   Teacher, Doctor

Can this be done by creating a dictionary and then a for loop?

For example:

import pandas as pd

df = pd.read_csv("file.csv")

Dictionary_File = {'What Role?' : 'What Role : Teacher', 
'What Role?': 'What Role : Engineer', 'What Role?' : 'What Role : Doctor'}

for k,v in Dictionary_File.items():
   (df[k] = df[k] == 'Yes', 'Unsure here' + df[v])

df = df.drop(list(Dictionary_File.values()), axis=1)

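So when it comes to the for loop, I can't think of or find a way to merge the values into the new column (other than manually converting every 'Yes' in each column to the new value and then merging...?)

Any help would be much appreciated!

Cheers,

2 Answers: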

Answer 0: (score: 1)

First strip the "What Role :" prefix from the column names with split.

Then build the joined values with numpy.where, using the boolean mask df == 'Yes':
import numpy as np

c = df.columns.str.split().str[-1]
s = np.where(df == 'Yes', ['{}, '.format(x) for x in c], '')
print (s)
[['Teacher, ' 'Engineer, ' '']
 ['' '' 'Doctor, ']
 ['Teacher, ' '' 'Doctor, ']]

df['new'] = pd.Series([''.join(x).strip(', ') for x in s], index=df.index)
print (df)
  What Role : Teacher What Role : Engineer What Role : Doctor  \
1                 Yes                  Yes                 No   
2                  No                   No                Yes   
3                 Yes                   No                Yes   

                 new  
1  Teacher, Engineer  
2             Doctor  
3    Teacher, Doctor  
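
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same numpy.where approach. The DataFrame is rebuilt by hand from the sample in the question (an assumption, since the real file.csv is not shown), and the variable names roles and labels are only illustrative.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question; the real CSV is not available here.
df = pd.DataFrame(
    {
        "What Role : Teacher": ["Yes", "No", "Yes"],
        "What Role : Engineer": ["Yes", "No", "No"],
        "What Role : Doctor": ["No", "Yes", "Yes"],
    },
    index=[1, 2, 3],
)

# The role name is the last whitespace-separated token of each header.
# Compute this before adding the new column, so 'new' is not included.
roles = df.columns.str.split().str[-1]

# Where the cell is 'Yes' keep "<role>, ", otherwise an empty string.
# The list of labels is broadcast across every row of the mask.
mask = (df == "Yes").to_numpy()
labels = np.where(mask, [f"{role}, " for role in roles], "")

# Concatenate each row and trim the trailing ", " separator.
df["new"] = pd.Series(
    ["".join(row).strip(", ") for row in labels], index=df.index
)
print(df["new"])
```

Note that this relies on the role name being a single word at the end of the header; a multi-word role would need a different split.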

Answer 1: (score: 1)

You can use either of the following.

Option 1

In [1188]: cols = df.columns.str.split(': ').str[1]

In [1207]: df.eq('Yes').dot(cols + ', ').str[:-2]
Out[1207]:
0    Teacher, Engineer
1               Doctor
2      Teacher, Doctor
dtype: object
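
In case the dot trick looks like magic: as far as I understand it, it works because a Python bool multiplied by a string either keeps the string (True) or gives an empty string (False), and the dot product then adds, i.e. concatenates, those pieces along each row. A plain-Python sketch of that idea for a single row (the flags and labels lists are made up for illustration):

```python
# One row of the boolean mask and the matching "<role>, " labels.
flags = [True, False, True]
labels = ["Teacher, ", "Engineer, ", "Doctor, "]

# bool * str repeats the string once (True) or zero times (False).
pieces = [flag * label for flag, label in zip(flags, labels)]

# Summing the products is string concatenation; [:-2] drops the final ", ".
joined = "".join(pieces)[:-2]
print(joined)
```

A row with no 'Yes' at all simply yields an empty string, since slicing an empty string is harmless.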

Option 2

In [1189]: df.eq('Yes').apply(lambda x: ', '.join(cols[x]), 1)
Out[1189]:
0    Teacher, Engineer
1               Doctor
2      Teacher, Doctor
dtype: object

Details

In [1190]: df
Out[1190]:
  What Role : Teacher What Role: Engineer What Role : Doctor
0                 Yes                 Yes                 No
1                  No                  No                Yes
2                 Yes                  No                Yes
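
Putting the pieces together, here is a minimal runnable sketch of both options. It assumes the frame shown under Details, rebuilt by hand, including the header without a space before the colon. The names is_yes, option1 and option2 are illustrative and not part of the original answer.

```python
import pandas as pd

# Frame rebuilt from the Details output above (default 0..2 index).
df = pd.DataFrame(
    {
        "What Role : Teacher": ["Yes", "No", "Yes"],
        "What Role: Engineer": ["Yes", "No", "No"],
        "What Role : Doctor": ["No", "Yes", "Yes"],
    }
)

# Splitting on ": " handles both "Role : X" and "Role: X" headers.
cols = df.columns.str.split(": ").str[1]
is_yes = df.eq("Yes")

# Option 1: dot product of the boolean mask with "<role>, " labels,
# then drop the trailing ", " from every row.
option1 = is_yes.dot(cols + ", ").str[:-2]

# Option 2: per row, pick the role names where the mask is True and join them.
option2 = is_yes.apply(lambda row: ", ".join(cols[row.to_numpy()]), axis=1)

print(option1)
print(option2)
```

Option 1 is vectorised and tends to be the quicker of the two on large frames, while Option 2 is arguably easier to read and to adapt, for example to a different separator.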