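Greetings, beautiful people!
I'm currently wrangling some survey data for a client so that it can be visualized. Unfortunately, there is no data model or end-to-end pipeline in place.
I have multiple columns like this: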
   What Role : Teacher, What Role: Engineer, What Role : Doctor
1  Yes, Yes, No
2  No, No, Yes
3  Yes, No, Yes
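So what I want to do is create a new column and convert each 'Yes' into a value that matches its header. For example, if Doctor is 'Yes', then 'Doctor' goes into the new column: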
What Role?
1  Teacher, Engineer
2  Doctor
3  Teacher, Doctor
Can this be done by creating a dictionary and then a for loop?
For example:
import pandas as pd
df = pd.read_csv("file.csv")
Dictionary_File = {'What Role?' : 'What Role : Teacher',
'What Role?': 'What Role : Engineer', 'What Role?' : 'What Role : Doctor'}
for k,v in Dictionary_File.items():
(df[k] = df[k] == 'Yes', 'Unsure here' + df[v])
df = df.drop(list(Dictionary_File.values()), axis=1)
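So when it comes to the for loop, I can't think of or find a way to merge the values into something new (other than manually converting every 'Yes' in each column to a new value and then merging...?)
Any help would be greatly appreciated!
Cheers,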
Answer 0 (score: 1)
First remove the What Role : prefix from the column names with split.
Then use numpy.where with the boolean mask df == 'Yes':
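import numpy as np

# Keep only the last whitespace-separated word of each column name, e.g. 'Teacher'
c = df.columns.str.split().str[-1]
# Where a cell is 'Yes', use that column's label plus ', '; otherwise an empty string
s = np.where(df == 'Yes', ['{}, '.format(x) for x in c], '')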
print (s)
[['Teacher, ' 'Engineer, ' '']
['' '' 'Doctor, ']
['Teacher, ' '' 'Doctor, ']]
df['new'] = pd.Series([''.join(x).strip(', ') for x in s], index=df.index)
print (df)
   What Role : Teacher  What Role : Engineer  What Role : Doctor  \
1                  Yes                   Yes                  No
2                   No                    No                 Yes
3                  Yes                    No                 Yes

                 new
1  Teacher, Engineer
2             Doctor
3    Teacher, Doctor
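For anyone who wants to try the numpy.where idea without the CSV, here is a minimal self-contained sketch. The sample DataFrame is an assumption that mirrors the table in the question, and combine_roles is a made-up helper name, not something from pandas. It also joins the non-empty labels directly instead of stripping a trailing ', ', which is a small variation on the code above.

```python
import numpy as np
import pandas as pd

# Sample data standing in for file.csv (assumed; mirrors the question's table).
df = pd.DataFrame(
    {
        "What Role : Teacher": ["Yes", "No", "Yes"],
        "What Role : Engineer": ["Yes", "No", "No"],
        "What Role : Doctor": ["No", "Yes", "Yes"],
    },
    index=[1, 2, 3],
)


def combine_roles(frame):
    """Return one comma-separated string of role names per row (hypothetical helper)."""
    # Last whitespace-separated token of each header, e.g. "Teacher".
    labels = frame.columns.str.split().str[-1].to_numpy()
    # Where a cell is "Yes", take that column's label; otherwise an empty string.
    picked = np.where(frame.eq("Yes").to_numpy(), labels, "")
    # Join only the non-empty labels in each row.
    return pd.Series(
        [", ".join(v for v in row if v) for row in picked],
        index=frame.index,
    )


roles = combine_roles(df)
print(roles)
```

Note that taking the last token only works while every role is a single word; a header such as "What Role : Data Scientist" would need a split on the colon instead.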
Answer 1 (score: 1)
Use one of the following:
Option 1
In [1188]: cols = df.columns.str.split(': ').str[1]
In [1207]: df.eq('Yes').dot(cols + ', ').str[:-2]
Out[1207]:
0    Teacher, Engineer
1               Doctor
2      Teacher, Doctor
dtype: object
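In case the dot trick in option 1 looks like magic, here is a rough plain-Python illustration of the idea for a single row. The names labels and row are made up for this example. Multiplying a string by True keeps it, multiplying by False gives an empty string, and the dot product is essentially the sum (here, concatenation) of those pieces.

```python
# Labels with the ', ' separator already appended, as in cols + ', '.
labels = ["Teacher, ", "Engineer, ", "Doctor, "]
# One row of the boolean mask df.eq('Yes'), e.g. Yes / No / Yes.
row = [True, False, True]

# bool is a subclass of int, so True * s repeats s once and False * s gives ''.
parts = [flag * label for flag, label in zip(row, labels)]
# Concatenate the pieces and drop the trailing ', ' (what .str[:-2] does).
joined = "".join(parts)[:-2]
print(joined)
```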
Option 2
In [1189]: df.eq('Yes').apply(lambda x: ', '.join(cols[x]), 1)
Out[1189]:
0    Teacher, Engineer
1               Doctor
2      Teacher, Doctor
dtype: object
Details
In [1190]: df
Out[1190]:
   What Role : Teacher  What Role: Engineer  What Role : Doctor
0                  Yes                  Yes                  No
1                   No                   No                 Yes
2                  Yes                   No                 Yes
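Since the question asked specifically about a dictionary plus a for loop, here is a sketch of how that could look. It assumes the same sample data as in the details above, and role_labels and picked are names invented for this example. One thing to keep in mind is that dictionary keys must be unique, so the dictionary in the question (the same 'What Role?' key three times) ends up with a single entry; mapping each source column to its label avoids that.

```python
import pandas as pd

# Sample data standing in for file.csv (assumed; mirrors the details above).
df = pd.DataFrame(
    {
        "What Role : Teacher": ["Yes", "No", "Yes"],
        "What Role: Engineer": ["Yes", "No", "No"],
        "What Role : Doctor": ["No", "Yes", "Yes"],
    }
)

# Map each source column to its label: the text after the first colon, trimmed.
role_labels = {col: col.split(":", 1)[1].strip() for col in df.columns}

# One list of selected labels per row, filled column by column.
picked = [[] for _ in range(len(df))]
for col, label in role_labels.items():
    for pos, is_yes in enumerate(df[col].eq("Yes")):
        if is_yes:
            picked[pos].append(label)

# Build the combined column, then drop the original Yes/No columns.
df["What Role?"] = [", ".join(labels) for labels in picked]
df = df.drop(columns=list(role_labels))
print(df)
```

This is slower than the vectorized options on large frames, but it is easy to follow and to extend, for example if some columns need a custom label.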