This is my DataFrame:
Name  Job
A     Back-end Engineer
B     Front-end Engineer;Product Manager
C     Product Manager;Business Development;System Analyst
I want to convert the DataFrame to dummy variables (one-hot encoding), like this:
Name  Back-end Engineer  Business Development  Front-end Engineer  Product Manager  System Analyst
A     1                  0                     0                   0                0
B     0                  0                     1                   1                0
C     0                  1                     0                   1                1
I tried using pandas.get_dummies, but it failed because each cell in the Job column can hold several values separated by semicolons.
Answer 0 (score: 1)
You can try the following:
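import pandas as pd
from collections import defaultdict

df = pd.read_csv("path/to/your.csv")

# Collect every distinct job title across all rows.
job_list = set()
for job in df["Job"]:
    for job_name in job.split(";"):
        job_list.add(job_name.strip())

# Build one 0/1 column per job title; sorting keeps the column order stable.
new_df = defaultdict(list)
for index, row in df.iterrows():
    new_df["Name"].append(row["Name"])
    # Split the cell again so titles are matched exactly, not as substrings.
    row_jobs = {name.strip() for name in row["Job"].split(";")}
    for job in sorted(job_list):
        new_df[job].append(1 if job in row_jobs else 0)

new_df = pd.DataFrame.from_dict(new_df)
# index=False leaves the row index out of the output file.
new_df.to_csv("/path/to/new.csv", index=False)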
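A more compact alternative to the loop-based approach, offered as a sketch rather than as part of the original answer: pandas has a string accessor method, Series.str.get_dummies, which splits each cell on a separator and returns one indicator column per distinct value. The example below rebuilds the question's sample data inline instead of reading a CSV (an assumption made so that it runs on its own), and the variable names dummies and result are illustrative.

```python
import pandas as pd

# Sample data copied from the question; in practice this would come from read_csv.
df = pd.DataFrame({
    "Name": ["A", "B", "C"],
    "Job": [
        "Back-end Engineer",
        "Front-end Engineer;Product Manager",
        "Product Manager;Business Development;System Analyst",
    ],
})

# Split each Job cell on ";" and create one 0/1 column per distinct title.
dummies = df["Job"].str.get_dummies(sep=";")

# Put the Name column back in front of the indicator columns.
result = pd.concat([df[["Name"]], dummies], axis=1)
print(result)
```

This assumes the titles have no stray whitespace around the semicolons; if they do, the Job column may need cleaning first (for example by replacing "; " with ";").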