I want to convert columns into sub-columns.
Suppose the data looks like this:
Q1 Q2:Q21 Q2:Q22 Q2:Q23 Q3:Q31 Q3:Q32
0 yes green blue green bus car
1 no red orange blue car bike
2 yes green yellow black car walk
3 yes yellow green brown bus walk
4 no green green red car bus
After reshaping the columns, I would like to have:
Q1 Q2 Q3
Q1 Q21 Q22 Q23 Q31 Q32
0 yes green blue green bus car
1 no red orange blue car bike
2 yes green yellow black car walk
3 yes yellow green brown bus walk
4 no green green red car bus
Here is what I tried:
import pandas as pd
survey = pd.read_csv('survey.csv')
# first column names
survey_cols = [col.split(':')[0] for col in survey.columns]
# unique column names
survey_ucols = []
for e in survey_cols:
    if e not in survey_ucols:
        survey_ucols.append(e)
# second column names, subcolumns
survey_subcols = []
for col in survey_ucols:
    survey_subcols.append([subcol.split(':')[-1] for subcol in survey.columns if col in subcol])
# create new df
tuples = list(zip(survey_ucols,survey_subcols))
cols = pd.MultiIndex.from_tuples(tuples, names=['mainQ', 'subQ'])
survey_new = pd.DataFrame(survey, columns=cols)
Thanks in advance.
Answer 0 (score: 2)
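You can use Index.to_series and Series.str.split to create a helper DataFrame. Column names without a ':' leave the second part missing, so forward fill along each row with ffill, and finally assign the result back with MultiIndex.from_arrays: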
df = survey.columns.to_series().str.split(':', expand=True).ffill(axis=1)
survey.columns = pd.MultiIndex.from_arrays([df[0].tolist(), df[1].tolist()])
# simplified alternative: assign the list of arrays directly
# survey.columns = [df[0].tolist(), df[1].tolist()]
print (survey)
Q1 Q2 Q3
Q1 Q21 Q22 Q23 Q31 Q32
0 yes green blue green bus car
1 no red orange blue car bike
2 yes green yellow black car walk
3 yes yellow green brown bus walk
4 no green green red car bus
Details:
print (df)
0 1
Q1 Q1 Q1
Q2:Q21 Q2 Q21
Q2:Q22 Q2 Q22
Q2:Q23 Q2 Q23
Q3:Q31 Q3 Q31
Q3:Q32 Q3 Q32
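For anyone who wants to try this without a survey.csv file, the steps above can be sketched end to end as follows. This is only a minimal sketch: the CSV text is an inlined copy of the sample data from the question (assuming it is comma-separated), the variable names csv_text, parts and q2 are made up for illustration, and the level names 'mainQ' and 'subQ' are borrowed from the question's own attempt.

```python
import io

import pandas as pd

# Sample data from the question, inlined so the sketch runs without survey.csv.
# Assumption: the real file is comma-separated like this.
csv_text = """Q1,Q2:Q21,Q2:Q22,Q2:Q23,Q3:Q31,Q3:Q32
yes,green,blue,green,bus,car
no,red,orange,blue,car,bike
yes,green,yellow,black,car,walk
yes,yellow,green,brown,bus,walk
no,green,green,red,car,bus
"""
survey = pd.read_csv(io.StringIO(csv_text))

# Split every column name on ':' into a helper DataFrame.
# Names without ':' (here only Q1) have a missing second part,
# which ffill(axis=1) fills from the first part.
parts = survey.columns.to_series().str.split(':', expand=True).ffill(axis=1)

# Build the two-level header from the two helper columns.
survey.columns = pd.MultiIndex.from_arrays(
    [parts[0].tolist(), parts[1].tolist()],
    names=['mainQ', 'subQ'],  # level names taken from the question's attempt
)

# Selecting a top-level key returns only its sub-columns.
q2 = survey['Q2']
print(survey)
print(q2)
```

Once the header is a MultiIndex, selecting survey['Q2'] returns just the Q21, Q22 and Q23 sub-columns, and a single sub-column can be addressed with a tuple such as survey[('Q3', 'Q32')].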