I am trying to reshape a DataFrame.
Currently I have something like this:
Material Revenue 2007 Revenue 2008 Revenue 2009 Profit 2007 Profit 2008 Profit 2009
Mat A 50 55 60 10 15 20
Mat B 45 50 55 5 10 35
Mat C 75 80 85 35 30 45
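For anyone who wants to reproduce this, the sample frame can be built roughly like this. It is a minimal sketch that assumes `Material` is an ordinary column (not the index) holding strings such as 'Mat A', and that the values are integers.

```python
import pandas as pd

# Sample data copied from the table above.
# Assumption: 'Material' is a regular string column, not the index.
df = pd.DataFrame({
    "Material": ["Mat A", "Mat B", "Mat C"],
    "Revenue 2007": [50, 45, 75],
    "Revenue 2008": [55, 50, 80],
    "Revenue 2009": [60, 55, 85],
    "Profit 2007": [10, 5, 35],
    "Profit 2008": [15, 10, 30],
    "Profit 2009": [20, 35, 45],
})
print(df)
```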
This is the transformation I am trying to achieve:
Material Revenue Profit Period
Mat A 50 10 2007
Mat A 55 15 2008
Mat A 60 20 2009
Mat B 45 5 2007
Mat B 50 10 2008
Mat B 55 35 2009
Mat C 75 35 2007
Mat C 80 30 2008
Mat C 85 45 2009
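From what I have gathered, I will most likely have to use melt, but I can't get the code to work.
Edit:
This code does seem to work, but it is too cumbersome to use on the real DataFrame.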
df1 = df.melt(id_vars=['Material'],
value_vars=['Revenue 2007', 'Revenue 2008', 'Revenue 2009'],
var_name='Period', value_name='Revenue')
df1["Period"]=df1['Period'].str[-4:]
df2 = df.melt(id_vars=['Material'],
value_vars=['Profit 2007', 'Profit 2008', 'Profit 2009'],
var_name='Period', value_name='Profit')
df1["Profit"]=df2["Profit"]
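Answer 0 (score: 0)
Reshape all of the value columns with melt().
Then split the resulting column on the space, and group the rows so that each measure ends up in its own column.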
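import pandas as pd

# Melt the Revenue and Profit columns together; 'Value' holds both measures.
df1 = df.melt(id_vars=['Material'],
              value_vars=['Revenue 2007', 'Revenue 2008', 'Revenue 2009','Profit 2007','Profit 2008','Profit 2009'],
              var_name='Period', value_name='Value')
# Split e.g. 'Revenue 2007' into a measure flag ('Revenue') and a year ('2007').
df2 = pd.concat([df1, df1['Period'].str.split(' ', expand=True)], axis=1).drop('Period', axis=1)
df2.rename(columns={0:'flg', 1:'Period'},inplace=True)
# Spread the flag back out into separate Profit and Revenue columns.
df2.groupby(['Material','Period','flg'])['Value'].sum().unstack().reset_index()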
flg Material Period Profit Revenue
0 Mat A 2007 10 50
1 Mat A 2008 15 55
2 Mat A 2009 20 60
3 Mat B 2007 5 45
4 Mat B 2008 10 50
5 Mat B 2009 35 55
6 Mat C 2007 35 75
7 Mat C 2008 30 80
8 Mat C 2009 45 85
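A slightly shorter variant of the same idea, in case listing every column by hand is the painful part: leaving out `value_vars` makes `melt` pick up every column except the id column, and `pivot` can replace the groupby/sum/unstack step when each (Material, Period, measure) combination occurs only once. This is only a sketch. The names `column`, `value` and `measure` are made up for illustration, and passing a list as the `index` of `pivot` assumes pandas 1.1 or newer.

```python
import pandas as pd

df = pd.DataFrame({
    "Material": ["Mat A", "Mat B", "Mat C"],
    "Revenue 2007": [50, 45, 75],
    "Revenue 2008": [55, 50, 80],
    "Revenue 2009": [60, 55, 85],
    "Profit 2007": [10, 5, 35],
    "Profit 2008": [15, 10, 30],
    "Profit 2009": [20, 35, 45],
})

# Without value_vars, melt uses every column that is not an id column.
long_df = df.melt(id_vars="Material", var_name="column", value_name="value")

# 'Revenue 2007' -> measure 'Revenue', Period '2007' (split on the first space only).
long_df[["measure", "Period"]] = long_df["column"].str.split(" ", n=1, expand=True)

# pivot raises if a (Material, Period, measure) combination is duplicated,
# which is arguably safer than silently summing duplicates.
result = (
    long_df.pivot(index=["Material", "Period"], columns="measure", values="value")
    .reset_index()
    .rename_axis(columns=None)
)
result = result[["Material", "Revenue", "Profit", "Period"]]
print(result)
```

Note that `Period` stays a string here; add `result["Period"] = result["Period"].astype(int)` if a numeric year is needed.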
Answer 1 (score: 0)
Is this what you are looking for?
left = df[[col for col in df.columns if col.startswith('Profit')] + ['Material']]\
.melt(id_vars='Material', var_name='Period', value_name='Profit')
left['Period'] = left['Period'].str.split(' ').str[1]
right = df[[col for col in df.columns if col.startswith('Revenue')] + ['Material']]\
.melt(id_vars='Material', var_name='Period', value_name='Revenue')
right['Period'] = right['Period'].str.split(' ').str[1]
print(left.merge(right).sort_values(by=['Material', 'Period']).reset_index(drop=True))
Output
Material Period Profit Revenue
0 Mat A 2007 10 50
1 Mat A 2008 15 55
2 Mat A 2009 20 60
3 Mat B 2007 5 45
4 Mat B 2008 10 50
5 Mat B 2009 35 55
6 Mat C 2007 35 75
7 Mat C 2008 30 80
8 Mat C 2009 45 85
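Since the column names follow a regular "<measure> <year>" pattern, `pd.wide_to_long` may also be worth a look, because it does the melt, the split and the merge in one call. A sketch on the sample data, assuming every value column is named exactly like that, with a single space before a numeric year:

```python
import pandas as pd

df = pd.DataFrame({
    "Material": ["Mat A", "Mat B", "Mat C"],
    "Revenue 2007": [50, 45, 75],
    "Revenue 2008": [55, 50, 80],
    "Revenue 2009": [60, 55, 85],
    "Profit 2007": [10, 5, 35],
    "Profit 2008": [15, 10, 30],
    "Profit 2009": [20, 35, 45],
})

# stubnames are the column prefixes, sep is what separates prefix and suffix,
# i identifies a row of the wide frame, j names the new column for the suffix.
long_df = (
    pd.wide_to_long(df, stubnames=["Revenue", "Profit"],
                    i="Material", j="Period", sep=" ")
    .reset_index()
    .sort_values(["Material", "Period"])
    .reset_index(drop=True)
)
long_df = long_df[["Material", "Revenue", "Profit", "Period"]]
print(long_df)
```

`i` has to identify each row uniquely, so this assumes `Material` has no duplicates in the real data; if it does, a list of id columns can be passed as `i` instead.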
Answer 2 (score: 0)
df = pd.melt(df, id_vars=['Material'])
df['Period'] = df.variable.str.split(" ").str[1]
df['type'] = df.variable.str.split(" ").str[0]
df = df.drop('variable', axis=1)
df = (
df
.groupby(['Material','Period','type'])
.sum()
.unstack('type')
.reset_index()
)
df.columns = ["Material", "Period", "Profit", "Revenue"]
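# Only needed if 'Material' holds bare labels such as 'A'; skip it when the values already read 'Mat A'.
# df['Material'] = 'Mat ' + df['Material'].astype(str)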
df = df[["Material","Revenue","Profit","Period"]]
df
Material Revenue Profit Period
0 Mat A 50 10 2007
1 Mat A 55 15 2008
2 Mat A 60 20 2009
3 Mat B 45 5 2007
4 Mat B 50 10 2008
5 Mat B 55 35 2009
6 Mat C 75 35 2007
7 Mat C 80 30 2008
8 Mat C 85 45 2009
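The split-then-unstack idea above can also be expressed by turning the column labels into a two-level MultiIndex and stacking the year level. This is a sketch rather than a drop-in replacement. The level names `measure` and `Period` are chosen here for illustration, and some pandas 2.x releases may emit a FutureWarning about the `stack` implementation.

```python
import pandas as pd

df = pd.DataFrame({
    "Material": ["Mat A", "Mat B", "Mat C"],
    "Revenue 2007": [50, 45, 75],
    "Revenue 2008": [55, 50, 80],
    "Revenue 2009": [60, 55, 85],
    "Profit 2007": [10, 5, 35],
    "Profit 2008": [15, 10, 30],
    "Profit 2009": [20, 35, 45],
})

wide = df.set_index("Material")
# 'Revenue 2007' -> ('Revenue', '2007'); expand=True yields a MultiIndex.
wide.columns = wide.columns.str.split(" ", expand=True)
wide.columns.names = ["measure", "Period"]

# Move the year level from the columns into the rows.
long_df = wide.stack("Period").reset_index()
long_df = long_df[["Material", "Revenue", "Profit", "Period"]]
print(long_df)
```

As with the string-splitting answers, `Period` comes out as a string column here.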