Python: melt a DataFrame

Posted: 2020-07-15 13:21:14

Tags: python pandas

I'm trying to reshape a DataFrame.

Currently I have something like this:

    Material    Revenue 2007    Revenue 2008    Revenue 2009    Profit 2007  Profit 2008    Profit 2009
    Mat A       50              55              60              10           15             20
    Mat B       45              50              55               5           10             35
    Mat C       75              80              85              35           30             45
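For anyone who wants to run the snippets on this page, here is the sample input rebuilt as a DataFrame. The values are copied from the table above; the column names assume a single space between the metric and the year.

```python
import pandas as pd

# Sample input from the question: one row per material,
# one column per (metric, year) pair
df = pd.DataFrame({
    "Material": ["Mat A", "Mat B", "Mat C"],
    "Revenue 2007": [50, 45, 75],
    "Revenue 2008": [55, 50, 80],
    "Revenue 2009": [60, 55, 85],
    "Profit 2007": [10, 5, 35],
    "Profit 2008": [15, 10, 30],
    "Profit 2009": [20, 35, 45],
})
print(df)
```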

This is the result I'm trying to get:

    Material    Revenue     Profit    Period
    Mat A       50          10        2007
    Mat A       55          15        2008
    Mat A       60          20        2009
    Mat B       45           5        2007
    Mat B       50          10        2008
    Mat B       55          35        2009
    Mat C       75          35        2007
    Mat C       80          30        2008
    Mat C       85          45        2009

From what I've read, I most likely need to use `melt`, but I can't get the code to work.
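Since every value column follows a `<metric> <year>` pattern, `pd.wide_to_long` is worth a look as an alternative to `melt` (a different pandas function, swapped in here). A minimal sketch, assuming the column names are exactly as in the sample and the years are purely numeric. Note that `wide_to_long` converts the suffix to a number, so `Period` comes back as an integer rather than a string.

```python
import pandas as pd

df = pd.DataFrame({
    "Material": ["Mat A", "Mat B", "Mat C"],
    "Revenue 2007": [50, 45, 75],
    "Revenue 2008": [55, 50, 80],
    "Revenue 2009": [60, 55, 85],
    "Profit 2007": [10, 5, 35],
    "Profit 2008": [15, 10, 30],
    "Profit 2009": [20, 35, 45],
})

# "Revenue" and "Profit" are the stub names; whatever follows the separator
# (a space here) becomes the new "Period" index level. The default suffix
# pattern r"\d+" matches the four-digit years.
long_df = pd.wide_to_long(
    df, stubnames=["Revenue", "Profit"], i="Material", j="Period", sep=" "
).reset_index()

# Put rows and columns in the order shown in the question
long_df = (
    long_df.sort_values(["Material", "Period"], ignore_index=True)
    [["Material", "Revenue", "Profit", "Period"]]
)
print(long_df)
```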

Edit:

This code does seem to work, but it is too cumbersome to use on my actual dataframe.

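# Melt the three revenue columns; "Revenue 2007" becomes Period "2007"
df1 = df.melt(id_vars=['Material'], 
              value_vars=['Revenue 2007', 'Revenue 2008', 'Revenue 2009'],
              var_name='Period', value_name='Revenue')
df1["Period"] = df1['Period'].str[-4:]

# Melt the three profit columns the same way
df2 = df.melt(id_vars=['Material'], 
              value_vars=['Profit 2007', 'Profit 2008', 'Profit 2009'],
              var_name='Period', value_name='Profit')
# Aligns on the default index, so this relies on df2 rows being
# in the same (Period, Material) order as df1
df1["Profit"] = df2["Profit"]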
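The approach in the edit can be generalized so that no column names are listed by hand. The sketch below melts every non-id column, splits each name on its last space, and pivots the metrics back into columns. `wide_metrics_to_long` is a hypothetical helper name; it assumes every non-id column is named `<metric> <year>` and that each `(Material, year, metric)` combination appears only once, since `pivot` raises a `ValueError` on duplicates.

```python
import pandas as pd


def wide_metrics_to_long(frame, id_col="Material"):
    """Hypothetical helper: reshape '<metric> <year>' columns into
    one row per (id, year) with one column per metric."""
    # 1. Melt every non-id column into (id, variable, value) rows
    long = frame.melt(id_vars=[id_col], var_name="variable", value_name="value")
    # 2. Split "Revenue 2007" on the last space into metric and year
    long[["metric", "Period"]] = long["variable"].str.rsplit(" ", n=1, expand=True)
    # 3. Spread the metrics back out: one column per metric
    out = long.pivot(index=[id_col, "Period"], columns="metric", values="value")
    out = out.reset_index()
    out.columns.name = None  # drop the leftover "metric" axis label
    return out.sort_values([id_col, "Period"], ignore_index=True)


df = pd.DataFrame({
    "Material": ["Mat A", "Mat B", "Mat C"],
    "Revenue 2007": [50, 45, 75],
    "Revenue 2008": [55, 50, 80],
    "Revenue 2009": [60, 55, 85],
    "Profit 2007": [10, 5, 35],
    "Profit 2008": [15, 10, 30],
    "Profit 2009": [20, 35, 45],
})
result = wide_metrics_to_long(df)
print(result[["Material", "Revenue", "Profit", "Period"]])
```

Because the metric names are read from the column headers, an extra metric that follows the same naming pattern would show up as an extra output column without any code change.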

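3 answers:

Answer 0 (score: 0)

Pass all six metric columns to `melt()` at once, split the resulting `Period` column on the space into the metric name and the year, then group by material, year, and metric and unstack the metric into columns.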

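import pandas as pd

# Melt all six metric columns into one long "value" column
df1 = df.melt(id_vars=['Material'], 
              value_vars=['Revenue 2007', 'Revenue 2008', 'Revenue 2009', 'Profit 2007', 'Profit 2008', 'Profit 2009'],
              var_name='Period', value_name='value')
# Split "Revenue 2007" into the metric name (flg) and the year (Period)
df2 = pd.concat([df1, df1['Period'].str.split(' ', expand=True)], axis=1).drop('Period', axis=1)
df2.rename(columns={0: 'flg', 1: 'Period'}, inplace=True)
# One row per (Material, Period), one column per metric
df2.groupby(['Material', 'Period', 'flg'])['value'].sum().unstack().reset_index()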

flg Material Period  Profit  Revenue
0      Mat A   2007      10       50
1      Mat A   2008      15       55
2      Mat A   2009      20       60
3      Mat B   2007       5       45
4      Mat B   2008      10       50
5      Mat B   2009      35       55
6      Mat C   2007      35       75
7      Mat C   2008      30       80
8      Mat C   2009      45       85
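The split step in this answer is what produces the integer column labels that the `rename` call maps to `flg` and `Period`. A small illustration on a few hand-picked labels of the same shape as the melted column:

```python
import pandas as pd

# Labels like the ones in the melted "Period" column
labels = pd.Series(["Revenue 2007", "Profit 2008", "Profit 2009"])

# expand=True returns a DataFrame whose columns are labelled 0 and 1,
# which is why the answer renames {0: 'flg', 1: 'Period'}
parts = labels.str.split(" ", expand=True)
print(parts)
```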

Answer 1 (score: 0)

Is this what you're looking for?

left = df[[col for col in df.columns if col.startswith('Profit')] + ['Material']]\
    .melt(id_vars='Material', var_name='Period', value_name='Profit')
left['Period'] = left['Period'].str.split(' ').str[1]

right = df[[col for col in df.columns if col.startswith('Revenue')] + ['Material']]\
    .melt(id_vars='Material', var_name='Period', value_name='Revenue')
right['Period'] = right['Period'].str.split(' ').str[1]

print(left.merge(right).sort_values(by=['Material', 'Period']).reset_index(drop=True))

Output

  Material Period  Profit  Revenue
0    Mat A   2007      10       50
1    Mat A   2008      15       55
2    Mat A   2009      20       60
3    Mat B   2007       5       45
4    Mat B   2008      10       50
5    Mat B   2009      35       55
6    Mat C   2007      35       75
7    Mat C   2008      30       80
8    Mat C   2009      45       85
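The same merge-based idea can be written with the join keys spelled out. This sketch assumes the column layout from the question; `melt_metric` is a hypothetical helper, `DataFrame.filter(regex=...)` stands in for the list comprehension, and `validate="one_to_one"` makes `merge` raise if a `(Material, Period)` pair is duplicated on either side.

```python
import pandas as pd

df = pd.DataFrame({
    "Material": ["Mat A", "Mat B", "Mat C"],
    "Revenue 2007": [50, 45, 75],
    "Revenue 2008": [55, 50, 80],
    "Revenue 2009": [60, 55, 85],
    "Profit 2007": [10, 5, 35],
    "Profit 2008": [15, 10, 30],
    "Profit 2009": [20, 35, 45],
})


def melt_metric(frame, metric):
    """Hypothetical helper: long table of one metric, keyed by Material and Period."""
    # Every "<metric> <year>" column, e.g. "Profit 2007"
    cols = frame.filter(regex=rf"^{metric} ").columns.tolist()
    part = frame.melt(id_vars="Material", value_vars=cols,
                      var_name="Period", value_name=metric)
    # "Profit 2007" -> "2007"
    part["Period"] = part["Period"].str.split(" ").str[1]
    return part


# Name the join keys explicitly instead of relying on shared column names
merged = (
    melt_metric(df, "Revenue")
    .merge(melt_metric(df, "Profit"), on=["Material", "Period"], validate="one_to_one")
    .sort_values(["Material", "Period"], ignore_index=True)
)
print(merged)
```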

Answer 2 (score: 0)

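import pandas as pd

# Long format: columns Material, variable ("Revenue 2007", ...), value
df = pd.melt(df, id_vars=['Material'])
# Split the variable name into the year and the metric type
df['Period'] = df.variable.str.split(" ").str[1]
df['type'] = df.variable.str.split(" ").str[0]
df = df.drop('variable', axis=1)
# One row per (Material, Period), one column per metric type
df = (
  df
  .groupby(['Material','Period','type'])
  .sum()
  .unstack('type')
  .reset_index()
)
# Flatten the ('value', 'Profit') / ('value', 'Revenue') MultiIndex columns
df.columns = ["Material", "Period", "Profit", "Revenue"]
df = df[["Material","Revenue","Profit","Period"]]
df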

  Material  Revenue  Profit Period
0    Mat A       50      10   2007
1    Mat A       55      15   2008
2    Mat A       60      20   2009
3    Mat B       45       5   2007
4    Mat B       50      10   2008
5    Mat B       55      35   2009
6    Mat C       75      35   2007
7    Mat C       80      30   2008
8    Mat C       85      45   2009
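Another route, not used in the answers above, skips `melt` entirely: turn the column names into a two-level column index and `stack` the year level into the rows. A sketch assuming each value column is named `<metric> <year>`; depending on the pandas version, `stack` may emit a `FutureWarning` about its implementation changing, which does not affect the result here.

```python
import pandas as pd

df = pd.DataFrame({
    "Material": ["Mat A", "Mat B", "Mat C"],
    "Revenue 2007": [50, 45, 75],
    "Revenue 2008": [55, 50, 80],
    "Revenue 2009": [60, 55, 85],
    "Profit 2007": [10, 5, 35],
    "Profit 2008": [15, 10, 30],
    "Profit 2009": [20, 35, 45],
})

wide = df.set_index("Material")
# "Revenue 2007" -> ("Revenue", "2007"): Index.str.split with expand=True
# returns a two-level MultiIndex
wide.columns = wide.columns.str.split(" ", expand=True).set_names(["metric", "Period"])

# Move the year level from the columns into the rows
long_df = wide.stack("Period").reset_index()
long_df = (
    long_df[["Material", "Revenue", "Profit", "Period"]]
    .sort_values(["Material", "Period"], ignore_index=True)
)
print(long_df)
```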