Pandas equivalent of R dcast

Date: 2016-05-01 18:20:23

Tags: python r pandas dataframe pivot-table

I have some data like this:

import pandas as pd
df = pd.DataFrame(index=range(1, 13), columns=['school', 'year', 'metric', 'values'])
df['school'] = ['id1']*6 + ['id2']*6
df['year'] = (['2015']*3 + ['2016']*3)*2
df['metric'] = ['tuition', 'admitsize', 'avgfinaid'] * 4
df['values'] = range(1,13)
df
   school  year     metric  values
1     id1  2015    tuition       1
2     id1  2015  admitsize       2
3     id1  2015  avgfinaid       3
4     id1  2016    tuition       4
5     id1  2016  admitsize       5
6     id1  2016  avgfinaid       6
7     id2  2015    tuition       7
8     id2  2015  admitsize       8
9     id2  2015  avgfinaid       9
10    id2  2016    tuition      10
11    id2  2016  admitsize      11
12    id2  2016  avgfinaid      12

I'd like to pivot the metric column and reshape the data to wide format, with one column per metric. That is, I want:

school  year  tuition  admitsize  avgfinaid
   id1  2015        1          2          3
   id1  2016        4          5          6
   id2  2015        7          8          9
   id2  2016       10         11         12

If this were R, I would do something like:

df2 <- dcast(df, school + year ~ metric, value.var = "values")
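For comparison, the parts of that dcast call correspond fairly directly to arguments of pandas' DataFrame.pivot_table: the left-hand side of `~` becomes `index`, the right-hand side becomes `columns`, `value.var` becomes `values`, and `fun.aggregate` becomes `aggfunc`. One difference matters when a school/year/metric combination occurs more than once: reshape2's dcast falls back to `length` (with a message) if no aggregation function is given, whereas pivot_table silently defaults to the mean. The following is a minimal sketch assuming a recent pandas; the extra tuition row is made up purely to show the effect of `aggfunc`.

```python
import pandas as pd

# Same long-format data as above, built in one step.
df = pd.DataFrame(
    {
        "school": ["id1"] * 6 + ["id2"] * 6,
        "year": (["2015"] * 3 + ["2016"] * 3) * 2,
        "metric": ["tuition", "admitsize", "avgfinaid"] * 4,
        "values": range(1, 13),
    }
)

# dcast(df, school + year ~ metric, value.var = "values") lines up as:
#   left-hand side of "~"   -> index=
#   right-hand side of "~"  -> columns=
#   value.var               -> values=
#   fun.aggregate           -> aggfunc=  (pandas defaults to "mean")
wide = (
    df.pivot_table(index=["school", "year"], columns="metric",
                   values="values", aggfunc="sum")
    .reset_index()              # school/year back to ordinary columns
    .rename_axis(columns=None)  # drop the "metric" label on the columns
)

# Hypothetical duplicate: a second tuition row for id1 in 2015.
extra = pd.DataFrame(
    {"school": ["id1"], "year": ["2015"], "metric": ["tuition"], "values": [100]}
)
dup = pd.concat([df, extra], ignore_index=True)

# With duplicates, aggfunc decides how the single output cell is filled.
summed = dup.pivot_table(index=["school", "year"], columns="metric",
                         values="values", aggfunc="sum")
averaged = dup.pivot_table(index=["school", "year"], columns="metric",
                           values="values")  # default aggfunc is the mean

print(summed.loc[("id1", "2015"), "tuition"])    # 101
print(averaged.loc[("id1", "2015"), "tuition"])  # 50.5
```

Passing `aggfunc` explicitly, as here, makes the aggregation visible in the code instead of relying on the default.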

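How do I do this in pandas? I've read this (otherwise very helpful) SO answer and this (also otherwise excellent) example in the pandas docs, but couldn't figure out how to apply them to my case. I don't need a dcast-style one-liner, just an example of how to get the result into a standard DataFrame (not a groupby, MultiIndex, or other fancy object).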

1 answer:

Answer (score: 12)

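You can use pivot_table(), followed by reset_index() to turn the (school, year) row index back into ordinary columns. Its default aggfunc is the mean, which changes nothing here because each school/year/metric combination occurs only once: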

In [23]: df2 = (df.pivot_table(index=['school', 'year'], columns='metric',
   ....:                       values='values')
   ....:          .reset_index()
   ....:       )

In [24]: df2
Out[24]:
metric school  year  admitsize  avgfinaid  tuition
0         id1  2015          2          3        1
1         id1  2016          5          6        4
2         id2  2015          8          9        7
3         id2  2016         11         12       10
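When every (school, year, metric) combination appears exactly once, as in this data, DataFrame.pivot performs the same reshape without any aggregation, and it raises a ValueError instead of averaging if a duplicate sneaks in. The sketch below also drops the leftover `metric` label from the column index and restores the metric order used in the question, so the result is a plain DataFrame matching the desired output. It assumes pandas 1.1 or later, where `pivot` accepts a list for `index`.

```python
import pandas as pd

# Long-format data from the question, built in one step.
df = pd.DataFrame(
    {
        "school": ["id1"] * 6 + ["id2"] * 6,
        "year": (["2015"] * 3 + ["2016"] * 3) * 2,
        "metric": ["tuition", "admitsize", "avgfinaid"] * 4,
        "values": range(1, 13),
    },
    index=range(1, 13),
)

# Reshape without aggregating; (school, year) becomes the row index.
wide = (
    df.pivot(index=["school", "year"], columns="metric", values="values")
    .reset_index()              # turn school/year back into ordinary columns
    .rename_axis(columns=None)  # drop the "metric" name left on the column index
)

# pivot() sorts the new columns alphabetically; restore order of first appearance.
metric_order = list(df["metric"].unique())
wide = wide[["school", "year"] + metric_order]
print(wide)

# A duplicated row makes pivot() fail loudly rather than silently averaging.
raised = False
try:
    pd.concat([df, df.head(1)]).pivot(
        index=["school", "year"], columns="metric", values="values"
    )
except ValueError as err:
    raised = True
    print("pivot refused:", err)
```

Choosing pivot over pivot_table is mostly a safety trade-off: pivot guarantees that no values were combined, while pivot_table is the one to reach for when aggregation is actually wanted.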