pandas: group by date, with ids as columns

Time: 2017-03-10 09:38:41

Tags: python pandas dataframe pivot-table jupyter-notebook

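I have a DataFrame with columns = ['date', 'id', 'value'], where id identifies different products. Suppose there are n products. I would like to create a new DataFrame with columns = ['date', 'valueid1', ..., 'valueidn'], where each value is placed in the row for its date if it exists, and NaN is used where it does not. Many thanks.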

2 answers:

Answer 0 (score: 3)

Suppose you have the following DataFrame:

In [120]: df
Out[120]:
        date  id  value
0 2001-01-01   1     10
1 2001-01-01   2     11
2 2001-01-01   3     12
3 2001-01-02   3     20
4 2001-01-03   1     20
5 2001-01-04   2     30

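You can use the pivot_table() method. Missing (date, id) combinations come out as NaN by default; pass fill_value to replace them: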

In [121]: df.pivot_table(index='date', columns='id', values='value')
Out[121]:
id             1     2     3
date
2001-01-01  10.0  11.0  12.0
2001-01-02   NaN   NaN  20.0
2001-01-03  20.0   NaN   NaN
2001-01-04   NaN  30.0   NaN

In [122]: df.pivot_table(index='date', columns='id', values='value', fill_value=0)
Out[122]:
id           1   2   3
date
2001-01-01  10  11  12
2001-01-02   0   0  20
2001-01-03  20   0   0
2001-01-04   0  30   0
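
The question asked for plain columns named 'valueid1', ..., 'valueidn' next to a 'date' column. Here is a minimal sketch of one way to get there from the pivoted result. It rebuilds the sample DataFrame shown above and uses add_prefix() and reset_index(), which are not part of the answer itself. The prefix 'valueid' is taken from the question, and the variable name wide is just for illustration.

```python
import pandas as pd

# Same sample data as in the answer above
df = pd.DataFrame({
    'date': pd.to_datetime(['2001-01-01', '2001-01-01', '2001-01-01',
                            '2001-01-02', '2001-01-03', '2001-01-04']),
    'id': [1, 2, 3, 3, 1, 2],
    'value': [10, 11, 12, 20, 20, 30],
})

# One row per date, one column per id, NaN where a (date, id) pair is missing
wide = df.pivot_table(index='date', columns='id', values='value')

# Assumption: the desired names are 'valueid' + id, as written in the question.
# add_prefix() renames the id columns; reset_index() makes 'date' a regular column.
wide = wide.add_prefix('valueid').reset_index()
wide.columns.name = None  # drop the leftover 'id' label on the column axis

print(wide)
```

If you would rather keep 'date' as the index, leave out the reset_index() call.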

Answer 1 (score: 1)

I think you need pivot:

df = df.pivot(index='date', columns='id', values='value')

Sample:

df = pd.DataFrame({'date':pd.date_range('2017-01-01', periods=5),
                   'id':[4,5,6,4,5],
                   'value':[7,8,9,1,2]})

print (df)
        date  id  value
0 2017-01-01   4      7
1 2017-01-02   5      8
2 2017-01-03   6      9
3 2017-01-04   4      1
4 2017-01-05   5      2

df = df.pivot(index='date', columns='id', values='value')
#alternative solution
#df = df.set_index(['date','id'])['value'].unstack()
print (df)
id            4    5    6
date                     
2017-01-01  7.0  NaN  NaN
2017-01-02  NaN  8.0  NaN
2017-01-03  NaN  NaN  9.0
2017-01-04  1.0  NaN  NaN
2017-01-05  NaN  2.0  NaN
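
To check that the commented-out set_index(...).unstack() alternative really gives the same table as pivot, you can compare the two results directly. This is only a sketch on the sample data above; the names by_pivot and by_unstack are made up for the comparison.

```python
import pandas as pd

df = pd.DataFrame({'date': pd.date_range('2017-01-01', periods=5),
                   'id': [4, 5, 6, 4, 5],
                   'value': [7, 8, 9, 1, 2]})

# Reshape with pivot
by_pivot = df.pivot(index='date', columns='id', values='value')

# Reshape by moving 'date' and 'id' into the index, then unstacking 'id'
by_unstack = df.set_index(['date', 'id'])['value'].unstack()

# DataFrame.equals treats NaNs in the same position as equal
print(by_pivot.equals(by_unstack))
```

Both variants require every (date, id) pair to be unique.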

But if you get:

ValueError: Index contains duplicate entries, cannot reshape

you need to use an aggregating function such as mean, sum, ... with groupby or pivot_table:

df = pd.DataFrame({'date':['2017-01-01', '2017-01-02',
                          '2017-01-03','2017-01-05','2017-01-05'],
                   'id':[4,5,6,4,4],
                   'value':[7,8,9,1,2]})

df.date = pd.to_datetime(df.date)
print (df)
        date  id  value
0 2017-01-01   4      7
1 2017-01-02   5      8
2 2017-01-03   6      9
3 2017-01-05   4      1 <- duplicate (2017-01-05, 4)
4 2017-01-05   4      2 <- duplicate (2017-01-05, 4)

df = df.groupby(['date', 'id'])['value'].mean().unstack()
#alternative solution (same as the other answer; equivalent to groupby, only slower on a big df)
#df = df.pivot_table(index='date', columns='id', values='value', aggfunc='mean')

print (df)
id            4    5    6
date                     
2017-01-01  7.0  NaN  NaN
2017-01-02  NaN  8.0  NaN
2017-01-03  NaN  NaN  9.0
2017-01-05  1.5  NaN  NaN <- 1.5 is mean (1 + 2)/2
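
Before choosing an aggregation it can help to see which (date, id) pairs are actually duplicated. The sketch below is not from the answer. It uses DataFrame.duplicated() to list the offending rows, shows that pivot fails on them, and then checks that the groupby and pivot_table variants above agree. The names dupes, by_groupby and by_pivot_table are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'date': pd.to_datetime(['2017-01-01', '2017-01-02', '2017-01-03',
                                           '2017-01-05', '2017-01-05']),
                   'id': [4, 5, 6, 4, 4],
                   'value': [7, 8, 9, 1, 2]})

# Every row whose (date, id) pair occurs more than once
dupes = df[df.duplicated(['date', 'id'], keep=False)]
print(dupes)

# pivot cannot decide which of the duplicated values to keep, so it raises
try:
    df.pivot(index='date', columns='id', values='value')
except ValueError as exc:
    print('pivot failed:', exc)

# Aggregating first (here with mean) collapses each duplicated pair to one value
by_groupby = df.groupby(['date', 'id'])['value'].mean().unstack()
by_pivot_table = df.pivot_table(index='date', columns='id',
                                values='value', aggfunc='mean')

print(by_groupby.equals(by_pivot_table))
```

Whether mean, sum, first or something else is the right aggregation depends on what a repeated (date, id) row means in your data.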