Question

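I'm using pandas to reshape some string/numeric responses, and I've run into some slightly counterintuitive behavior.

Can someone explain the difference between the data frames `restacked` and `pivoted` below, and why `pivoted2` raises a DataError even though no aggfunc is passed?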

import pandas as pd

d = {'ID': pd.Series(['x']*3 + ['y']*3,index = range(6)),
     'Count': pd.Series([1,2,1,1,1,1], index = range(6)),
     'Value_type': pd.Series(['foo','foo','bar','foo','bar','baz'], index = range(6)),
     'Value': pd.Series(range(1,7),index = range(6))}
df = pd.DataFrame(d)

d2 = {'ID': pd.Series(['x']*3 + ['y']*3,index = range(6)),
     'Count': pd.Series([1,2,1,1,1,1], index = range(6)),
     'Value_type': pd.Series(['foo','foo','bar','foo','bar','baz'], index = range(6)),
     'Value': pd.Series(list('abcdef'),index = range(6))}
df2 = pd.DataFrame(d2)

restacked = df.set_index(['ID', 'Count', 'Value_type']).unstack()
print(restacked)

restacked2 = df2.set_index(['ID', 'Count', 'Value_type']).unstack()
print(restacked2)

# rows/cols are the keyword names from older pandas releases
# (later renamed to index/columns).
pivoted = pd.pivot_table(df, rows=['ID', 'Count'], cols='Value_type', values='Value')
print(pivoted)

## raises DataError('No numeric types to aggregate'),
## even though no aggregation function is passed.
pivoted2 = pd.pivot_table(df2, rows=['ID', 'Count'], cols='Value_type', values='Value')
print(pivoted2)

Answer 1

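The default agg function is np.mean (even if you don't pass it explicitly, that is what gets used), which makes no sense for strings. In fact, it raises an AttributeError when passed an object array, so pandas complains when you try this.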
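To make the implicit mean visible, here is a minimal sketch using the keyword names from later pandas releases (`index`/`columns` in place of `rows`/`cols`). The extra duplicate row and the names `dup`, `dup_pivoted`, and `unstack_error` are illustrative additions, not part of the question. It suggests that `pivot_table` silently averages duplicate keys, whereas `set_index(...).unstack()` performs no aggregation at all and refuses to reshape them.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": ["x"] * 3 + ["y"] * 3,
    "Count": [1, 2, 1, 1, 1, 1],
    "Value_type": ["foo", "foo", "bar", "foo", "bar", "baz"],
    "Value": list(range(1, 7)),
})

# No aggfunc passed: pivot_table falls back to the mean.
pivoted = pd.pivot_table(df, index=["ID", "Count"],
                         columns="Value_type", values="Value")
explicit_mean = pd.pivot_table(df, index=["ID", "Count"],
                               columns="Value_type", values="Value",
                               aggfunc="mean")

# Hypothetical extra row that duplicates the key (x, 1, foo).
extra = pd.DataFrame({"ID": ["x"], "Count": [1],
                      "Value_type": ["foo"], "Value": [3]})
dup = pd.concat([df, extra], ignore_index=True)

# pivot_table averages the two values (1 and 3) stored under that key.
dup_pivoted = pd.pivot_table(dup, index=["ID", "Count"],
                             columns="Value_type", values="Value")

# unstack does not aggregate, so duplicate keys make it fail.
try:
    dup.set_index(["ID", "Count", "Value_type"]).unstack()
    unstack_error = None
except ValueError as exc:
    unstack_error = exc

print(dup_pivoted)
print("unstack failed with:", unstack_error)
```

With unique keys, as in the question's `df`, each cell holds a single value, so the mean leaves the numbers unchanged and the two approaches agree apart from the column layout.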

You can pass np.sum instead:

In [11]: pd.pivot_table(df2, rows=['ID', 'Count'], cols='Value_type',
                        values='Value', aggfunc=np.sum)
Out[11]: 
Value_type  bar  baz foo
ID Count                
x  1          c  NaN   a
   2        NaN  NaN   b
y  1          e    f   d

Or use iloc[0] to take the first item:

In [12]: pd.pivot_table(df2, rows=['ID', 'Count'], cols='Value_type',
                        values='Value', aggfunc=lambda x: x.iloc[0])
Out[12]: 
Value_type  bar  baz foo
ID Count                
x  1          c  NaN   a
   2        NaN  NaN   b
y  1          e    f   d

Note: this is the same as restacked2['Value']. If you pass a list for the values to aggregate, you can make the output match restacked2 itself:

In [13]: pd.pivot_table(df2, rows=['ID', 'Count'], cols=['Value_type'], 
                        values=['Value'], aggfunc=lambda x: x.iloc[0])
Out[13]: 
           Value         
Value_type   bar  baz foo
ID Count                 
x  1           c  NaN   a
   2         NaN  NaN   b
y  1           e    f   d
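The session above uses the `rows`/`cols` keywords from older pandas releases, which were later renamed to `index`/`columns`. A rough equivalent for a recent pandas might look like the sketch below. It assumes the string alias `aggfunc="first"` as a stand-in for `lambda x: x.iloc[0]`; the names `first` and `nested` are illustrative only.

```python
import pandas as pd

df2 = pd.DataFrame({
    "ID": ["x"] * 3 + ["y"] * 3,
    "Count": [1, 2, 1, 1, 1, 1],
    "Value_type": ["foo", "foo", "bar", "foo", "bar", "baz"],
    "Value": list("abcdef"),
})

# Scalar `values`: the columns carry only the Value_type level.
first = pd.pivot_table(df2, index=["ID", "Count"],
                       columns="Value_type", values="Value",
                       aggfunc="first")

# List `values`: the columns gain an outer "Value" level,
# mirroring the layout produced by set_index(...).unstack().
nested = pd.pivot_table(df2, index=["ID", "Count"],
                        columns=["Value_type"], values=["Value"],
                        aggfunc="first")

print(first)
print(nested)
```

Taking the first item only makes sense when each (ID, Count, Value_type) combination occurs once, as it does here. With duplicate keys it would silently discard the remaining values.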

Pandas Dataframe Stacking vs. Pivoting
