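I'm new to pandas and am still testing and learning. I've run into the following problem with a DataFrame imported from Excel. The DataFrame contains these variables: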
<class 'pandas.core.frame.DataFrame'>
Int64Index: 48062 entries, 0 to 48061
Data columns (total 11 columns):
Konskund_MEAB 48062 non-null values
Strukturordn 48062 non-null values
Antal_forsandelser 48062 non-null values
ProdID 48062 non-null values
Sort 48062 non-null values
Storstad 48062 non-null values
Year 48062 non-null values
snittvikt 48062 non-null values
Totsum 48062 non-null values
Prodsum 48062 non-null values
snittpris 48062 non-null values
dtypes: float64(9), object(2)
Running:
np.average(df['snittpris'],weights=df['Antal_forsandelser'])
produces the correct result.
When I try to run pivot_table with the following command:
df_sum=pd.pivot_table(df,rows=['Konskund_MEAB','ProdID'],cols=['Year'],
aggfunc=np.average(df ['snittpris'],weights=df['Antal_forsandelser']))
I get the following error message.
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-90-9fd03896c806> in <module>()
----> 1 df_sum=pd.pivot_table(df,rows=['Konskund_MEAB','ProdID'],cols=['Year'],
aggfunc=np.average(df['snittpris'],weights=df['Antal_forsandelser']))
C:\Users\Bengtw\AppData\Local\Enthought\Canopy32\User\lib\site-packages\pandas\tools\pivot.pyc
in pivot_table(data, values, rows, cols, aggfunc, fill_value, margins, dropna)
101
102 grouped = data.groupby(keys)
--> 103 agged = grouped.agg(aggfunc)
104
105 table = agged
C:\Users\Bengtw\AppData\Local\Enthought\Canopy32\User\lib\site-packages\pandas\core\groupby.pyc
in agg(self, func, *args, **kwargs)
342 @Appender(_agg_doc)
343 def agg(self, func, *args, **kwargs):
--> 344 return self.aggregate(func, *args, **kwargs)
345
346 def _iterate_slices(self):
C:\Users\Bengtw\AppData\Local\Enthought\Canopy32\User\lib\site-packages\pandas\core\groupby.pyc
in aggregate(self, arg, *args, **kwargs)
1741
1742 if self.grouper.nkeys > 1:
-> 1743 return self._python_agg_general(arg, *args, **kwargs)
1744 else:
1745 result = self._aggregate_generic(arg, *args, **kwargs)
C:\Users\Bengtw\AppData\Local\Enthought\Canopy32\User\lib\site-packages\pandas\core\groupby.pyc
in _python_agg_general(self, func, *args, **kwargs)
480
481 if len(output) == 0:
--> 482 return self._python_apply_general(f)
483
484 if self.grouper._filter_empty_groups:
C:\Users\Bengtw\AppData\Local\Enthought\Canopy32\User\lib\site-packages\pandas\core\groupby.pyc
in _python_apply_general(self, f)
332
333 def _python_apply_general(self, f):
--> 334 keys, values, mutated = self.grouper.apply(f, self.obj, self.axis)
335
336 return self._wrap_applied_output(keys, values,
C:\Users\Bengtw\AppData\Local\Enthought\Canopy32\User\lib\site-packages\pandas\core\groupby.pyc
in apply(self, f, data, axis, keep_internal)
628 # group might be modified
629 group_axes = _get_axes(group)
--> 630 res = f(group)
631 if not _is_indexed_like(res, group_axes):
632 mutated = True
C:\Users\Bengtw\AppData\Local\Enthought\Canopy32\User\lib\site-packages\pandas\core\groupby.pyc
in <lambda>(x)
468 def _python_agg_general(self, func, *args, **kwargs):
469 func = _intercept_function(func)
--> 470 f = lambda x: func(x, *args, **kwargs)
471
472 # iterate through "columns" ex exclusions to populate output dict
TypeError: 'numpy.float64' object is not callable
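What is the problem? The row variable Konskund_MEAB contains strings (a few hundred distinct values), ProdID is numeric with 4 unique values, and Year is simply the year (4 discrete values).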
Answer 0 (score: 1)
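The aggfunc argument should be a function, but you are passing a float: np.average(...) is evaluated before pivot_table is called, so aggfunc receives its numpy.float64 result rather than something it can call.
Hence the TypeError: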
TypeError: 'numpy.float64' object is not callable
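To make the distinction concrete, here is a minimal sketch of the difference between passing the result of a call and passing a function. The arrays and the helper name `weighted` are made up for illustration and are not part of the original question.

```python
import numpy as np

# Hypothetical sample data, chosen only for illustration.
values = np.array([10.0, 20.0])
weights = np.array([1.0, 3.0])

# This is evaluated immediately, so `result` is a number, not a function.
result = np.average(values, weights=weights)
print(type(result).__name__, callable(result))

# A function (or lambda) defers the computation until it is called.
def weighted(x):
    return np.average(x, weights=weights)

print(callable(weighted))

# Calling the number reproduces the error from the traceback.
try:
    result(values)
except TypeError as exc:
    message = str(exc)
    print(message)
```

pivot_table hits the same failure internally when it tries to call aggfunc on each group.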
You could pass in an anonymous (lambda) function, which is probably what you are after:
aggfunc=lambda x: np.average(x['snittpris'], weights=x['Antal_forsandelser'])
Unfortunately, this doesn't work in this case (because aggfunc cannot access columns other than the one being aggregated)...
Instead, you can use groupby:
rows = ['Konskund_MEAB','ProdID']
cols = ['Year']
g = df.groupby(rows + cols)
and apply the function to each group, then unstack the resulting Series into a DataFrame:
s_av = g.apply(lambda x: np.average(x['snittpris'], weights=x['Antal_forsandelser']))
df_av = s_av.unstack(cols)
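Putting the pieces together, the following is a self-contained sketch of the groupby approach on a tiny made-up DataFrame. The column names come from the question, but the values are invented for illustration. Selecting the two needed columns before apply is an extra step not in the snippet above; it is assumed here only to keep the lambda independent of how a given pandas version treats the grouping columns.

```python
import numpy as np
import pandas as pd

# Invented sample data using the question's column names.
df = pd.DataFrame({
    'Konskund_MEAB': ['A', 'A', 'A', 'B', 'B'],
    'ProdID': [1.0, 1.0, 1.0, 2.0, 2.0],
    'Year': [2011, 2011, 2012, 2011, 2011],
    'snittpris': [10.0, 20.0, 30.0, 5.0, 15.0],
    'Antal_forsandelser': [1.0, 3.0, 2.0, 1.0, 1.0],
})

rows = ['Konskund_MEAB', 'ProdID']
cols = ['Year']

# Group by the row and column keys, keeping only the columns the lambda needs.
g = df.groupby(rows + cols)[['snittpris', 'Antal_forsandelser']]

# Weighted average of snittpris per group, weighted by Antal_forsandelser.
s_av = g.apply(lambda x: np.average(x['snittpris'],
                                    weights=x['Antal_forsandelser']))

# Move Year from the row index to the columns, as pivot_table would.
df_av = s_av.unstack(cols)
print(df_av)
```

Combinations that never occur in the data, such as ('B', 2.0) in 2012 here, come out as NaN after unstack.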