My question is similar to another SO post, equivalent-of-r-function-ave-in-python-pandas, but I am getting an error message.
Suppose I have a DataFrame df:
A B C D
0 foo one -2.0 0.5
1 bar one -1.5 -1.5
2 foo two -0.5 -0.8
3 bar three -0.0 0.7
4 foo two -1.5 0.9
5 bar two 1.5 0.6
6 foo one -0.0 -0.4
7 foo three 0.5 1.8
I want to create another column E that holds, for each row, the mean of the values in column C within that row's group when grouping by, say, A:
A B C D E
0 foo one -2.0 0.5 -0.7
1 bar one -1.5 -1.5 0.0
2 foo two -0.5 -0.8 -0.7
3 bar three -0.0 0.7 0.0
4 foo two -1.5 0.9 -0.7
5 bar two 1.5 0.6 0.0
6 foo one -0.0 -0.4 -0.7
7 foo three 0.5 1.8 -0.7
I tried, for example,
df['E'] = df.groupby('A').transform(lambda x: pandas.Series(x.C.mean()))
or
df['E'] = df.groupby('A').transform(lambda x: pandas.Series(x['C'].mean()))
but I get ValueError: Wrong number of items passed 3, placement implies 1.
Here is the full error message:
Traceback (most recent call last):
File "C:\PF\WinPython-64bit-3.4.2.4\python-3.4.2.amd64\lib\site-packages\pandas\core\internals.py", line 2978, in set
loc = self.items.get_loc(item)
File "C:\PF\WinPython-64bit-3.4.2.4\python-3.4.2.amd64\lib\site-packages\pandas\core\index.py", line 1402, in get_loc
return self._engine.get_loc(_values_from_object(key))
File "pandas\index.pyx", line 134, in pandas.index.IndexEngine.get_loc (pandas\index.c:3807)
File "pandas\index.pyx", line 154, in pandas.index.IndexEngine.get_loc (pandas\index.c:3687)
File "pandas\hashtable.pyx", line 696, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12310)
File "pandas\hashtable.pyx", line 704, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12261)
KeyError: 'E'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\PF\WinPython-64bit-3.4.2.4\python-3.4.2.amd64\lib\site-packages\IPython\core\interactiveshell.py", line 2883, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-85-36e1c884837f>", line 1, in <module>
df['E']=df.groupby('A').transform(lambda x: pandas.Series(x.C.max()))
File "C:\PF\WinPython-64bit-3.4.2.4\python-3.4.2.amd64\lib\site-packages\pandas\core\frame.py", line 2110, in __setitem__
self._set_item(key, value)
File "C:\PF\WinPython-64bit-3.4.2.4\python-3.4.2.amd64\lib\site-packages\pandas\core\frame.py", line 2188, in _set_item
NDFrame._set_item(self, key, value)
File "C:\PF\WinPython-64bit-3.4.2.4\python-3.4.2.amd64\lib\site-packages\pandas\core\generic.py", line 1179, in _set_item
self._data.set(key, value)
File "C:\PF\WinPython-64bit-3.4.2.4\python-3.4.2.amd64\lib\site-packages\pandas\core\internals.py", line 2981, in set
self.insert(len(self.items), item, value)
File "C:\PF\WinPython-64bit-3.4.2.4\python-3.4.2.amd64\lib\site-packages\pandas\core\internals.py", line 3080, in insert
placement=slice(loc, loc+1))
File "C:\PF\WinPython-64bit-3.4.2.4\python-3.4.2.amd64\lib\site-packages\pandas\core\internals.py", line 2099, in make_block
placement=placement)
File "C:\PF\WinPython-64bit-3.4.2.4\python-3.4.2.amd64\lib\site-packages\pandas\core\internals.py", line 1427, in __init__
placement=placement)
File "C:\PF\WinPython-64bit-3.4.2.4\python-3.4.2.amd64\lib\site-packages\pandas\core\internals.py", line 76, in __init__
len(self.values), len(self.mgr_locs)))
ValueError: Wrong number of items passed 3, placement implies 1
What might I be doing wrong?
I am using Python 3.4.2.4 and pandas version 0.15.2.
Answer 0 (score: 2)
I think transform is the right approach, but you need to select the column directly:
>>> df["E"] = df.groupby("A")["C"].transform("mean")
>>> df
A B C D E
0 foo one -2.0 0.5 -0.7
1 bar one -1.5 -1.5 0.0
2 foo two -0.5 -0.8 -0.7
3 bar three -0.0 0.7 0.0
4 foo two -1.5 0.9 -0.7
5 bar two 1.5 0.6 0.0
6 foo one -0.0 -0.4 -0.7
7 foo three 0.5 1.8 -0.7
This is essentially the same as the usual way of taking the mean of a grouped column:
>>> df.groupby("A")["C"].mean()
A
bar 0.0
foo -0.7
Name: C, dtype: float64
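except that transform broadcasts each group's result back to every row of that group, so the output has the same index as df.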
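For anyone who wants to reproduce this, here is a minimal, self-contained sketch that rebuilds the frame from the question and compares the reduced and broadcast results. The names `group_means`, `E_via_map` and `both` are only illustrative and not from the question. The last step shows that transforming more than one column returns a DataFrame with several columns; that is probably related to the "Wrong number of items passed" error above, but this is an assumption rather than something checked against pandas 0.15.2.

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    "A": ["foo", "bar", "foo", "bar", "foo", "bar", "foo", "foo"],
    "B": ["one", "one", "two", "three", "two", "two", "one", "three"],
    "C": [-2.0, -1.5, -0.5, -0.0, -1.5, 1.5, -0.0, 0.5],
    "D": [0.5, -1.5, -0.8, 0.7, 0.9, 0.6, -0.4, 1.8],
})

# Reduced result: one value per group, indexed by the group key.
group_means = df.groupby("A")["C"].mean()

# Broadcast result: one value per original row, aligned with df's index.
df["E"] = df.groupby("A")["C"].transform("mean")

# The same broadcast done by hand, mapping each key to its group mean.
df["E_via_map"] = df["A"].map(group_means)

# Transforming several columns yields a DataFrame, not a single column,
# so it cannot be assigned to df["E"] directly.
both = df.groupby("A")[["C", "D"]].transform("mean")

print(df)
print(both.shape)
```

Selecting `["C"]` before calling `transform` gives a single Series, which is what the assignment to one new column needs.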