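I am trying to use pivot to reshape a DataFrame. This is the kind of transformation I would expect pivot to handle, but it does not work here.
Any expert insight would be appreciated.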
>>> df1
id item value
0 2225 prize 1.5
1 2225 unit kg
2 2225 prize 2.4
3 8187 unit lt
4 1401 stock 10
5 1401 prize 4.3
When I run pivot, I see the following error.
>>> df1.pivot('id', 'item')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/grid/common/pkgs/python/v3.6.1/lib/python3.6/site-packages/pandas/core/frame.py", line 4359, in pivot
return pivot(self, index=index, columns=columns, values=values)
File "/grid/common/pkgs/python/v3.6.1/lib/python3.6/site-packages/pandas/core/reshape/reshape.py", line 381, in pivot
return indexed.unstack(columns)
File "/grid/common/pkgs/python/v3.6.1/lib/python3.6/site-packages/pandas/core/frame.py", line 4546, in unstack
return unstack(self, level, fill_value)
File "/grid/common/pkgs/python/v3.6.1/lib/python3.6/site-packages/pandas/core/reshape/reshape.py", line 469, in unstack
return _unstack_frame(obj, level, fill_value=fill_value)
File "/grid/common/pkgs/python/v3.6.1/lib/python3.6/site-packages/pandas/core/reshape/reshape.py", line 488, in _unstack_frame
fill_value=fill_value)
File "/grid/common/pkgs/python/v3.6.1/lib/python3.6/site-packages/pandas/core/reshape/reshape.py", line 116, in __init__
self._make_selectors()
File "/grid/common/pkgs/python/v3.6.1/lib/python3.6/site-packages/pandas/core/reshape/reshape.py", line 154, in _make_selectors
raise ValueError('Index contains duplicate entries, '
ValueError: Index contains duplicate entries, cannot reshape
Even pivot_table raises an error.
>>> df1.pivot_table(columns='item', values='value')
I also looked at a GitHub link for reference, but could not get it to work.
The desired output should be:
Value
item prize stock unit
id
2225 1.5 10 2.4 lt
Answer 0 (score: 3)
You should try the following:
pd.pivot_table(df1, values='value', index=['id'], columns=['item'], aggfunc=np.sum)
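As jezrael explains below, the aggfunc can take the mean of numeric values and join strings.
Answer 1 (score: 2)
The problem with the data is that it contains duplicate (id, item) entries, and the value column mixes numbers with strings.
A general solution, if you need the mean for numeric values and a join for duplicated strings: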
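def f(x):
    # Strings that cannot be parsed as numbers become NaN
    y = pd.to_numeric(x, errors='coerce')
    if y.isna().all():
        # Nothing numeric in this group: join the strings
        return ', '.join(x)
    else:
        return y.mean()

df = df1.pivot_table(index='id', columns='item', values='value', aggfunc=f)
print(df)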
item prize stock unit
id
1401 4.3 10 NaN
2225 1.95 NaN kg
8187 NaN NaN lt
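To see how the aggregation function above treats each kind of group, here is a minimal sketch that calls it directly on small Series. The name mean_or_join is only for illustration, and storing every value as a string is an assumption, since the question does not show the column's dtypes.

```python
import pandas as pd

def mean_or_join(x):
    """Mean of the numeric entries, or a joined string if none are numeric."""
    y = pd.to_numeric(x, errors="coerce")  # non-numeric strings become NaN
    if y.isna().all():
        return ", ".join(x)
    return y.mean()  # NaN entries are skipped by default

numeric = mean_or_join(pd.Series(["1.5", "2.4"]))  # all values numeric
text = mean_or_join(pd.Series(["kg", "lt"]))       # no value numeric
mixed = mean_or_join(pd.Series(["1.5", "kg"]))     # one of each
print(numeric, text, mixed)
```

Note the mixed case: as soon as one entry in a group is numeric, the strings in that group are ignored rather than joined, so it is worth checking that no (id, item) group mixes both kinds.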
Another approach is to aggregate the numeric and non-numeric values separately and then concat the results:
df1['value1'] = pd.to_numeric(df1['value'], errors='coerce')
df2 = df1.pivot_table(index='id', columns='item', values='value1', aggfunc='mean')
# Parentheses allow the method chain to continue on the next line
df3 = (df1[df1['value1'].isna()]
       .pivot_table(index='id', columns='item', values='value', aggfunc=','.join))
df = pd.concat([df2, df3], axis=1)
print(df)
item prize stock unit
id
1401 4.30 10.0 NaN
2225 1.95 NaN kg
8187 NaN NaN lt
Answer 2 (score: 2)
Raises
------
ValueError:
When there are any `index`, `columns` combinations with multiple
values. `DataFrame.pivot_table` when you need to aggregate.
In your case, id=2225 has two prize entries, which pivot cannot handle. You can aggregate first and then pivot:
df1.groupby(['id', 'item']).sum().reset_index().pivot(index='id', columns='item', values='value')
+------+-------+-------+------+
| item | prize | stock | unit |
+------+-------+-------+------+
| id | | | |
| 1401 | 4.3 | 10 | NaN |
| 2225 | 3.9 | NaN | kg |
| 8187 | NaN | NaN | lt |
+------+-------+-------+------+
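If it is not obvious which rows collide, a quick check along these lines may help. It is a sketch that rebuilds the question's frame (storing every value as a string is an assumption) and lists each row whose (id, item) pair occurs more than once, then confirms that pivot rejects the frame.

```python
import pandas as pd

# Rebuilt from the question; keeping all values as strings is an assumption.
df1 = pd.DataFrame({
    "id": [2225, 2225, 2225, 8187, 1401, 1401],
    "item": ["prize", "unit", "prize", "unit", "stock", "prize"],
    "value": ["1.5", "kg", "2.4", "lt", "10", "4.3"],
})

# keep=False flags every row of a repeated (id, item) pair, not only the later ones.
dupes = df1[df1.duplicated(subset=["id", "item"], keep=False)]
print(dupes)

# pivot refuses to reshape while such pairs exist.
try:
    df1.pivot(index="id", columns="item", values="value")
    pivot_failed = False
except ValueError as exc:
    pivot_failed = True
    print(exc)
```

An empty dupes frame would mean pivot can be used directly, with no aggregation step.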
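Answer 3 (score: -1)
Pandas is complaining about the fact that you have the entry (2225, prize) twice, at index 0 and index 2. This is a problem with the data, not with pandas' behavior.
Fixing this duplicate entry removes the error: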
# the original database
>>> df
id item value
0 2225 prize 1.5
1 2225 unit kg
2 2225 prize 2.4
3 8187 unit lt
4 1401 stock 10
5 1401 prize 4.3
# removing the duplicate error by changing index 2
>>> df.loc[2, 'id'] = 8187
>>> df
id item value
0 2225 prize 1.5
1 2225 unit kg
2 8187 prize 2.4
3 8187 unit lt
4 1401 stock 10
5 1401 prize 4.3
# pivot now works properly
>>> df.pivot('id', 'item')
value
item prize stock unit
id
1401 4.3 10 NaN
2225 1.5 NaN kg
8187 2.4 NaN lt
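If the id at index 2 is actually correct and the second entry is simply redundant, another option is to drop the repeat instead of editing the id. This is only a sketch: whether the later value can be discarded is an assumption the question does not confirm, and all values are stored as strings here for simplicity.

```python
import pandas as pd

# Rebuilt from the question; keeping all values as strings is an assumption.
df = pd.DataFrame({
    "id": [2225, 2225, 2225, 8187, 1401, 1401],
    "item": ["prize", "unit", "prize", "unit", "stock", "prize"],
    "value": ["1.5", "kg", "2.4", "lt", "10", "4.3"],
})

# Keep only the first row of each (id, item) pair, then pivot as usual.
deduped = df.drop_duplicates(subset=["id", "item"], keep="first")
wide = deduped.pivot(index="id", columns="item", values="value")
print(wide)
```

Passing keep="last" would retain the value 2.4 for (2225, prize) instead of 1.5.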