Pandas pivot to transform a DataFrame

Time: 2019-08-28 08:06:37

Tags: python-3.x pandas pivot

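I was just playing with pivot to transform a DataFrame in a way I expected pivot to handle, but it doesn't work here.

Any expert insight would be appreciated.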

DataFrame:

>>> df1
     id   item value
0  2225  prize   1.5
1  2225   unit    kg
2  2225  prize   2.4
3  8187   unit    lt
4  1401  stock    10
5  1401  prize   4.3
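To reproduce this, here is a minimal sketch that builds `df1`. It assumes `value` holds plain numbers for prices and stock and strings for units, so pandas stores it as an `object` column. The printout above does not show dtypes, so this is an assumption.

```python
import pandas as pd

# Assumption: 'value' mixes numbers and strings, so its dtype is object
df1 = pd.DataFrame({
    "id": [2225, 2225, 2225, 8187, 1401, 1401],
    "item": ["prize", "unit", "prize", "unit", "stock", "prize"],
    "value": [1.5, "kg", 2.4, "lt", 10, 4.3],
})
print(df1)
print(df1.dtypes)
```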

When I run pivot, I get the following error:

>>> df1.pivot('id', 'item')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/grid/common/pkgs/python/v3.6.1/lib/python3.6/site-packages/pandas/core/frame.py", line 4359, in pivot
    return pivot(self, index=index, columns=columns, values=values)
  File "/grid/common/pkgs/python/v3.6.1/lib/python3.6/site-packages/pandas/core/reshape/reshape.py", line 381, in pivot
    return indexed.unstack(columns)
  File "/grid/common/pkgs/python/v3.6.1/lib/python3.6/site-packages/pandas/core/frame.py", line 4546, in unstack
    return unstack(self, level, fill_value)
  File "/grid/common/pkgs/python/v3.6.1/lib/python3.6/site-packages/pandas/core/reshape/reshape.py", line 469, in unstack
    return _unstack_frame(obj, level, fill_value=fill_value)
  File "/grid/common/pkgs/python/v3.6.1/lib/python3.6/site-packages/pandas/core/reshape/reshape.py", line 488, in _unstack_frame
    fill_value=fill_value)
  File "/grid/common/pkgs/python/v3.6.1/lib/python3.6/site-packages/pandas/core/reshape/reshape.py", line 116, in __init__
    self._make_selectors()
  File "/grid/common/pkgs/python/v3.6.1/lib/python3.6/site-packages/pandas/core/reshape/reshape.py", line 154, in _make_selectors
    raise ValueError('Index contains duplicate entries, '
ValueError: Index contains duplicate entries, cannot reshape
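The traceback means that at least one (`id`, `item`) pair occurs more than once, so `pivot` has two candidate values for a single cell. The sketch below locates the offending rows with `DataFrame.duplicated`. It builds `df1` under the assumption that `value` is an object column mixing numbers and strings.

```python
import pandas as pd

df1 = pd.DataFrame({
    "id": [2225, 2225, 2225, 8187, 1401, 1401],
    "item": ["prize", "unit", "prize", "unit", "stock", "prize"],
    "value": [1.5, "kg", 2.4, "lt", 10, 4.3],
})

# keep=False flags every row of a duplicated (id, item) group, not just the later ones
dupes = df1[df1.duplicated(subset=["id", "item"], keep=False)]
print(dupes)
```

Here this should show rows 0 and 2, which are the two `(2225, prize)` entries.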

pivot_table also gives an error:

>>> df1.pivot_table(columns='item', values='value')
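The `pivot_table` call above most likely fails for a different reason than `pivot`. Its default `aggfunc` is `'mean'`, which cannot average strings such as `'kg'`, and the call also leaves out `index='id'`. The exact error text varies across pandas versions. If keeping one value per (`id`, `item`) pair is acceptable, a non-numeric aggregator such as `'first'` avoids the problem. The following is a minimal sketch that assumes `value` is an object column mixing numbers and strings.

```python
import pandas as pd

df1 = pd.DataFrame({
    "id": [2225, 2225, 2225, 8187, 1401, 1401],
    "item": ["prize", "unit", "prize", "unit", "stock", "prize"],
    "value": [1.5, "kg", 2.4, "lt", 10, 4.3],
})

# 'first' keeps the first value seen for each (id, item) pair and works on strings too
wide = df1.pivot_table(index="id", columns="item", values="value", aggfunc="first")
print(wide)
```

This discards the second `(2225, prize)` value (2.4). Whether that is acceptable depends on what the duplicate means in your data.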

I looked at the GitHub link reference, but I couldn't work it out.

The desired output should be:

      Value
item  prize  stock  unit
id
2225  1.5  10  2.4  lt

4 answers:

Answer 0 (score: 3)

You should try the following:

import pandas as pd
pd.pivot_table(df1, values='value', index=['id'], columns=['item'], aggfunc='sum')

As jezrael notes below, aggfunc could instead take the mean of the numeric values and join the strings.

Answer 1 (score: 2)

The problem with the data is that some (id, item) pairs are duplicated, and the value column mixes numbers with strings.

A general solution, if you need the mean for numeric values and a join for duplicated strings:

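import pandas as pd

def f(x):
    # read the group's values as numbers; non-numeric entries become NaN
    y = pd.to_numeric(x, errors='coerce')
    if y.isna().all():
        # no numbers at all: join the strings
        return ', '.join(x)
    else:
        # numeric values: average the duplicates
        return y.mean()

df = df1.pivot_table(index='id', columns='item', values='value', aggfunc=f)
print(df)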
item prize stock unit
id                   
1401   4.3    10  NaN
2225  1.95   NaN   kg
8187   NaN   NaN   lt
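Both approaches in this answer rely on `pd.to_numeric(..., errors='coerce')` to separate numeric entries from text. Anything that cannot be parsed as a number becomes `NaN`. The small sketch below shows what it returns, assuming the `value` column holds the mixed numbers and strings shown in the question.

```python
import pandas as pd

values = pd.Series([1.5, "kg", 2.4, "lt", 10, 4.3], dtype=object)

# errors='coerce' turns unparseable entries ('kg', 'lt') into NaN instead of raising
numeric = pd.to_numeric(values, errors="coerce")
print(numeric)
print(numeric.isna().tolist())
```

The `NaN` positions are the rows that end up on the string-joining side.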

Another approach is to aggregate the numeric and non-numeric values separately, then concat the results:

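import pandas as pd

# numeric copy of 'value'; strings become NaN
df1['value1'] = pd.to_numeric(df1['value'], errors='coerce')

# average the numeric values
df2 = df1.pivot_table(index='id', columns='item', values='value1', aggfunc='mean')

# join the remaining (string) values; parentheses allow the chained call to span lines
df3 = (df1[df1['value1'].isna()]
       .pivot_table(index='id', columns='item', values='value', aggfunc=','.join))

df = pd.concat([df2, df3], axis=1)
print(df)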

item  prize  stock unit
id                     
1401   4.30   10.0  NaN
2225   1.95    NaN   kg
8187    NaN    NaN   lt
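The second approach adds a `value1` column to `df1` in place. The sketch below applies the same split-and-concat idea without modifying `df1`, using a boolean mask instead. The names `numeric` and `is_num` are illustrative, and `value` is assumed to be an object column mixing numbers and strings.

```python
import pandas as pd

df1 = pd.DataFrame({
    "id": [2225, 2225, 2225, 8187, 1401, 1401],
    "item": ["prize", "unit", "prize", "unit", "stock", "prize"],
    "value": [1.5, "kg", 2.4, "lt", 10, 4.3],
})

# numeric view kept outside df1; text such as 'kg' becomes NaN
numeric = pd.to_numeric(df1["value"], errors="coerce")
is_num = numeric.notna()

# numeric rows: swap in the float version of 'value' and average duplicates
df2 = (df1[is_num]
       .assign(value=numeric[is_num])
       .pivot_table(index="id", columns="item", values="value", aggfunc="mean"))

# text rows: join duplicated strings
df3 = (df1[~is_num]
       .pivot_table(index="id", columns="item", values="value", aggfunc=",".join))

result = pd.concat([df2, df3], axis=1)
print(result)
```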

Answer 2 (score: 2)

According to the pivot documentation:

Raises
------
ValueError:
    When there are any `index`, `columns` combinations with multiple
    values. `DataFrame.pivot_table` when you need to aggregate.

In your case, id=2225 has two prize entries, which pivot cannot handle. You can aggregate first and then pivot:

df1.groupby(['id', 'item']).sum().reset_index().pivot(index='id', columns='item', values='value')

+------+-------+-------+------+
| item | prize | stock | unit |
+------+-------+-------+------+
| id   |       |       |      |
| 1401 | 4.3   | 10    | NaN  |
| 2225 | 3.9   | NaN   | kg   |
| 8187 | NaN   | NaN   | lt   |
+------+-------+-------+------+
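As the quoted docstring suggests, `pivot_table` can aggregate and reshape in one step. The sketch below compares it with the `groupby`/`pivot` chain, again assuming `value` is an object column mixing numbers and strings. With `aggfunc='sum'`, duplicated numbers are added, and a lone string passes through unchanged.

```python
import pandas as pd

df1 = pd.DataFrame({
    "id": [2225, 2225, 2225, 8187, 1401, 1401],
    "item": ["prize", "unit", "prize", "unit", "stock", "prize"],
    "value": [1.5, "kg", 2.4, "lt", 10, 4.3],
})

# two steps: collapse duplicates with groupby, then reshape with pivot
two_step = (df1.groupby(["id", "item"]).sum()
            .reset_index()
            .pivot(index="id", columns="item", values="value"))

# one step: pivot_table aggregates duplicates itself
one_step = df1.pivot_table(index="id", columns="item", values="value", aggfunc="sum")
print(one_step)
```

Keep in mind that `'sum'` would concatenate duplicated strings (for example two `unit` rows would become `'kgkg'`). That is why Answer 1 uses a custom aggregator.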

Answer 3 (score: -1)

Pandas is complaining because the entry (2225, prize) appears twice, at index 0 and 2. This is a problem with your data, not with pandas' behavior.

Fixing this duplicate entry removes the error:

# the original data
>>> df
     id   item value
0  2225  prize   1.5
1  2225   unit    kg
2  2225  prize   2.4
3  8187   unit    lt
4  1401  stock    10
5  1401  prize   4.3

# remove the duplicate by changing the id in row 2
>>> df.loc[2, 'id'] = 8187
>>> df
     id   item value
0  2225  prize   1.5
1  2225   unit    kg
2  8187  prize   2.4
3  8187   unit    lt
4  1401  stock    10
5  1401  prize   4.3

# pivot now works properly
>>> df.pivot(index='id', columns='item')
     value
item prize stock unit
id
1401   4.3    10  NaN
2225   1.5   NaN   kg
8187   2.4   NaN   lt
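Reassigning `id` by hand only makes sense if you know that row 2 was mislabeled. If the duplicate is a stale or repeated record instead, another option is to keep one row per (`id`, `item`) pair with `drop_duplicates` before pivoting. The sketch below assumes the later entry should win (`keep='last'`) and that `value` is an object column mixing numbers and strings.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [2225, 2225, 2225, 8187, 1401, 1401],
    "item": ["prize", "unit", "prize", "unit", "stock", "prize"],
    "value": [1.5, "kg", 2.4, "lt", 10, 4.3],
})

# keep='last' retains the later of the two (2225, 'prize') rows
deduped = df.drop_duplicates(subset=["id", "item"], keep="last")

# every (id, item) pair is now unique, so plain pivot succeeds
wide = deduped.pivot(index="id", columns="item", values="value")
print(wide)
```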