Using df.pivot

Posted: 2019-05-07 14:09:09

Tags: python pandas pivot

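I often need to use pandas' pivot function to convert a stacked, long-format DataFrame into an unstacked, wide-format DataFrame.

As far as I can tell, "How to pivot a dataframe" does not solve my problem.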

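If there are duplicate entries, the pivot fails. I usually track down and fix these duplicates by inspecting the data in an Excel pivot table that summarizes the values with Count. This works most of the time (but not always), but I would like to know whether I can stay in JupyterLab and find the problem in the data without going through Excel.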
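The in-pandas counterpart of an Excel pivot table summarized with Count is to count the rows behind each (ISO3, Year) key. A minimal sketch on a trimmed, hypothetical subset of the data shown below; any key counted more than once is one that `pivot` cannot place in a single cell.

```python
import pandas as pd

# Trimmed, hypothetical subset of the question's data (FRA/2012 appears twice).
df = pd.DataFrame({
    "ISO3":  ["FRA", "GBR", "FRA", "GBR", "FRA"],
    "Year":  [2011, 2011, 2012, 2012, 2012],
    "Value": [54.68, 89.39, 37.41, 58.50, 36.03],
})

# Same idea as Excel's Count: number of rows per (ISO3, Year) pair.
counts = df.groupby(["ISO3", "Year"]).size()

# Pairs that occur more than once are what make df.pivot raise ValueError.
problem_keys = counts[counts > 1]
print(problem_keys)
```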

I have a DataFrame that looks like this:

     ISO3  Country         Indicator        Year  Value
45   FRA   France          Domestic credit  2011  54.68
140  GBR   United Kingdom  Domestic credit  2011  89.39
141  USA   United States   Domestic credit  2011  93.10
217  FRA   France          Domestic credit  2012  37.41
368  GBR   United Kingdom  Domestic credit  2012  58.50
369  USA   United States   Domestic credit  2012  63.10
448  FRA   France          Domestic credit  2012  36.03
599  GBR   United Kingdom  Domestic credit  2013  50.95
600  USA   United States   Domestic credit  2013  63.40
679  FRA   France          Domestic credit  2014  36.63
830  GBR   United Kingdom  Domestic credit  2014  54.47
831  USA   United States   Domestic credit  2014  78.00
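To investigate in JupyterLab it helps to have the sample as a reproducible frame. A minimal sketch that rebuilds the listing above; the variable name `extra_domestic_credit` is taken from the `pivot` call further down, and the column order is assumed to match the printout.

```python
import pandas as pd

# (original index label, ISO3, Country, Year, Value), copied from the listing.
rows = [
    (45,  "FRA", "France",         2011, 54.68),
    (140, "GBR", "United Kingdom", 2011, 89.39),
    (141, "USA", "United States",  2011, 93.10),
    (217, "FRA", "France",         2012, 37.41),
    (368, "GBR", "United Kingdom", 2012, 58.50),
    (369, "USA", "United States",  2012, 63.10),
    (448, "FRA", "France",         2012, 36.03),  # deliberately duplicated FRA/2012
    (599, "GBR", "United Kingdom", 2013, 50.95),
    (600, "USA", "United States",  2013, 63.40),
    (679, "FRA", "France",         2014, 36.63),
    (830, "GBR", "United Kingdom", 2014, 54.47),
    (831, "USA", "United States",  2014, 78.00),
]

extra_domestic_credit = (
    pd.DataFrame(rows, columns=["row", "ISO3", "Country", "Year", "Value"])
    .set_index("row")
    .rename_axis(None)                    # the printout shows an unnamed index
    .assign(Indicator="Domestic credit")  # same indicator on every row
    .loc[:, ["ISO3", "Country", "Indicator", "Year", "Value"]]
)
print(extra_domestic_credit)
```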

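I want to convert it into this format (created with pivot_table, which tolerates duplicates but gives an incorrect result here: the FRA 2012 cell, 36.72, is the mean of the two 2012 rows, 37.41 and 36.03, and FRA 2013 is NaN):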

Year  2011   2012   2013   2014
ISO3
FRA   54.68  36.72  NaN    36.63
GBR   89.39  58.50  50.95  54.47
USA   93.10  63.10  63.40  78.00
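The 36.72 comes from `pivot_table`'s default `aggfunc="mean"`, which averages the two FRA/2012 rows without complaint. Passing `aggfunc="count"` to the same call turns it into a duplicate detector, much like the Excel Count approach. A minimal sketch on a trimmed, hypothetical subset (FRA and GBR only):

```python
import pandas as pd

# Trimmed, hypothetical subset: FRA has two 2012 rows and no 2013 row.
df = pd.DataFrame({
    "ISO3":  ["FRA", "FRA", "FRA", "FRA", "GBR", "GBR", "GBR", "GBR"],
    "Year":  [2011, 2012, 2012, 2014, 2011, 2012, 2013, 2014],
    "Value": [54.68, 37.41, 36.03, 36.63, 89.39, 58.50, 50.95, 54.47],
})

# Default aggregation is the mean, so the duplicates are silently averaged.
means = df.pivot_table(index="ISO3", columns="Year", values="Value")

# Same layout, but each cell holds the number of rows that fed into it.
counts = df.pivot_table(index="ISO3", columns="Year", values="Value", aggfunc="count")

# ISO3 codes with at least one cell backed by more than one row (NaN > 1 is False).
dup_countries = counts.index[counts.gt(1).any(axis=1)]
print(counts)
print(list(dup_countries))
```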

But using

    extra_domestic_credit.pivot(index='ISO3', columns='Year', values='Value')

results in

----------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-86-745170bb5af5> in <module>
----> 1 extra_domestic_credit.pivot(index = 'ISO3', columns = 'Year', values = 'Value')

~\Anaconda3\envs\scipy18jlab\lib\site-packages\pandas\core\frame.py in pivot(self, index, columns, values)
   5192         """
   5193         from pandas.core.reshape.reshape import pivot
-> 5194         return pivot(self, index=index, columns=columns, values=values)
   5195 
   5196     _shared_docs['pivot_table'] = """

~\Anaconda3\envs\scipy18jlab\lib\site-packages\pandas\core\reshape\reshape.py in pivot(self, index, columns, values)
    413             indexed = self._constructor_sliced(self[values].values,
    414                                                index=index)
--> 415     return indexed.unstack(columns)
    416 
    417 

~\Anaconda3\envs\scipy18jlab\lib\site-packages\pandas\core\series.py in unstack(self, level, fill_value)
   2897         """
   2898         from pandas.core.reshape.reshape import unstack
-> 2899         return unstack(self, level, fill_value)
   2900 
   2901     # ----------------------------------------------------------------------

~\Anaconda3\envs\scipy18jlab\lib\site-packages\pandas\core\reshape\reshape.py in unstack(obj, level, fill_value)
    499         unstacker = _Unstacker(obj.values, obj.index, level=level,
    500                                fill_value=fill_value,
--> 501                                constructor=obj._constructor_expanddim)
    502         return unstacker.get_result()
    503 

~\Anaconda3\envs\scipy18jlab\lib\site-packages\pandas\core\reshape\reshape.py in __init__(self, values, index, level, value_columns, fill_value, constructor)
    135 
    136         self._make_sorted_values_labels()
--> 137         self._make_selectors()
    138 
    139     def _make_sorted_values_labels(self):

~\Anaconda3\envs\scipy18jlab\lib\site-packages\pandas\core\reshape\reshape.py in _make_selectors(self)
    173 
    174         if mask.sum() < len(self.index):
--> 175             raise ValueError('Index contains duplicate entries, '
    176                              'cannot reshape')
    177 

ValueError: Index contains duplicate entries, cannot reshape
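The traceback shows that `pivot` builds a Series indexed by the index and column keys and calls `unstack` on it (`indexed.unstack(columns)`), and that `unstack` is what raises when a key pair repeats. A sketch of the same check done by hand before reshaping, on a trimmed, hypothetical subset:

```python
import pandas as pd

# Trimmed, hypothetical subset of the question's data (FRA/2012 appears twice).
df = pd.DataFrame({
    "ISO3":  ["FRA", "FRA", "FRA", "GBR", "GBR"],
    "Year":  [2011, 2012, 2012, 2011, 2012],
    "Value": [54.68, 37.41, 36.03, 89.39, 58.50],
})

# pivot(index="ISO3", columns="Year") needs every (ISO3, Year) pair to occur once.
keyed = df.set_index(["ISO3", "Year"])["Value"]

if keyed.index.is_unique:
    wide = keyed.unstack("Year")
else:
    # duplicated() flags the second and later occurrences of each key;
    # unique() lists a key only once even if it repeats several times.
    repeated_keys = keyed.index[keyed.index.duplicated()].unique()
    print("Cannot pivot, repeated keys:", list(repeated_keys))
```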

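This is caused by rows 217 and 448, which have the same ISO3 and Year values. This is a contrived example in which I introduced the error deliberately, but how can I find the problem that is blocking the reshape without writing the df to Excel and investigating the data there?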
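One way to get the offending rows, with their original index labels, is `DataFrame.duplicated` with `keep=False`, which marks every member of a duplicate group rather than only the later ones. A minimal sketch on the full sample (Country and Indicator omitted for brevity). The repair step is purely hypothetical: it assumes row 448 was meant to be 2013, since FRA has no 2013 value; if the rows were exact repeats instead, `drop_duplicates(subset=keys)` would be the alternative.

```python
import pandas as pd

# (original index label, ISO3, Year, Value) from the question's listing.
rows = [
    (45, "FRA", 2011, 54.68), (140, "GBR", 2011, 89.39), (141, "USA", 2011, 93.10),
    (217, "FRA", 2012, 37.41), (368, "GBR", 2012, 58.50), (369, "USA", 2012, 63.10),
    (448, "FRA", 2012, 36.03), (599, "GBR", 2013, 50.95), (600, "USA", 2013, 63.40),
    (679, "FRA", 2014, 36.63), (830, "GBR", 2014, 54.47), (831, "USA", 2014, 78.00),
]
df = pd.DataFrame(rows, columns=["row", "ISO3", "Year", "Value"]).set_index("row")

keys = ["ISO3", "Year"]
# keep=False flags all rows sharing a key, so both conflicting rows are listed.
dup_rows = df[df.duplicated(subset=keys, keep=False)]
print(dup_rows)

# Hypothetical fix: assume row 448 should have been 2013.
fixed = df.copy()
fixed.loc[448, "Year"] = 2013
wide = fixed.pivot(index="ISO3", columns="Year", values="Value")
print(wide)
```

Which fix is correct depends on the data; the point is that `dup_rows` shows exactly which rows to inspect without leaving the notebook.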
