Pandas long to wide format with multi-index

Time: 2017-11-08 11:44:16

Tags: python pandas pivot-table reshape

I have a dataframe that looks like this:

data.head()
Out[2]:
        Area  Area Id                  Variable Name  Variable Id  Year  \
0  Argentina        9  Conservation agriculture area         4454  1982
1  Argentina        9  Conservation agriculture area         4454  1987
2  Argentina        9  Conservation agriculture area         4454  1992
3  Argentina        9  Conservation agriculture area         4454  1997
4  Argentina        9  Conservation agriculture area         4454  2002

   Value Symbol Md
0    2.0
1    6.0
2  500.0

I would like to reshape it so that Variable Name becomes the columns, Area and Year form the index, and Value supplies the values. The most intuitive approach to me is:

data.pivot(index=['Area', 'Year'], columns='Variable Name', values='Value')

However, I get this error:

Traceback (most recent call last):
  File "C:\Users\patri\Miniconda3\lib\site-packages\IPython\core\interactiveshell.py", line 2862, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-4-4c786386b703>", line 1, in <module>
    pd.concat(data_list).pivot(index=['Area', 'Year'], columns='Variable Name', values='Value')
  File "C:\Users\patri\Miniconda3\lib\site-packages\pandas\core\frame.py", line 3853, in pivot
    return pivot(self, index=index, columns=columns, values=values)
  File "C:\Users\patri\Miniconda3\lib\site-packages\pandas\core\reshape\reshape.py", line 377, in pivot
    index=MultiIndex.from_arrays([index, self[columns]]))
  File "C:\Users\patri\Miniconda3\lib\site-packages\pandas\core\series.py", line 250, in __init__
    data = SingleBlockManager(data, index, fastpath=True)
  File "C:\Users\patri\Miniconda3\lib\site-packages\pandas\core\internals.py", line 4117, in __init__
    fastpath=True)
  File "C:\Users\patri\Miniconda3\lib\site-packages\pandas\core\internals.py", line 2719, in make_block
    return klass(values, ndim=ndim, fastpath=fastpath, placement=placement)
  File "C:\Users\patri\Miniconda3\lib\site-packages\pandas\core\internals.py", line 1844, in __init__
    placement=placement, **kwargs)
  File "C:\Users\patri\Miniconda3\lib\site-packages\pandas\core\internals.py", line 115, in __init__
    len(self.mgr_locs)))
ValueError: Wrong number of items passed 119611, placement implies 2

How should I interpret this? I've also tried another way to get the same result:

data.set_index(['Area', 'Variable Name', 'Year']).loc[:, 'Value'].unstack('Variable Name')

but I get this error:

Traceback (most recent call last):
  File "C:\Users\patri\Miniconda3\lib\site-packages\IPython\core\interactiveshell.py", line 2862, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-5-222325ea01e1>", line 1, in <module>
    pd.concat(data_list).set_index(['Area', 'Variable Name', 'Year']).loc[:, 'Value'].unstack('Variable Name')
  File "C:\Users\patri\Miniconda3\lib\site-packages\pandas\core\series.py", line 2028, in unstack
    return unstack(self, level, fill_value)
  File "C:\Users\patri\Miniconda3\lib\site-packages\pandas\core\reshape\reshape.py", line 458, in unstack
    fill_value=fill_value)
  File "C:\Users\patri\Miniconda3\lib\site-packages\pandas\core\reshape\reshape.py", line 110, in __init__
    self._make_selectors()
  File "C:\Users\patri\Miniconda3\lib\site-packages\pandas\core\reshape\reshape.py", line 148, in _make_selectors
    raise ValueError('Index contains duplicate entries, '
ValueError: Index contains duplicate entries, cannot reshape

Is there something wrong with the data? I've confirmed that no rows of the dataframe contain duplicate combinations of Area, Variable Name, and Year, so I don't think there should be any duplicate entries, but I could be wrong. Given that neither method currently works, how can I go from long to wide format? I've checked two related answers, but both are cases that involve some type of aggregation.

I've also tried using pivot_table like this:

data.pivot_table(index=['Area', 'Year'], columns='Variable Name', values='Value')

But I think some type of aggregation is going on, and the many missing values in the dataset cause this to fail with an error.

1 Answer:

Answer 0 (score: 1)

I think you need to convert the Value column to numeric first, and then use pivot_table with its default aggregation function, mean:

# If all values are floats stored as strings
data['Value'] = data['Value'].astype(float)
# If some values are non-numeric strings and the line above raises ValueError,
# convert those values to NaN instead
data['Value'] = pd.to_numeric(data['Value'], errors='coerce')
data.pivot_table(index=['Area', 'Year'], columns='Variable Name', values='Value')

Or:

data.groupby(['Area', 'Variable Name', 'Year'])['Value'].mean().unstack('Variable Name')
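
To see both approaches end to end, here is a minimal sketch on a small made-up dataframe. The sample rows and values (including the 'n/a' entry) are assumptions for illustration, not the asker's data. The duplicate check with duplicated() is an extra diagnostic that is not part of the answer above; it lists the rows that would trigger "Index contains duplicate entries" in unstack.

```python
import pandas as pd

# Hypothetical sample in the same long format as the question.
# Values are stored as strings, and one of them cannot be parsed.
data = pd.DataFrame({
    'Area': ['Argentina', 'Argentina', 'Argentina', 'Brazil', 'Brazil'],
    'Variable Name': ['Conservation agriculture area',
                      'Conservation agriculture area',
                      'Total population',
                      'Total population',
                      'Conservation agriculture area'],
    'Year': [1982, 1982, 1982, 1987, 1987],
    'Value': ['2.0', '4.0', '100.0', '300.0', 'n/a'],
})

keys = ['Area', 'Variable Name', 'Year']

# Rows whose (Area, Variable Name, Year) key occurs more than once.
dupes = data[data.duplicated(subset=keys, keep=False)]

# Unparseable strings become NaN instead of raising.
data['Value'] = pd.to_numeric(data['Value'], errors='coerce')

# Duplicated keys are collapsed by the default aggregation (mean).
wide = data.pivot_table(index=['Area', 'Year'],
                        columns='Variable Name',
                        values='Value')

# The groupby alternative from the answer.
wide_alt = data.groupby(keys)['Value'].mean().unstack('Variable Name')

print(dupes)
print(wide)
```

If dupes comes back empty on the real data, plain unstack should work once Value is numeric; if it does not, the listed rows show which keys are being averaged.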