I am trying to reshape a pandas DataFrame from the following long format:
ISO3 Indicator Year Value
FRA Pop. density 2003 113,6
FRA Pop. density 2004 114,5
FRA Pop. density 2005 115,4
USA Pop. density 2003 31,7
USA Pop. density 2004 32,0
USA Pop. density 2005 32,3
FRA Pupil-teacher ratio 2003 18,6
FRA Pupil-teacher ratio 2004 18,6
FRA Pupil-teacher ratio 2005 18,6
USA Pupil-teacher ratio 2003 14,8
USA Pupil-teacher ratio 2004 14,2
USA Pupil-teacher ratio 2005 14,1
into this:
Pop. density Pupil-teacher ratio
2003 2004 2005 2003 2004 2005
FRA 113,6 114,5 115,4 18,6 18,6 18,6
USA 31,7 32,0 32,3 14,8 14,2 14,1
I have tried stack and pivot, but with no luck.
Pivot attempt:
smallstack.pivot(index='ISO3', columns=['Indicator', 'Year'], values='Value')
Result:
KeyError Traceback (most recent call last)
<ipython-input-612-c43d9ec16c54> in <module>
----> 1 smallstack.pivot(index='ISO3', columns=['Indicator', 'Year'], values='Value')
~\Anaconda3\envs\scipy18jlab\lib\site-packages\pandas\core\frame.py in pivot(self, index, columns, values)
5192 """
5193 from pandas.core.reshape.reshape import pivot
-> 5194 return pivot(self, index=index, columns=columns, values=values)
5195
5196 _shared_docs['pivot_table'] = """
~\Anaconda3\envs\scipy18jlab\lib\site-packages\pandas\core\reshape\reshape.py in pivot(self, index, columns, values)
404 else:
405 index = self[index]
--> 406 index = MultiIndex.from_arrays([index, self[columns]])
407
408 if is_list_like(values) and not isinstance(values, tuple):
~\Anaconda3\envs\scipy18jlab\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
2680 if isinstance(key, (Series, np.ndarray, Index, list)):
2681 # either boolean or fancy integer index
-> 2682 return self._getitem_array(key)
2683 elif isinstance(key, DataFrame):
2684 return self._getitem_frame(key)
~\Anaconda3\envs\scipy18jlab\lib\site-packages\pandas\core\frame.py in _getitem_array(self, key)
2724 return self._take(indexer, axis=0)
2725 else:
-> 2726 indexer = self.loc._convert_to_indexer(key, axis=1)
2727 return self._take(indexer, axis=1)
2728
~\Anaconda3\envs\scipy18jlab\lib\site-packages\pandas\core\indexing.py in _convert_to_indexer(self, obj, axis, is_setter)
1325 if mask.any():
1326 raise KeyError('{mask} not in index'
-> 1327 .format(mask=objarr[mask]))
1328
1329 return com._values_from_object(indexer)
KeyError: "['Year'] not in index"
Any suggestions would be greatly appreciated!
Answer 0 (score: 0)
First check the column names:
print (smallstack.columns.tolist())
['ISO3', 'Indicator', 'Year', 'Value']
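The traceback appears to name only 'Year' as missing, which suggests the label stored in the frame is not exactly 'Year'. One possible cause, and it is only an assumption here, is stray whitespace in the header. A minimal sketch with a hypothetical frame whose 'Year' label has a trailing space:

```python
import pandas as pd

# Hypothetical data: the 'Year' header carries a trailing space (an assumption,
# not something confirmed by the question).
smallstack = pd.DataFrame({
    "ISO3": ["FRA", "USA"],
    "Indicator": ["Pop. density", "Pop. density"],
    "Year ": [2003, 2003],
    "Value": ["113,6", "31,7"],
})

# repr() makes otherwise invisible whitespace visible in each label
print([repr(c) for c in smallstack.columns])

# Strip leading and trailing whitespace from every column label
smallstack.columns = smallstack.columns.str.strip()
has_year = "Year" in smallstack.columns
```

If 'Year' turns out to live in the index rather than in the columns, smallstack.reset_index() would be the thing to try instead.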
Then use DataFrame.set_index and Series.unstack:
df = smallstack.set_index(['ISO3', 'Indicator', 'Year'])['Value'].unstack([1,2])
print (df)
Indicator Pop. density Pupil-teacher ratio
Year 2003 2004 2005 2003 2004 2005
ISO3
FRA 113,6 114,5 115,4 18,6 18,6 18,6
USA 31,7 32,0 32,3 14,8 14,2 14,1
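For anyone who wants to run this without the original data, here is a self-contained sketch of the same set_index and unstack approach. The frame is a cut-down copy of the question's sample (two years only), and the levels are unstacked by name rather than by position, which is a stylistic choice and not part of the answer above:

```python
import pandas as pd

# Cut-down copy of the sample data from the question (values kept as strings)
smallstack = pd.DataFrame({
    "ISO3": ["FRA", "FRA", "USA", "USA"] * 2,
    "Indicator": ["Pop. density"] * 4 + ["Pupil-teacher ratio"] * 4,
    "Year": [2003, 2004, 2003, 2004] * 2,
    "Value": ["113,6", "114,5", "31,7", "32,0", "18,6", "18,6", "14,8", "14,2"],
})

# Move the three key columns into the index, select the value column,
# then move 'Indicator' and 'Year' from the row index to the columns.
wide = (smallstack
        .set_index(["ISO3", "Indicator", "Year"])["Value"]
        .unstack(["Indicator", "Year"]))
print(wide)
```

A single cell can then be read with a (Indicator, Year) tuple, for example wide.loc["FRA", ("Pop. density", 2003)].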
If that does not work because of duplicates, use DataFrame.pivot_table, but first convert the Value column to numeric:
smallstack['Value'] = smallstack['Value'].str.replace(',','.').astype(float)
smallstack.pivot_table(index='ISO3', columns=['Indicator', 'Year'], values='Value')
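To see what the duplicate case looks like, here is a small sketch with a hypothetical frame in which FRA / 2003 appears twice. pivot_table aggregates the duplicates (mean is its default, spelled out here for clarity), which is why the values must be numeric first:

```python
import pandas as pd

# Hypothetical data with a duplicated (ISO3, Indicator, Year) combination
smallstack = pd.DataFrame({
    "ISO3": ["FRA", "FRA", "USA"],
    "Indicator": ["Pop. density"] * 3,
    "Year": [2003, 2003, 2003],          # FRA / 2003 occurs twice
    "Value": ["113,6", "114,0", "31,7"],
})

# Turn decimal commas into decimal points, then convert to float
smallstack["Value"] = (smallstack["Value"]
                       .str.replace(",", ".", regex=False)
                       .astype(float))

# Duplicates are averaged instead of raising an error
wide = smallstack.pivot_table(index="ISO3", columns=["Indicator", "Year"],
                              values="Value", aggfunc="mean")
print(wide)
```

Whether averaging is the right way to resolve duplicates depends on the data; aggfunc can be swapped for 'first', 'max', and so on.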
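Answer 1 (score: 0)
I am not sure whether two sets of columns can be handled in a single pivot operation. The error does not say that it cannot find 'Year', but that it cannot find "['Year']", which suggests the type is wrong. Try pivoting one indicator at a time, like this, and concatenate the results at the end. Of course, for indicators beyond the ones shown you would have to do this dynamically.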
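import pandas as pd

smallstack = pd.DataFrame({'ISO3': ['FRA', 'USA', 'FRA', 'USA'],
                           'Indicator': ['Pop. density', 'Pop. density', 'Pupil-teacher ratio', 'Pupil-teacher ratio'],
                           'Year': [2003, 2004, 2003, 2004],
                           'Value': [113.6, 115.6, 113.6, 115.6]})
# Pivot each indicator separately: one row per country, one column per year
indicators = smallstack.Indicator.unique()
pivots = [smallstack.loc[smallstack.Indicator == code]
                    .pivot(index='ISO3', columns='Year', values='Value')
          for code in indicators]
# Concatenate side by side; keys= labels each block of columns with its indicator
df = pd.concat(pivots, axis='columns', keys=indicators)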
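As a side note on the single-call question: from pandas 1.1 onward, DataFrame.pivot accepts a list for columns, so the original attempt should work as written on a recent install, provided the column labels match and there are no duplicate index/column combinations. A minimal sketch, assuming pandas 1.1 or newer and a hypothetical cut-down frame:

```python
import pandas as pd

# Hypothetical cut-down data, already numeric
smallstack = pd.DataFrame({
    "ISO3": ["FRA", "FRA", "USA", "USA"] * 2,
    "Indicator": ["Pop. density"] * 4 + ["Pupil-teacher ratio"] * 4,
    "Year": [2003, 2004, 2003, 2004] * 2,
    "Value": [113.6, 114.5, 31.7, 32.0, 18.6, 18.6, 14.8, 14.2],
})

# Requires pandas >= 1.1: a list of column names builds MultiIndex columns
wide = smallstack.pivot(index="ISO3", columns=["Indicator", "Year"],
                        values="Value")
print(wide)
```

On older versions, such as the one in the traceback, the set_index / unstack or concat routes remain the way to go.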