Pandas MultiIndex: selecting hierarchical columns

Time: 2017-10-27 11:59:36

Tags: python pandas data-structures multi-index

Goal: take raw data pulled from Eurostat via Pandas DataReader and reshape it so that it has Pandas DateTime objects as the index and the countries/regions as columns.

Code:

import pandas as pd
import pandas_datareader as web  
import datetime
start = datetime.datetime(1900,1,1)
end = datetime.date.today()
df2 = web.DataReader('tipsii20', 'eurostat', start = start,end = end)
df2.columns

Looking at the columns, we can see that we are working with a MultiIndex:

  

MultiIndex(levels=[[u'Rest of the world'], [u'Net liabilities (liabilities minus assets)'], [u'Net external debt'], [u'Percentage of gross domestic product (GDP)'], [u'Unadjusted data (i.e. neither seasonally adjusted nor calendar adjusted data)'], [u'Austria', u'Belgium', u'Bulgaria', u'Croatia', u'Cyprus', u'Czech Republic', u'Denmark', u'Estonia', u'Finland', u'France', u'Germany (until 1990 former territory of the FRG)', u'Greece', u'Hungary', u'Ireland', u'Italy', u'Latvia', u'Lithuania', u'Luxembourg', u'Malta', u'Netherlands', u'Poland', u'Portugal', u'Romania', u'Slovakia', u'Slovenia', u'Spain', u'Sweden', u'United Kingdom'], [u'Annual']],
           labels=[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
                   [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
                   [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
                   [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
                   [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
                   [0, 1, 2, 4, 5, 10, 6, 7, 11, 25, 8, 9, 3, 12, 13, 14, 16, 17, 15, 18, 19, 20, 21, 22, 26, 24, 23, 27],
                   [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]],
           names=[u'PARTNER', u'STK_FLOW', u'BOP_ITEM', u'UNIT', u'S_ADJ', u'GEO', u'FREQ'])

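I would like to transform this dataset so that it keeps its DateTime index but uses the names in the 'GEO' level as its columns. Is df2.xs the right tool for this?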

2 Answers:

Answer 0: (score: 2)

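Use pd.DataFrame together with get_level_values(5), since GEO is level 5 of the columns (counting from zero), if you want to keep the original DataFrame for future reference, e.g.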

ndf = pd.DataFrame(df2.values,df2.index,df2.columns.get_level_values(5))

Or assign the columns directly from the level values, e.g.

df2.columns =  df2.columns.get_level_values(5)

Output:

print(ndf.head().iloc[:,:4])

GEO          Austria  Belgium  Bulgaria  Cyprus
TIME_PERIOD                                    
2010-01-01      28.0   -121.2      37.1    70.9
2011-01-01      24.0   -118.8      29.6   127.1
2012-01-01      25.8   -102.7      25.4   137.2
2013-01-01      20.1    -88.4      21.6   140.0
2014-01-01      20.0    -71.1      18.3   136.1
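
For readers without access to the Eurostat feed, the same idea can be tried on a small synthetic frame. This is only a sketch: the frame `toy`, its three column levels, and all values are made up for illustration, and the level is looked up by name rather than by position.

```python
import pandas as pd

# Synthetic stand-in for the Eurostat frame (labels and values are made up).
columns = pd.MultiIndex.from_product(
    [["Rest of the world"], ["Austria", "Belgium", "Bulgaria"], ["Annual"]],
    names=["PARTNER", "GEO", "FREQ"],
)
index = pd.DatetimeIndex(["2010-01-01", "2011-01-01"], name="TIME_PERIOD")
toy = pd.DataFrame(
    [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]], index=index, columns=columns
)

# Build a new frame from the raw values, the same index, and only the
# GEO level as column labels. `toy` itself keeps its MultiIndex columns.
flat = pd.DataFrame(
    toy.values, index=toy.index, columns=toy.columns.get_level_values("GEO")
)
print(flat)
```

Because a new DataFrame is constructed, the original multi-level columns remain available on `toy` if they are needed later.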

Answer 1: (score: 2)

You can use droplevel:

df2.columns = df2.columns.droplevel([0,1,2,3,4,6])
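
A possible variation, shown here on made-up data, is to pass level names to droplevel instead of positions, so the code does not depend on the order of the levels. The frame `toy` and the helper list `to_drop` are hypothetical names used only for this sketch.

```python
import pandas as pd

# Synthetic columns with the same kind of structure (labels are made up).
columns = pd.MultiIndex.from_product(
    [["Rest of the world"], ["Austria", "Belgium"], ["Annual"]],
    names=["PARTNER", "GEO", "FREQ"],
)
toy = pd.DataFrame([[1.0, 2.0]], columns=columns)

# Collect every level name except GEO and drop those levels by name.
to_drop = [name for name in toy.columns.names if name != "GEO"]

# Work on a copy so that `toy` keeps its original MultiIndex columns.
result = toy.copy()
result.columns = result.columns.droplevel(to_drop)
print(result.columns)
```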

If you know the level name, another solution, similar to Bharath shetty's solution, is:

df2.columns =  df2.columns.get_level_values('GEO')
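
As for the df2.xs idea raised in the question, a cross-section could probably be made to work as well, although it means spelling out the value of every level that should disappear. The sketch below uses a made-up three-level frame (`toy` and its labels are assumptions for illustration, not the real Eurostat data).

```python
import pandas as pd

# Synthetic columns with the same kind of structure (labels are made up).
columns = pd.MultiIndex.from_product(
    [["Rest of the world"], ["Austria", "Belgium"], ["Annual"]],
    names=["PARTNER", "GEO", "FREQ"],
)
toy = pd.DataFrame([[1.0, 2.0]], columns=columns)

# Select the single value of each non-GEO level along the columns axis.
# With the default drop_level=True, xs removes the selected levels,
# leaving only GEO as the column labels.
picked = toy.xs(
    ("Rest of the world", "Annual"), level=["PARTNER", "FREQ"], axis=1
)
print(picked)
```

With seven levels, as in the real frame, the key tuple would need six entries, which is why get_level_values or droplevel tends to be shorter for this particular task.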