How do I do a multi-index pivot when the index labels and the values are in the same column?

Posted: 2017-03-22 11:58:08

Tags: pandas dataframe reshape melt

I have this frame:

import pandas as pd

regions = pd.read_html('http://www.mapsofworld.com/usa/usa-maps/united-states-regional-maps.html')
messy_regions = regions[8]

which produces something like this:

    0                                1
0   Region 1 (The Northeast)         NaN
1   Division 1 (New England)         Division 2 (Middle Atlantic)
2   Maine                            New York
3   New Hampshire                    Pennsylvania
4   Vermont                          New Jersey
5   Massachusetts                    NaN
6   Rhode Island                     NaN
7   Connecticut                      NaN
8   Region 2 (The Midwest)           NaN
9   Division 3 (East North Central)  Division 4 (West North Central)
10  Wisconsin                        North Dakota
11  Michigan                         South Dakota
12  Illinois                         Nebraska

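The goal is to make this DataFrame tidy. I think I need to pivot it so that the regions and divisions become columns, with each state as a value under its correct region/division. Once it is in that shape, I can melt it into the shape I want. What I can't figure out is how to pull the column headers out of the data. Any help is appreciated, or at least a pointer in the right direction.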
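A minimal sketch of the part that is hard here, assuming the first column looks exactly like the sample above (typed in by hand, since the live page may have changed since 2017): header cells can only be told apart from state names by their text, so a regular expression can flag them, and `where(...).ffill()` can carry each region header down to the rows beneath it. The names `col0`, `headers`, `states` and `region` are illustrative only.

```python
import pandas as pd

# First column of the sample above, typed in by hand (assumed; the live
# page may no longer match what the question showed in 2017)
col0 = pd.Series([
    'Region 1 (The Northeast)', 'Division 1 (New England)', 'Maine',
    'New Hampshire', 'Vermont', 'Massachusetts', 'Rhode Island',
    'Connecticut', 'Region 2 (The Midwest)',
    'Division 3 (East North Central)', 'Wisconsin', 'Michigan', 'Illinois',
])

# Header cells differ from state names only by their text:
# they start with "Region <n>" or "Division <n>"
is_header = col0.str.match(r'(?:Region|Division) \d+')
headers = col0[is_header].tolist()
states = col0[~is_header].tolist()

# Keep only the region headers (NaN elsewhere), then forward-fill each
# header down to the rows that follow it
region = col0.where(col0.str.startswith('Region')).ffill()

print(headers)
print(region.tolist())
```

The same flag-then-forward-fill idea works for the division headers, except that they appear in both columns, so the columns have to be stacked first.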

1 answer:

Answer (score: 1):

You can use:

import pandas as pd

url = 'http://www.mapsofworld.com/usa/usa-maps/united-states-regional-maps.html'
# input DataFrame, columns renamed to a, b
df = pd.read_html(url)[8]
df.columns = ['a', 'b']

# extract the Region headers into a new column and fill them downward
df['Region'] = df['a'].where(df['a'].str.contains('Region', na=False)).ffill()
# reshape: stack columns a and b, drop rows with NaNs, drop the 'variable' column
# (parentheses are needed to chain methods across several lines)
df = (pd.melt(df, id_vars='Region', value_name='Names')
        .sort_values(['Region', 'variable'])
        .dropna()
        .drop('variable', axis=1))
# extract the Division headers into a new column and fill them downward
df['Division'] = df['Names'].where(df['Names'].str.contains('Division', na=False)).ffill()
# remove the header rows from Names, renumber the rows, reorder the columns
# (reindex(columns=...) replaces reindex_axis, which was removed in pandas 1.0)
df = (df[(df.Division != df.Names) & (df.Region != df.Names)]
        .reset_index(drop=True)
        .reindex(columns=['Region', 'Division', 'Names']))
# temporarily display all columns on one line
with pd.option_context('display.expand_frame_repr', False):
    print(df)

                      Region                         Division                 Names
0   Region 1 (The Northeast)         Division 1 (New England)                 Maine
1   Region 1 (The Northeast)         Division 1 (New England)         New Hampshire
2   Region 1 (The Northeast)         Division 1 (New England)               Vermont
3   Region 1 (The Northeast)         Division 1 (New England)         Massachusetts
4   Region 1 (The Northeast)         Division 1 (New England)          Rhode Island
5   Region 1 (The Northeast)         Division 1 (New England)           Connecticut
6   Region 1 (The Northeast)     Division 2 (Middle Atlantic)              New York
7   Region 1 (The Northeast)     Division 2 (Middle Atlantic)          Pennsylvania
8   Region 1 (The Northeast)     Division 2 (Middle Atlantic)            New Jersey
9     Region 2 (The Midwest)  Division 3 (East North Central)             Wisconsin
10    Region 2 (The Midwest)  Division 3 (East North Central)              Michigan
11    Region 2 (The Midwest)  Division 3 (East North Central)              Illinois
12    Region 2 (The Midwest)  Division 3 (East North Central)               Indiana
13    Region 2 (The Midwest)  Division 3 (East North Central)                  Ohio
...
...
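To check the steps above without network access (`read_html` needs the page to be reachable and an HTML parser such as lxml installed), here is a sketch that applies the same pipeline to a hand-built copy of the question's sample. `tidy_regions` is a hypothetical helper name, and the sample is assumed to match table 8 only for the rows the question showed.

```python
import numpy as np
import pandas as pd


def tidy_regions(raw):
    """Hypothetical helper: the answer's steps applied to a two-column frame."""
    df = raw.copy()
    df.columns = ['a', 'b']
    # Region headers only appear in column a; carry each one down its block
    df['Region'] = df['a'].where(df['a'].str.contains('Region', na=False)).ffill()
    # Stack a and b into one 'Names' column; sorting by Region, then by the
    # source column, keeps each column's rows together and in order
    df = (pd.melt(df, id_vars='Region', value_name='Names')
            .sort_values(['Region', 'variable'])
            .dropna()
            .drop('variable', axis=1))
    # Carry each Division header down to the states listed after it
    df['Division'] = df['Names'].where(df['Names'].str.contains('Division', na=False)).ffill()
    # Drop the header rows themselves and put the columns in order
    keep = (df['Division'] != df['Names']) & (df['Region'] != df['Names'])
    return df[keep].reset_index(drop=True)[['Region', 'Division', 'Names']]


# Hand-built copy of the question's sample (assumed; the live table is longer)
sample = pd.DataFrame({
    0: ['Region 1 (The Northeast)', 'Division 1 (New England)', 'Maine',
        'New Hampshire', 'Vermont', 'Massachusetts', 'Rhode Island',
        'Connecticut', 'Region 2 (The Midwest)',
        'Division 3 (East North Central)', 'Wisconsin', 'Michigan', 'Illinois'],
    1: [np.nan, 'Division 2 (Middle Atlantic)', 'New York', 'Pennsylvania',
        'New Jersey', np.nan, np.nan, np.nan, np.nan,
        'Division 4 (West North Central)', 'North Dakota', 'South Dakota',
        'Nebraska'],
})

tidy = tidy_regions(sample)
print(tidy)
```

On this sample the result has 15 rows, one per state, with Maine through Connecticut under Division 1 and New York through New Jersey under Division 2, matching the start of the output above.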