Unstack a DataFrame with partial strings in the column names

Date: 2019-02-21 19:06:56

Tags: python pandas

I have a DataFrame (totaldf) like this:

           ...     Hom   ...    March Plans   March Ships   April Plans   April Ships   ...

0                  CAD   ...    12              5           4             13
1                  USA   ...    7               6           2             11
2                  CAD   ...    4               9           6             14
3                  CAD   ...    13              3           9             7
...                ...   ...    ...             ...         ...           ...

and so on for every month of the year. I would like it to look like this:

           ...     Hom   ...    Month   Plans    Ships    ...

0                  CAD   ...    March    12          5             
1                  USA   ...    March    7           6             
2                  CAD   ...    March    4           9             
3                  CAD   ...    March    13          3
4                  CAD   ...    April    4           13            
5                  USA   ...    April    2           11             
6                  CAD   ...    April    6           14             
7                  CAD   ...    April    9           7
...                ...   ...    ...      ...         ...

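Is there a simple way to do this without splitting the column-name strings? I have played around with totaldf.unstack(), but because there are multiple value columns I'm not sure how to reindex the DataFrame correctly.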

2 answers:

Answer 0 (score: 4)

If you convert the columns to a MultiIndex, you can use stack:

In [11]: df1 = df.set_index("Hom")

In [12]: df1.columns = pd.MultiIndex.from_tuples(df1.columns.map(lambda x: tuple(x.split())))

In [13]: df1
Out[13]:
    March       April
    Plans Ships Plans Ships
Hom
CAD    12     5     4    13
USA     7     6     2    11
CAD     4     9     6    14
CAD    13     3     9     7
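As a side note that is not part of the original answer: calling `str.split(expand=True)` on an Index returns a MultiIndex directly, so the `map`/`from_tuples` step can probably be shortened. A minimal check, assuming every label is exactly "Month Kind" separated by a single space:

```python
import pandas as pd

cols = pd.Index(["March Plans", "March Ships", "April Plans", "April Ships"])

# The answer's approach: split each label into a tuple, then build a MultiIndex
via_tuples = pd.MultiIndex.from_tuples(cols.map(lambda x: tuple(x.split())))

# Shorter alternative: Index.str.split with expand=True yields a MultiIndex
via_split = cols.str.split(" ", expand=True)

print(via_tuples.equals(via_split))  # True
```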

In [14]: df1.stack(level=0)
Out[14]:
           Plans  Ships
Hom
CAD April      4     13
    March     12      5
USA April      2     11
    March      7      6
CAD April      6     14
    March      4      9
    April      9      7
    March     13      3

In [21]: res = df1.stack(level=0)

In [22]: res.index.names = ["Hom", "Month"]

In [23]: res.reset_index()
Out[23]:
   Hom  Month  Plans  Ships
0  CAD  April      4     13
1  CAD  March     12      5
2  USA  April      2     11
3  USA  March      7      6
4  CAD  April      6     14
5  CAD  March      4      9
6  CAD  April      9      7
7  CAD  March     13      3
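Putting those steps together, here is a self-contained sketch. The frame is rebuilt from the sample rows in the question, and `stack_months` is a hypothetical helper name. Depending on your pandas version, `stack` may emit a FutureWarning and may order the months differently (the session above shows April sorted before March), so it is safer not to rely on row order.

```python
import pandas as pd

# Sample data copied from the question (only the Hom and month columns)
totaldf = pd.DataFrame({
    "Hom": ["CAD", "USA", "CAD", "CAD"],
    "March Plans": [12, 7, 4, 13],
    "March Ships": [5, 6, 9, 3],
    "April Plans": [4, 2, 6, 9],
    "April Ships": [13, 11, 14, 7],
})


def stack_months(df):
    """Hypothetical helper: turn 'Month Kind' columns into Month/Plans/Ships."""
    wide = df.set_index("Hom")
    # "March Plans" -> ("March", "Plans"): a two-level column MultiIndex
    wide.columns = wide.columns.str.split(" ", expand=True)
    # Move the month level (level 0) from the columns into the row index
    long = wide.stack(level=0)
    long.index.names = ["Hom", "Month"]
    return long.reset_index()


out = stack_months(totaldf)
print(out)
```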

Answer 1 (score: 2)

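You can use pd.wide_to_long, but it takes a little extra work to get the right stubnames, because the documentation states:

stubnames: The stub name(s). The wide format variables are assumed to start with the stub names.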
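To see what that assumption means in practice, the check below approximates the column selection with a regex of the form stub + sep + suffix, using stub `Plans`, the default empty `sep`, and the suffix `\w+`. It illustrates the idea and is not a copy of pandas' internal code; the labels are sample names based on the question.

```python
import pandas as pd

# Column labels before and after moving the stub to the front (sample names)
original = pd.Index(["Hom", "March Plans", "April Plans"])
renamed = pd.Index(["Hom", "PlansMarch", "PlansApril"])

# Approximate selection for stub "Plans": ^stub + sep + suffix$ with sep=""
pattern = r"^Plans\w+$"

print(original.str.match(pattern).tolist())  # [False, False, False]
print(renamed.str.match(pattern).tolist())   # [False, True, True]
```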

So the column names need a small change that puts the stub name at the beginning of each one:

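m = df.columns.str.contains('Plans|Ships')
cols = df.columns[m].str.split(' ')
# build a new label array instead of writing into df.columns.values in place
new_cols = df.columns.to_numpy(copy=True)
new_cols[m] = [w + month for month, w in cols]
df.columns = new_cols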

print(df)
   Hom  PlansMarch  ShipsMarch  PlansApril  ShipsApril
0  CAD          12           5           4          13
1  USA           7           6           2          11
2  CAD           4           9           6          14
3  CAD          13           3           9           7

Now you can use pd.wide_to_long with ['Ships', 'Plans'] as the stubnames to get the desired output:

import pandas as pd
out = (pd.wide_to_long(df.reset_index(), stubnames=['Ships', 'Plans'],
                       i='index', j='Month', suffix=r'\w+')
         .reset_index(level=0, drop=True)   # drop the helper 'index' level
         .reset_index())                    # turn 'Month' into a column
print(out)

   Month  Hom  Ships  Plans
0  March  CAD      5     12
1  March  USA      6      7
2  March  CAD      9      4
3  March  CAD      3     13
4  April  CAD     13      4
5  April  USA     11      2
6  April  CAD     14      6
7  April  CAD      7      9
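A self-contained sketch of the whole wide_to_long approach, rebuilding the sample data from the question. Here the renaming is done with `DataFrame.rename` and a function rather than a boolean mask; `to_long` is a hypothetical helper name, and it assumes every wide column is named "Month Kind" and ends with `Plans` or `Ships`. Row order is not guaranteed to match the output above across pandas versions.

```python
import pandas as pd

# Sample data copied from the question (only the Hom and month columns)
totaldf = pd.DataFrame({
    "Hom": ["CAD", "USA", "CAD", "CAD"],
    "March Plans": [12, 7, 4, 13],
    "March Ships": [5, 6, 9, 3],
    "April Plans": [4, 2, 6, 9],
    "April Ships": [13, 11, 14, 7],
})


def to_long(df):
    """Hypothetical helper: reshape 'Month Kind' columns with wide_to_long."""
    # "March Plans" -> "PlansMarch": put the stub first, as wide_to_long expects
    renamed = df.rename(
        columns=lambda c: "".join(c.split(" ")[::-1])
        if c.endswith(("Plans", "Ships")) else c
    )
    long = pd.wide_to_long(renamed.reset_index(), stubnames=["Plans", "Ships"],
                           i="index", j="Month", suffix=r"\w+")
    # Drop the helper 'index' level, then turn 'Month' into a regular column
    return long.reset_index(level=0, drop=True).reset_index()


out = to_long(totaldf)
print(out)
```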