Sorting a pandas pivot that contains strings

Time: 2016-07-08 07:48:17

Tags: python-2.7 sorting pandas pivot

I have a pandas.DataFrame that contains numeric values, date values, and text values. It looks like this:

    Strike  StrikeCell                                      Expiration  ExpirationCell                                  CellContents
0   60.0    \n <div class="cell row-header strike itm" ...  2016-07-15  \n <div class="cell col-header expiration">...  \n <div class="cell option" strike="60.0" m...
1   60.0    \n <div class="cell row-header strike itm" ...  2017-01-20  \n <div class="cell col-header expiration">...  \n <div class="cell option" strike="60.0" m...
2   60.0    \n <div class="cell row-header strike itm" ...  2018-01-19  \n <div class="cell col-header expiration">...  \n <div class="cell option" strike="60.0" 
13  70.0    \n <div class="cell row-header strike itm" ...  2017-01-20  \n <div class="cell col-header expiration">...  \n <div class="cell option" strike="70.0" m...
15  70.0    \n <div class="cell row-header strike itm" ...  2018-01-19  \n <div class="cell col-header expiration">...  \n <div class="cell option" strike="70.0" m...
17  70.0    \n <div class="cell row-header strike itm" ...  2016-10-21  \n <div class="cell col-header expiration">...  \n <div class="cell option" strike="70.0" m...
...
562 260.0   \n <div class="cell row-header strike otm" ...  2017-01-20  \n <div class="cell col-header expiration">...  \n <div class="cell option" strike="260.0" ...
564 270.0   \n <div class="cell row-header strike otm" ...  2017-01-20  \n <div class="cell col-header expiration">...  \n <div class="cell option" strike="270.0" ...
565 280.0   \n <div class="cell row-header strike otm" ...  2017-01-20  \n <div class="cell col-header expiration">...  \n <div class="cell option" strike="280.0" ...

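My goal is a table with StrikeCell down the first column (in ascending order), ExpirationCell across the columns (in ascending order), and CellContents as the values. Basically, I am building a large pivot table whose contents are HTML-formatted.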

I can do the following, which works:

df.pivot(index='Strike', columns='Expiration', values='CellContents')

Strike is sorted correctly, and so is Expiration.

However, if I try to use the string columns StrikeCell and ExpirationCell instead, like this:

df.pivot(index='StrikeCell', columns='ExpirationCell', values='CellContents')

the sort order is lost.

So the question is: how do I get back the ascending sort order by Strike and Expiration while using StrikeCell as the index and ExpirationCell as the columns?

Using pandas 0.18.1.

1 answer:

Answer 0 (score: 1)

I believe this will work for you.

First, let's fix the order of StrikeCell and ExpirationCell.

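# Sort by the numeric/date column, then keep each label once so that reindex does not repeat rows or columns.
StrikeCell_ordered = df[['Strike', 'StrikeCell']].sort_values(by='Strike')['StrikeCell'].drop_duplicates()
ExpirationCell_ordered = df[['Expiration', 'ExpirationCell']].sort_values(by='Expiration')['ExpirationCell'].drop_duplicates()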

Then pivot and apply reindex:

pivoted_df = df.pivot(index='StrikeCell', columns='ExpirationCell', values='CellContents')
result = pivoted_df.reindex(index=StrikeCell_ordered, columns=ExpirationCell_ordered)
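
To see the whole approach end to end, here is a minimal sketch on made-up data. The four-row frame and its short `<div>` strings are hypothetical stand-ins for the real HTML cells in the question, and the `drop_duplicates()` call is an assumption on my part: it keeps one entry per label so that `reindex` does not repeat rows or columns when the same strike or expiration appears more than once.

```python
import pandas as pd

# Hypothetical toy data standing in for the HTML cell strings in the question.
df = pd.DataFrame({
    'Strike': [60.0, 60.0, 100.0, 100.0],
    'StrikeCell': ['<div>60.0</div>', '<div>60.0</div>',
                   '<div>100.0</div>', '<div>100.0</div>'],
    'Expiration': ['2016-07-15', '2017-01-20', '2016-07-15', '2017-01-20'],
    'ExpirationCell': ['<div>Jul 15, 2016</div>', '<div>Jan 20, 2017</div>',
                       '<div>Jul 15, 2016</div>', '<div>Jan 20, 2017</div>'],
    'CellContents': ['c60-jul', 'c60-jan', 'c100-jul', 'c100-jan'],
})

# Pivoting on the string columns sorts the labels lexicographically,
# so '<div>100.0</div>' lands before '<div>60.0</div>'.
pivoted_df = df.pivot(index='StrikeCell', columns='ExpirationCell',
                      values='CellContents')

# Build each label order from its sortable companion column, keeping one
# entry per label so reindex does not repeat rows or columns.
strike_order = df.sort_values(by='Strike')['StrikeCell'].drop_duplicates()
expiration_order = (df.sort_values(by='Expiration')['ExpirationCell']
                    .drop_duplicates())

# Reorder the pivoted rows and columns to follow Strike and Expiration.
result = pivoted_df.reindex(index=strike_order, columns=expiration_order)
print(result)
```

The design choice here is to leave the pivot itself alone and only reorder its labels afterwards, which keeps the HTML strings as the visible index and columns while the numeric and date columns decide the order.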