pandas: slice the lower triangle, reindex each column independently, then rearrange and concatenate the columns

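Time: 2016-10-27 13:40:48

Tags: python datetime pandas dataframe data-manipulation

I have a dataframe whose rows and columns are both indexed by date. The condition to keep is row-index-date >= column-index-date. The following code builds the initial dataframe: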

import pandas as pd
import numpy as np

np.random.seed(0)


rng = pd.date_range('1/1/2011', periods=5, freq='M')
df = pd.DataFrame(np.random.random((len(rng), len(rng))), index=rng, columns=rng)
idx = df.apply(lambda x: x.index >= x.name, axis=0)
df = df[idx]
# .ix has been removed from pandas; .iloc does the same positional assignment
df.iloc[4, 0:2] = np.nan
df.iloc[2, 1] = np.nan
print(df) 

which gives:

            2011-01-31  2011-02-28  2011-03-31  2011-04-30  2011-05-31 
2011-01-31  0.548814    NaN         NaN         NaN         NaN
2011-02-28  0.645894    0.437587    NaN         NaN         NaN
2011-03-31  0.791725    NaN         0.568045    NaN         NaN
2011-04-30  0.087129    0.020218    0.832620    0.778157    NaN
2011-05-31  NaN         NaN         0.461479    0.780529    0.118274

I would like to change it to the following format:

    2011-01-31  2011-02-28  2011-03-31  2011-04-30  2011-05-31
0   0.548814    0.437587    0.568045    0.778157    0.118274
1   0.645894    NaN         0.832620    0.780529    NaN
2   0.791725    0.020218    0.461479    NaN         NaN
3   0.087129    NaN         NaN         NaN         NaN
4   NaN         NaN         NaN         NaN         NaN

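The new index represents the lag, row-index - column-index, in the original dataframe. Note that the index is different for each column. I am struggling to assign a new index to each column and then rearrange the columns.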
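To make the lag index concrete, here is a minimal sketch for a single column. It uses a small hypothetical 3x3 frame (`small`, not the data above) and assumes the row and column labels are the same dates in the same order, so the lag can be read off as row position minus column position.

```python
import numpy as np
import pandas as pd

# Hypothetical 3x3 frame with the same dates on both axes (not the question's data).
dates = pd.to_datetime(["2011-01-31", "2011-02-28", "2011-03-31"])
small = pd.DataFrame(
    [[1.0, np.nan, np.nan],
     [2.0, 3.0, np.nan],
     [4.0, 5.0, 6.0]],
    index=dates,
    columns=dates,
)

col = small.iloc[:, 1]                    # the 2011-02-28 column
offset = small.index.get_loc(col.name)    # position of the column's own date among the rows
lag = np.arange(len(col)) - offset        # row position minus column position
keep = lag >= 0                           # rows on or below the diagonal

# Relabel the kept values by their lag instead of by date.
kept = pd.Series(col.to_numpy()[keep], index=lag[keep], name=col.name)
print(kept)
```

Each column has a different `offset`, which is why the new index has to be computed column by column.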

1 answer:

Answer 0 (score: 0)

This worked for me:

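def align_columns_by_lag(x):
    """Keep the lower-triangular part of a column, re-indexed by lag."""
    xlen = len(x)
    # Keep only the rows dated on or after the column's own date,
    # then replace the date index with 0, 1, 2, ...
    newx = x[x.index >= x.name].reset_index(drop=True)
    # reindex returns a new Series rather than modifying in place,
    # so its result must be returned; missing rows are filled with NaN.
    return newx.reindex(range(xlen))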

df2 = df.apply(align_columns_by_lag, axis=0)
df2
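
As a possible alternative (a sketch, not part of the original answer), each column can be shifted up by the position of its own date and then given a plain integer index. This assumes the row and column labels are identical and in the same order, so a column's position is also the number of rows above its diagonal entry. The helper name `shift_to_lag` is made up for illustration, and the frame is rebuilt here only so the snippet runs on its own.

```python
import numpy as np
import pandas as pd

np.random.seed(0)

# Rebuild a frame shaped like the one in the question: month-end dates on both axes.
rng = pd.date_range("2011-01-31", periods=5, freq=pd.offsets.MonthEnd())
df = pd.DataFrame(np.random.random((len(rng), len(rng))), index=rng, columns=rng)

# Keep the lower triangle (row date >= column date), then blank the same cells as the question.
df = df.where(np.tril(np.ones(df.shape, dtype=bool)))
df.iloc[4, 0:2] = np.nan
df.iloc[2, 1] = np.nan


def shift_to_lag(col, row_index):
    """Hypothetical helper: move a column up so its diagonal entry lands on row 0."""
    offset = row_index.get_loc(col.name)  # position of the column's date among the rows
    return col.shift(-offset).reset_index(drop=True)


df_lag = df.apply(shift_to_lag, axis=0, args=(df.index,))
print(df_lag)
```

Unlike the date comparison in `align_columns_by_lag`, this version relies purely on positions, so it would not be appropriate if the row and column dates differ.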