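I have a DataFrame whose rows and columns are both indexed by date. The condition that must hold is row-index-date >= column-index-date. Here is the code that builds the initial DataFrame: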
import pandas as pd
import numpy as np
np.random.seed(0)
rng = pd.date_range('1/1/2011', periods=5, freq='M')
df = pd.DataFrame(np.random.random((len(rng), len(rng))), index=rng, columns=rng)
idx = df.apply(lambda x: x.index >= x.name, axis=0)
df = df[idx]
# .ix was removed in pandas 1.0; .iloc does the same positional assignment
df.iloc[4, 0:2] = np.nan
df.iloc[2, 1] = np.nan
print(df)
which gives
            2011-01-31  2011-02-28  2011-03-31  2011-04-30  2011-05-31
2011-01-31    0.548814         NaN         NaN         NaN         NaN
2011-02-28    0.645894    0.437587         NaN         NaN         NaN
2011-03-31    0.791725         NaN    0.568045         NaN         NaN
2011-04-30    0.087129    0.020218    0.832620    0.778157         NaN
2011-05-31         NaN         NaN    0.461479    0.780529    0.118274
I would like to transform it into the following format:
   2011-01-31  2011-02-28  2011-03-31  2011-04-30  2011-05-31
0    0.548814    0.437587    0.568045    0.778157    0.118274
1    0.645894         NaN    0.832620    0.780529         NaN
2    0.791725    0.020218    0.461479         NaN         NaN
3    0.087129         NaN         NaN         NaN         NaN
4         NaN         NaN         NaN         NaN         NaN
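The new index represents the lag (row-index - column-index) in the original DataFrame. Note that the mapping from date to lag is different for each column. I am struggling to assign a new index to each column and then realign the columns.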
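To make the lag definition concrete, here is a minimal sketch that computes the lag of every cell as row position minus column position. It assumes the rows and columns carry the same ordered dates, as in the example above; the names `dates`, `lag`, and `lag_df` are only illustrative.

```python
import numpy as np
import pandas as pd

# Same five month-end dates as in the example, written out explicitly.
dates = pd.to_datetime(['2011-01-31', '2011-02-28', '2011-03-31',
                        '2011-04-30', '2011-05-31'])
n = len(dates)

# lag[r, c] = row position - column position.
# Negative entries are cells where the row date is before the column date.
lag = np.arange(n)[:, None] - np.arange(n)[None, :]
lag_df = pd.DataFrame(lag, index=dates, columns=dates)
print(lag_df)
```

For instance, the cell at row 2011-04-30 and column 2011-02-28 has lag 2, so its value should end up in row 2 of that column in the desired output.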
Answer 0 (score: 0)
This worked for me:
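def align_columns_by_lag(x):
    """Keep the lower-triangular part of a column, re-indexed by lag."""
    xlen = len(x)
    # Keep only the rows whose date is on or after this column's date.
    newx = x[x.index >= x.name].reset_index(drop=True)
    # reindex returns a new Series, so its result must be returned;
    # lags with no data are filled with NaN by default.
    return newx.reindex(range(xlen))

df2 = df.apply(align_columns_by_lag, axis=0)
print(df2)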
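As a possible alternative to the `apply` approach, here is a minimal sketch based on `Series.shift`. It is not the method used in the answer above, and it relies on an assumption: the rows and columns share the same ordered dates, so the first valid row of column i sits at position i. The helper name `align_by_shift` is hypothetical, and the dates are written out explicitly only to keep the snippet self-contained.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question.
np.random.seed(0)
rng = pd.to_datetime(['2011-01-31', '2011-02-28', '2011-03-31',
                      '2011-04-30', '2011-05-31'])
df = pd.DataFrame(np.random.random((len(rng), len(rng))), index=rng, columns=rng)

# Keep only cells where row date >= column date (lower triangle).
mask = rng.values[:, None] >= rng.values[None, :]
df = df.where(mask)
df.iloc[4, 0:2] = np.nan
df.iloc[2, 1] = np.nan


def align_by_shift(frame):
    """Shift column i up by i rows so that row k holds the value at lag k.

    Assumes frame.index and frame.columns contain the same ordered dates.
    """
    shifted = {col: frame[col].shift(-i).to_numpy()
               for i, col in enumerate(frame.columns)}
    return pd.DataFrame(shifted, index=range(len(frame)))


df2 = align_by_shift(df)
print(df2)
```

Shifting works on positions rather than comparing dates, so it would give wrong results if the row and column dates did not line up one to one; in that case the date comparison used in `align_columns_by_lag` is the safer choice.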