python: Filling tuples into DataFrame date ranges

Posted: 2019-07-06 15:47:34

Tags: python pandas dataframe

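I have 4 portfolios, a, b, c and d, each of which can take the value 'no' or 'own' over a given period. (Code is included below for easy reproduction.)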

import pandas as pd
from datetime import datetime

ano=('a','no',datetime(2018,1,1), datetime(2018,1,2))
aown=('a','own',datetime(2018,1,3), datetime(2018,1,4))
bno=('b','no',datetime(2018,1,1), datetime(2018,1,5))
bown=('b','own',datetime(2018,1,6), datetime(2018,1,7))
cown=('c','own',datetime(2018,1,9), datetime(2018,1,10))
down=('d','own',datetime(2018,1,9), datetime(2018,1,9))

sch=pd.DataFrame([ano,aown,bno,bown,cown,down],columns=['portf','base','st','end'])

The schedule:

    portf   base    st          end
0   a       no      2018-01-01  2018-01-02
1   a       own     2018-01-03  2018-01-04
2   b       no      2018-01-01  2018-01-05
3   b       own     2018-01-06  2018-01-07
4   c       own     2018-01-09  2018-01-10
5   d       own     2018-01-09  2018-01-09  

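What I tried: create a holding DataFrame and fill in the values according to the schedule. Unfortunately, the first portfolio 'a' gets overwritten: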

df=pd.DataFrame(index=pd.date_range(min(sch.st),max(sch.end)),columns=['portf','base'])
for row in range(len(sch)):
    # each assignment replaces whatever an earlier row wrote for the same dates
    df.loc[sch['st'][row]:sch['end'][row],['portf','base']]= sch.loc[row,['portf','base']].values

            portf   base
2018-01-01  b       no
2018-01-02  b       no
2018-01-03  b       no
2018-01-04  b       no
2018-01-05  b       no
2018-01-06  b       own
2018-01-07  b       own
2018-01-08  NaN     NaN
2018-01-09  d       own
2018-01-10  c       own

Desired output:

2018-01-01  (('a','no'), ('b','no'))
2018-01-02  (('a','no'), ('b','no'))
2018-01-03  (('a','own'), ('b','no'))
2018-01-04  (('a','own'), ('b','no'))
2018-01-05  ('b','no')
...

I'm sure there is a simpler way to achieve this, but perhaps it is just a case I haven't run into before. Many thanks in advance!
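The overwrite happens because each .loc assignment replaces whatever an earlier schedule row wrote for the same dates, so only the last row covering a date survives. A minimal sketch of the loop idea that accumulates instead of overwriting, assuming the schedule is small enough to walk in plain Python (the names `active` and `result` are illustrative): collect every (portf, base) pair per day in a `collections.defaultdict`, then turn the mapping into a Series of tuples.

```python
from collections import defaultdict
from datetime import datetime

import pandas as pd

# Schedule from the question
sch = pd.DataFrame(
    [('a', 'no', datetime(2018, 1, 1), datetime(2018, 1, 2)),
     ('a', 'own', datetime(2018, 1, 3), datetime(2018, 1, 4)),
     ('b', 'no', datetime(2018, 1, 1), datetime(2018, 1, 5)),
     ('b', 'own', datetime(2018, 1, 6), datetime(2018, 1, 7)),
     ('c', 'own', datetime(2018, 1, 9), datetime(2018, 1, 10)),
     ('d', 'own', datetime(2018, 1, 9), datetime(2018, 1, 9))],
    columns=['portf', 'base', 'st', 'end'])

# Append every (portf, base) pair active on a day instead of overwriting
# a single cell per day.
active = defaultdict(list)
for row in sch.itertuples(index=False):
    for day in pd.date_range(row.st, row.end, freq='D'):
        active[day].append((row.portf, row.base))

# One tuple of pairs per covered day, in date order
result = pd.Series({day: tuple(pairs) for day, pairs in sorted(active.items())})
```

Dates that no schedule row covers (here 2018-01-08) do not appear in `result`, and a day with a single pair holds a one-element tuple such as (('b', 'no'),).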

1 answer:

Answer 0 (score: 0)

I would organize the data differently: dates as the index, one column per portf, and base as the values.

First, reshape the data and resample it to a daily frequency. After that it is a simple pivot.

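cols = ['portf', 'base']
s = (sch.reset_index()                           # 'index' keeps each schedule row distinct
       .melt(cols+['index'], value_name='date')   # one row per st/end endpoint
       .set_index('date')
       .groupby(cols+['index'], group_keys=False)
       .resample('D').ffill()                     # fill in every day between st and end
       .drop(columns=['variable', 'index'])
       .reset_index())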

res = s.pivot(index='date', columns='portf')
res = res.resample('D').first()  # Restore dates not covered by any interval (e.g. 2018-01-08)

Output of res:

           base               
portf         a    b    c    d
2018-01-01   no   no  NaN  NaN
2018-01-02   no   no  NaN  NaN
2018-01-03  own   no  NaN  NaN
2018-01-04  own   no  NaN  NaN
2018-01-05  NaN   no  NaN  NaN
2018-01-06  NaN  own  NaN  NaN
2018-01-07  NaN  own  NaN  NaN
2018-01-08  NaN  NaN  NaN  NaN
2018-01-09  NaN  NaN  own  own
2018-01-10  NaN  NaN  own  NaN
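An alternative way to reach the same daily layout, sketched under the assumption that the schedule is small enough to expand row by row: generate each interval's days with pd.date_range and build the long table directly, which sidesteps the groupby(...).resample step. Passing values='base' to pivot gives plain columns a, b, c, d instead of the base-level MultiIndex shown above, and asfreq('D') adds the uncovered 2018-01-08 row.

```python
from datetime import datetime

import pandas as pd

# Schedule from the question
sch = pd.DataFrame(
    [('a', 'no', datetime(2018, 1, 1), datetime(2018, 1, 2)),
     ('a', 'own', datetime(2018, 1, 3), datetime(2018, 1, 4)),
     ('b', 'no', datetime(2018, 1, 1), datetime(2018, 1, 5)),
     ('b', 'own', datetime(2018, 1, 6), datetime(2018, 1, 7)),
     ('c', 'own', datetime(2018, 1, 9), datetime(2018, 1, 10)),
     ('d', 'own', datetime(2018, 1, 9), datetime(2018, 1, 9))],
    columns=['portf', 'base', 'st', 'end'])

# Long table: one (date, portf, base) row for every day of every interval
records = [
    (day, row.portf, row.base)
    for row in sch.itertuples(index=False)
    for day in pd.date_range(row.st, row.end, freq='D')
]
s = pd.DataFrame(records, columns=['date', 'portf', 'base'])

# Wide table: dates as index, one column per portfolio, base as values
res = s.pivot(index='date', columns='portf', values='base')
res = res.asfreq('D')  # insert missing days (2018-01-08) as all-NaN rows
```

The pivot only works because each (date, portf) pair occurs once; overlapping intervals for the same portfolio would make pivot raise on duplicate entries.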

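If you need the other output, you can get there with a couple of less-than-ideal apply calls (a row-wise DataFrame.apply followed by a GroupBy.apply). This will perform very poorly on large DataFrames, so I would seriously consider keeping the layout above.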

s.set_index('date').apply(tuple, axis=1).groupby('date').apply(tuple)

date
2018-01-01      ((a, no), (b, no))
2018-01-02      ((a, no), (b, no))
2018-01-03     ((a, own), (b, no))
2018-01-04     ((a, own), (b, no))
2018-01-05              ((b, no),)
2018-01-06             ((b, own),)
2018-01-07             ((b, own),)
2018-01-09    ((c, own), (d, own))
2018-01-10             ((c, own),)
dtype: object
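The row-wise apply(tuple, axis=1) can be replaced by zipping the portf and base columns, which leaves one Python call per date rather than per row. A sketch, assuming a long table s with date, portf and base columns like the one built in the answer (rebuilt here so the snippet runs on its own); the per-group step is still Python-level, so the warning about large DataFrames largely still applies.

```python
from datetime import datetime

import pandas as pd

# Schedule from the question
sch = pd.DataFrame(
    [('a', 'no', datetime(2018, 1, 1), datetime(2018, 1, 2)),
     ('a', 'own', datetime(2018, 1, 3), datetime(2018, 1, 4)),
     ('b', 'no', datetime(2018, 1, 1), datetime(2018, 1, 5)),
     ('b', 'own', datetime(2018, 1, 6), datetime(2018, 1, 7)),
     ('c', 'own', datetime(2018, 1, 9), datetime(2018, 1, 10)),
     ('d', 'own', datetime(2018, 1, 9), datetime(2018, 1, 9))],
    columns=['portf', 'base', 'st', 'end'])

# Long table with one row per day of each interval
records = [
    (day, row.portf, row.base)
    for row in sch.itertuples(index=False)
    for day in pd.date_range(row.st, row.end, freq='D')
]
s = pd.DataFrame(records, columns=['date', 'portf', 'base'])

# Build the (portf, base) pairs with zip instead of a row-wise apply
pairs = pd.Series(list(zip(s['portf'], s['base'])), index=s['date'])

# Gather each date's pairs into one tuple; uncovered dates do not appear
out = pairs.groupby(level=0).agg(tuple)
```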