Incrementing numbers over an arbitrary slice of a pandas DataFrame in Python

Date: 2018-01-26 09:41:24

Tags: python pandas

This code shows the DataFrame I want to create:

import pandas as pd

df = pd.DataFrame(index=pd.date_range(start='4/1/2012', periods=10))
df['foo'] = 7
df['what_i_want'] = [0, 0, 0, 0, 1, 2, 3, 0, 0, 0]

The result:

            foo  what_i_want
2012-04-01    7            0
2012-04-02    7            0
2012-04-03    7            0
2012-04-04    7            0
2012-04-05    7            1
2012-04-06    7            2
2012-04-07    7            3
2012-04-08    7            0
2012-04-09    7            0
2012-04-10    7            0

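I'm trying to come up with a way to create these 1, 2, ..., n sequences over an arbitrary slice of a Series, i.e. something like df['2012-04-05':'2012-04-07'] = magic_function()

But I'm not sure how to do this without using a loop.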

3 answers:

Answer 0: (score: 4)

IIUC, you can slice with loc and assign a range:

df['what_i_want'] = 0
df.loc['2012-04-05':'2012-04-07', 'what_i_want'] = range(1, 4)

df

            foo  what_i_want
2012-04-01    7            0
2012-04-02    7            0
2012-04-03    7            0
2012-04-04    7            0
2012-04-05    7            1
2012-04-06    7            2
2012-04-07    7            3
2012-04-08    7            0
2012-04-09    7            0
2012-04-10    7            0
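
The range(1, 4) above is hard-coded for a three-row slice. As a rough sketch of how the same idea could be sized to any slice, the helper below counts the selected rows first. The name number_slice and its parameters are hypothetical, not part of pandas.

```python
import pandas as pd

def number_slice(df, start, end, column):
    """Hypothetical helper: write 1..n into `column` for the rows from start to end."""
    # Label slices with loc include both endpoints, so count the rows actually selected.
    n = len(df.loc[start:end])
    df.loc[start:end, column] = list(range(1, n + 1))
    return df

df = pd.DataFrame(index=pd.date_range(start='4/1/2012', periods=10))
df['foo'] = 7
df['what_i_want'] = 0

number_slice(df, '2012-04-05', '2012-04-07', 'what_i_want')
print(df['what_i_want'].tolist())
```

Because loc label slicing is inclusive on both ends, '2012-04-05':'2012-04-07' selects three rows, which is why the range must have three elements.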

Answer 1: (score: 2)

First build a Series from a range whose length matches the index of the slice, then reindex it to the full index:

idx = df.loc['2012-04-05':'2012-04-07'].index
df['new'] = pd.Series(range(1, len(idx)+1), index=idx).reindex(df.index, fill_value=0)

Or assign a range directly, but then it is necessary to replace the NaN values and cast to int:

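# number of rows in the slice (loc label slices include both endpoints)
n = len(df.loc['2012-04-05':'2012-04-07'].index)
df.loc['2012-04-05':'2012-04-07', 'new'] = range(1, n + 1)
# rows outside the slice are NaN when 'new' is a fresh column, so fill and cast back to int
df['new'] = df['new'].fillna(0).astype(int)
print(df)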
            foo  new
2012-04-01    7    0
2012-04-02    7    0
2012-04-03    7    0
2012-04-04    7    0
2012-04-05    7    1
2012-04-06    7    2
2012-04-07    7    3
2012-04-08    7    0
2012-04-09    7    0
2012-04-10    7    0
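
To illustrate why the second variant needs fillna and astype while the first does not, here is a small sketch comparing reindex with and without fill_value. The variable names with_nan and filled are only for illustration.

```python
import pandas as pd

df = pd.DataFrame(index=pd.date_range(start='4/1/2012', periods=10))
df['foo'] = 7

idx = df.loc['2012-04-05':'2012-04-07'].index
s = pd.Series(range(1, len(idx) + 1), index=idx)

# Without fill_value, rows outside the slice become NaN and the dtype turns into float.
with_nan = s.reindex(df.index)

# With fill_value=0 no NaN is introduced, so the integer dtype is kept.
filled = s.reindex(df.index, fill_value=0)

print(with_nan.dtype, filled.dtype)
```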

Answer 2: (score: 0)

You can do it like this:

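# select rows and column in a single loc call; chained indexing (df.loc[...][col] = ...)
# may assign to a temporary copy and leave df unchanged
df.loc['2012-04-08':'2012-04-10', 'what_i_want'] = \
    df.loc['2012-04-08':'2012-04-10'].apply(lambda x: 1, axis=1).cumsum()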

This maps every selected row to 1 and then takes the cumulative sum over those rows, which yields 1, 2, ..., n.
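
The same cumulative-sum idea can be sketched without apply, and extended to several slices at once, by building a 0/1 mask and summing within each run of consecutive equal values. This is only one possible approach; the column name counter and the second slice are made up for the example.

```python
import pandas as pd

df = pd.DataFrame(index=pd.date_range(start='4/1/2012', periods=10))
df['foo'] = 7

# Mark the rows to be numbered: 1 inside a slice, 0 elsewhere.
mask = pd.Series(0, index=df.index)
mask.loc['2012-04-02':'2012-04-03'] = 1
mask.loc['2012-04-05':'2012-04-07'] = 1

# A new run starts wherever the mask changes value.
run_id = mask.diff().ne(0).cumsum()

# Cumulative sum within each run: 1, 2, ..., n inside a slice, 0 outside.
df['counter'] = mask.groupby(run_id).cumsum()
print(df['counter'].tolist())
```

Note that two slices that touch each other form a single run under this scheme, so they would be numbered as one continuous sequence.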