Sliding window over data using a pandas DataFrame

Time: 2017-09-21 03:46:59

Tags: python pandas dataframe sliding-window

I have a dataset that looks like this:

df = DataFrame(dict(month = [1,2,3,4,5,6], a = [2,4,2,4,2,4], b = [3,5,6,3,4,6]))

[image]

What I want is a function that takes a window size as input and gives me something like this:

Function: def make_sliding_df(data, size)

  1. If I call make_sliding_df(df, 1), the output should be a dataframe like this: [image]
  2. If I call make_sliding_df(df, 2), the output should be a dataframe like this: [image]

I have tried many things, but nothing has worked so far. Any help would be appreciated. (I have checked several other similar questions, but none of them helped.)

2 answers:

Answer 0: (score: 2)

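Here is one approach using shift, applymap, and reduce. N is the number of following rows to include, so each cell holds N + 1 values, and positions past the end of the frame are filled with NaN. On Python 3, reduce has to be imported first with from functools import reduce.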
In [2007]: def make_sliding(df, N):
      ...:     dfs = [df.shift(-i).applymap(lambda x: [x]) for i in range(0, N+1)]
      ...:     return reduce(lambda x, y: x.add(y), dfs)
      ...:

In [2008]: make_sliding(df, 1)
Out[2008]:
          a         b     month
0  [2, 4.0]  [3, 5.0]  [1, 2.0]
1  [4, 2.0]  [5, 6.0]  [2, 3.0]
2  [2, 4.0]  [6, 3.0]  [3, 4.0]
3  [4, 2.0]  [3, 4.0]  [4, 5.0]
4  [2, 4.0]  [4, 6.0]  [5, 6.0]
5  [4, nan]  [6, nan]  [6, nan]

In [2009]: make_sliding(df, 2)
Out[2009]:
               a              b          month
0  [2, 4.0, 2.0]  [3, 5.0, 6.0]  [1, 2.0, 3.0]
1  [4, 2.0, 4.0]  [5, 6.0, 3.0]  [2, 3.0, 4.0]
2  [2, 4.0, 2.0]  [6, 3.0, 4.0]  [3, 4.0, 5.0]
3  [4, 2.0, 4.0]  [3, 4.0, 6.0]  [4, 5.0, 6.0]
4  [2, 4.0, nan]  [4, 6.0, nan]  [5, 6.0, nan]
5  [4, nan, nan]  [6, nan, nan]  [6, nan, nan]
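
For readers on Python 3 and a recent pandas, where applymap is deprecated in favor of DataFrame.map, the same shift idea can be sketched without applymap or reduce. This is a minimal sketch rather than the answer's own code: it zips the shifted columns row by row, and the parameter name n is an assumption that mirrors N above.

```python
import pandas as pd


def make_sliding(df, n):
    """Return a frame whose cells list each value plus the next n values."""
    # One shifted copy per offset: shift(-i) moves row r + i up to row r,
    # leaving NaN in the last i rows.
    shifted = [df.shift(-i) for i in range(n + 1)]
    out = {}
    for col in df.columns:
        # Zip the shifted versions of this column row by row into lists.
        out[col] = [list(vals) for vals in zip(*(s[col] for s in shifted))]
    return pd.DataFrame(out, index=df.index)


df = pd.DataFrame(dict(month=[1, 2, 3, 4, 5, 6],
                       a=[2, 4, 2, 4, 2, 4],
                       b=[3, 5, 6, 3, 4, 6]))
print(make_sliding(df, 1))
```

As in the session above, the first element of each cell keeps its integer type, while shifted elements become floats because shift introduces NaN.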

Answer 1: (score: 0)

Using numpy. This may look ugly, but it is my first attempt at using numpy ...

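import numpy as np
import pandas as pd

def make_sliding_df(df, step=1, width=2):
    columns = []
    for name in df.columns:
        a = np.array(df[name])
        # Pad with NaN so that windows near the end keep the full width.
        b = np.append(a, [np.nan] * (width - 1))
        # Window start positions; stepping here keeps every index inside b
        # (multiplying all row numbers by step overruns b when step > 1).
        starts = np.arange(0, len(a), step)
        idx = np.arange(width)[None, :] + starts[:, None]
        columns.append(b[idx].tolist())
    newdf = pd.DataFrame(data=columns).T
    newdf.columns = df.columns
    return newdf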

make_sliding_df(df,step=1,width=2)
Out[157]: 
            a           b       month
0  [2.0, 4.0]  [3.0, 5.0]  [1.0, 2.0]
1  [4.0, 2.0]  [5.0, 6.0]  [2.0, 3.0]
2  [2.0, 4.0]  [6.0, 3.0]  [3.0, 4.0]
3  [4.0, 2.0]  [3.0, 4.0]  [4.0, 5.0]
4  [2.0, 4.0]  [4.0, 6.0]  [5.0, 6.0]
5  [4.0, nan]  [6.0, nan]  [6.0, nan]

make_sliding_df(df,step=1,width=3)
Out[158]: 
                 a                b            month
0  [2.0, 4.0, 2.0]  [3.0, 5.0, 6.0]  [1.0, 2.0, 3.0]
1  [4.0, 2.0, 4.0]  [5.0, 6.0, 3.0]  [2.0, 3.0, 4.0]
2  [2.0, 4.0, 2.0]  [6.0, 3.0, 4.0]  [3.0, 4.0, 5.0]
3  [4.0, 2.0, 4.0]  [3.0, 4.0, 6.0]  [4.0, 5.0, 6.0]
4  [2.0, 4.0, nan]  [4.0, 6.0, nan]  [5.0, 6.0, nan]
5  [4.0, nan, nan]  [6.0, nan, nan]  [6.0, nan, nan]
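
The index arithmetic in this answer can also be delegated to NumPy itself. The following is a hedged sketch, not part of the original answer: it assumes NumPy 1.20 or later (where numpy.lib.stride_tricks.sliding_window_view is available), assumes every column is numeric, and only covers step=1. The function name make_sliding_df_view is made up for illustration.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view


def make_sliding_df_view(df, width=2):
    """Build width-sized forward windows per column, NaN-padded at the end."""
    out = {}
    for col in df.columns:
        # Assumes numeric columns; casting to float lets NaN be appended.
        values = df[col].to_numpy(dtype=float)
        padded = np.append(values, [np.nan] * (width - 1))
        # After padding there is exactly one window per original row.
        windows = sliding_window_view(padded, width)
        out[col] = windows.tolist()
    return pd.DataFrame(out, index=df.index)


df = pd.DataFrame(dict(month=[1, 2, 3, 4, 5, 6],
                       a=[2, 4, 2, 4, 2, 4],
                       b=[3, 5, 6, 3, 4, 6]))
print(make_sliding_df_view(df, width=3))
```

Because the padded array has len(values) + width - 1 elements, the view yields one window per input row, which matches the shape of the outputs shown above.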