Pandas: from a two-column DataFrame to a (time-series) multi-column DataFrame

Time: 2020-08-14 08:27:58

Tags: python pandas dataframe time-series

Suppose we have a DataFrame like this:

import pandas as pd

df = pd.DataFrame(columns=['A', 'B','C'])
df.loc[0]=[1,2,3]
df.loc[1]=[4,5,6]
df.loc[2]=[7,8,9]
df.loc[3]=[10,11,12]
df.loc[4]=[13,14,15]
df.loc[5]=[16,17,18]
df.loc[6]=[19,20,21]
df


    A   B   C
0   1   2   3
1   4   5   6
2   7   8   9
3   10  11  12
4   13  14  15
5   16  17  18
6   19  20  21
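As a side note, filling an empty DataFrame one `.loc` row at a time can leave the columns with `object` dtype, so the NumPy arrays pulled out of it may be object arrays rather than integer arrays. A minimal sketch of an equivalent, integer-typed construction, assuming the values are simply 1 through 21 laid out three per row as in the example:

```python
import numpy as np
import pandas as pd

# Same values as the row-by-row construction: 1..21, three per row.
df = pd.DataFrame(np.arange(1, 22).reshape(7, 3), columns=['A', 'B', 'C'])
print(df.dtypes)  # integer columns rather than object
print(df)
```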

I want to transform df to get df2:

df2 = pd.DataFrame(columns=['first', 'second','third','fourth','fifth','sixth'])
df2.loc[0]=[1,2,4,5,7,8]
df2.loc[1]=[4,5,7,8,10,11]
df2.loc[2]=[7,8,10,11,13,14]
df2.loc[3]=[10,11,13,14,16,17]
df2.loc[4]=[13,14,16,17,19,20]
df2

    first   second  third   fourth  fifth   sixth
0   1   2   4   5   7   8
1   4   5   7   8   10  11
2   7   8   10  11  13  14
3   10  11  13  14  16  17
4   13  14  16  17  19  20

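That is, I want to fill the first row of df2 with three rows of df's first two columns (rows 0, 1 and 2 of A and B, flattened row by row). The second row of df2 is then filled from the next window of three rows (rows 1, 2 and 3), and so on, moving down one row at a time.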
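The rule in this paragraph (row i of df2 holds A and B from rows i, i+1 and i+2 of df) can also be expressed with plain pandas shifts instead of NumPy. A minimal sketch, not taken from any of the answers, assuming integer data and a default 0-based index:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(1, 22).reshape(7, 3), columns=['A', 'B', 'C'])

window = 3  # consecutive df rows that make up one df2 row
names = ['first', 'second', 'third', 'fourth', 'fifth', 'sixth']

# Copy k (k = 0, 1, 2) holds the A/B values from k rows further down;
# the suffix keeps the column names unique (A_0, B_0, A_1, ...).
parts = [df[['A', 'B']].shift(-k).add_suffix(f'_{k}') for k in range(window)]

# Put the copies side by side. The last window - 1 rows contain NaN
# (there are no rows below them), so drop them and restore integers.
df2 = pd.concat(parts, axis=1).dropna().astype(int)
df2.columns = names
print(df2)
```

Shifting introduces NaN, which temporarily turns the columns into floats; that is why the sketch casts back with `astype(int)` after `dropna()`.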

How do I get from df to df2? I can do some simple, basic operations, but this is still difficult for me.

Can anyone help me?

3 Answers:

Answer 0 (score: 1)

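You can use strides: flatten the first two columns into a 1-D array with ravel, take rolling windows of 6 values (3 rows) over it, and keep every second window by indexing with [::2], so that each window starts at the beginning of a row of df.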

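import numpy as np
import pandas as pd

def rolling_window(a, window):
    # View `a` as overlapping windows of length `window` (no data is copied).
    shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
    strides = a.strides + (a.strides[-1],)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)

# Flatten A and B row by row, take windows of 6 values (3 rows),
# and keep every second window so each one starts at a row boundary.
a = rolling_window(df[['A','B']].to_numpy().ravel(), 6)[::2]
print(a)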
[[1 2 4 5 7 8]
 [4 5 7 8 10 11]
 [7 8 10 11 13 14]
 [10 11 13 14 16 17]
 [13 14 16 17 19 20]]

df2 = pd.DataFrame(a, columns=['first', 'second','third','fourth','fifth','sixth'])
print(df2)
   first  second  third  fourth  fifth  sixth
0      1       2      4       5      7      8
1      4       5      7       8     10     11
2      7       8     10      11     13     14
3     10      11     13      14     16     17
4     13      14     16      17     19     20
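On NumPy 1.20 or newer, `numpy.lib.stride_tricks.sliding_window_view` builds the same kind of windows without computing strides by hand, and it returns a read-only view, which avoids the risk of accidentally writing through overlapping `as_strided` memory. A sketch of that variant (a swapped-in function, not part of the original answer) that windows over the 2-D A/B block directly, so the `[::2]` step is not needed:

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20

df = pd.DataFrame(np.arange(1, 22).reshape(7, 3), columns=['A', 'B', 'C'])
names = ['first', 'second', 'third', 'fourth', 'fifth', 'sixth']

ab = df[['A', 'B']].to_numpy()  # shape (7, 2)
# Every 3x2 block of consecutive rows: shape (5, 1, 3, 2), a read-only view.
blocks = sliding_window_view(ab, window_shape=(3, 2))
# Flatten each block row by row into one line of six values (this copies).
a = blocks.reshape(len(blocks), -1)

df2 = pd.DataFrame(a, columns=names)
print(df2)
```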

Answer 1 (score: 1)

Using NumPy:

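import numpy as np
import pandas as pd

# Flatten columns A and B row by row: [1, 2, 4, 5, 7, 8, ...]
new = df.values[:, :2].reshape(-1)
# Each output row is a 6-value slice starting at an even offset (one df row = 2 values);
# there are len(df) - 2 complete windows of three rows.
l = [new[2*i:2*i+6] for i in range(new.shape[0] // 2 - 2)]
l = np.array(l)
df2 = pd.DataFrame(l, columns=['first', 'second','third','fourth','fifth','sixth'])
print(df2)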

'''
Output:
   first  second  third  fourth  fifth  sixth
0      1       2      4       5      7      8
1      4       5      7       8     10     11
2      7       8     10      11     13     14
3     10      11     13      14     16     17
4     13      14     16      17     19     20
'''
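The slicing in this answer hard-codes two columns (step 2) and three rows (slice length 6). A sketch of the same idea with those two numbers as parameters; `windowed_rows` is a hypothetical helper name, not a pandas or NumPy function:

```python
import numpy as np
import pandas as pd

def windowed_rows(df, cols, window):
    """Hypothetical helper: each output row holds `cols` from `window`
    consecutive rows of `df`, flattened row by row."""
    flat = df[cols].to_numpy().reshape(-1)  # row-major flatten
    step = len(cols)                        # values contributed by one df row
    n = len(df) - window + 1                # number of complete windows
    if n <= 0:
        return np.empty((0, step * window), dtype=flat.dtype)
    rows = [flat[step * i: step * (i + window)] for i in range(n)]
    return np.array(rows)

df = pd.DataFrame(np.arange(1, 22).reshape(7, 3), columns=['A', 'B', 'C'])
out = windowed_rows(df, ['A', 'B'], 3)
df2 = pd.DataFrame(out, columns=['first', 'second', 'third', 'fourth', 'fifth', 'sixth'])
print(df2)
```

With `cols=['A', 'B']` and `window=3` the slices are exactly `new[2*i:2*i+6]` from the answer; other calls, such as `windowed_rows(df, ['A', 'B', 'C'], 2)`, reuse the same logic with a different step and length.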

Answer 2 (score: 1)

A simpler solution is to drop column 'C', then build each row of df2 by joining three lists (one per row of df).

The code is as follows:

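# Keep only columns A and B (note: this modifies df in place).
df.drop(['C'], axis=1, inplace=True)

df2 = pd.DataFrame(columns=['first', 'second','third','fourth','fifth','sixth'])

# Row i of df2 = rows i, i+1 and i+2 of df, concatenated.
# df.loc looks rows up by index label, so this assumes the default 0..n-1 index.
for i in range(len(df) - 2):
    df2.loc[i] = list(df.loc[i]) + list(df.loc[i+1]) + list(df.loc[i+2])

print(df2)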
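The same loop can be written without modifying `df` and without enlarging `df2` one `.loc` assignment at a time; collecting the rows in a list and building the DataFrame once is generally faster for larger inputs. A minimal sketch, using positional `iloc` slicing so it does not depend on the index being the default 0-based one:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(1, 22).reshape(7, 3), columns=['A', 'B', 'C'])
names = ['first', 'second', 'third', 'fourth', 'fifth', 'sixth']

ab = df[['A', 'B']]  # select instead of dropping C, so df stays unchanged
# Positional slicing: rows i, i+1, i+2 flattened into six values.
rows = [ab.iloc[i:i + 3].to_numpy().ravel() for i in range(len(ab) - 2)]
# Build the frame once instead of growing it row by row.
df2 = pd.DataFrame(rows, columns=names)
print(df2)
```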