Suppose we have a DataFrame like this:
import pandas as pd
df = pd.DataFrame(columns=['A', 'B', 'C'])
df.loc[0]=[1,2,3]
df.loc[1]=[4,5,6]
df.loc[2]=[7,8,9]
df.loc[3]=[10,11,12]
df.loc[4]=[13,14,15]
df.loc[5]=[16,17,18]
df.loc[6]=[19,20,21]
df
    A   B   C
0   1   2   3
1   4   5   6
2   7   8   9
3  10  11  12
4  13  14  15
5  16  17  18
6  19  20  21
I want to transform df into df2:
df2 = pd.DataFrame(columns=['first', 'second','third','fourth','fifth','sixth'])
df2.loc[0]=[1,2,4,5,7,8]
df2.loc[1]=[4,5,7,8,10,11]
df2.loc[2]=[7,8,10,11,13,14]
df2.loc[3]=[10,11,13,14,16,17]
df2.loc[4]=[13,14,16,17,19,20]
df2
   first  second  third  fourth  fifth  sixth
0      1       2      4       5      7      8
1      4       5      7       8     10     11
2      7       8     10      11     13     14
3     10      11     13      14     16     17
4     13      14     16      17     19     20
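In other words, the first row of df2 is filled from the first two columns (A and B) of the first three rows of df. The second row of df2 is filled from the same two columns of the next three rows, starting one row further down (rows 1 to 3), and so on.
How do I get from df to df2? I can do some simple basic operations, but this one is still hard for me.
Can anyone help?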
Answer 0 (score: 1)
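You can use strides: flatten the first two columns into a 1-D array with ravel, build rolling windows of length 6, and then select every second window by indexing with [::2], because each row of df contributes two values: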
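import numpy as np
def rolling_window(a, window):
    # View of all overlapping windows of length `window` along the last axis (no copy)
    shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
    strides = a.strides + (a.strides[-1],)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)
# 6 values per window (3 rows x 2 columns); [::2] keeps windows that start on a row boundary
a = rolling_window(df[['A','B']].to_numpy().ravel(), 6)[::2]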
print (a)
[[1 2 4 5 7 8]
[4 5 7 8 10 11]
[7 8 10 11 13 14]
[10 11 13 14 16 17]
[13 14 16 17 19 20]]
df2 = pd.DataFrame(a, columns=['first', 'second','third','fourth','fifth','sixth'])
print (df2)
   first  second  third  fourth  fifth  sixth
0      1       2      4       5      7      8
1      4       5      7       8     10     11
2      7       8     10      11     13     14
3     10      11     13      14     16     17
4     13      14     16      17     19     20
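as_strided builds its view without any bounds checking, so a wrong shape or strides value can read memory outside the array. As a hedged alternative, the sketch below uses numpy.lib.stride_tricks.sliding_window_view, which assumes NumPy 1.20 or later. The sample frame is rebuilt here with np.arange so the snippet runs on its own; that construction is an assumption for illustration, not the one used in the question.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view

# Rebuild the sample frame from the question (7 rows, columns A, B, C).
df = pd.DataFrame(np.arange(1, 22).reshape(7, 3), columns=["A", "B", "C"])

# Flatten A and B row by row: [1, 2, 4, 5, 7, 8, ...]
flat = df[["A", "B"]].to_numpy().ravel()

# All windows of 6 consecutive values, then every second one,
# so that each kept window starts at the beginning of a row of df.
windows = sliding_window_view(flat, 6)[::2]

df2 = pd.DataFrame(
    windows,
    columns=["first", "second", "third", "fourth", "fifth", "sixth"],
)
print(df2)
```

The windowing logic is the same as in rolling_window above; only the function that creates the view differs.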
Answer 1 (score: 1)
Using NumPy:
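import numpy as np
# Flatten columns A and B row by row: [1, 2, 4, 5, 7, 8, ...]
new = df.values[:, :2].reshape(-1)
# Each window holds 6 values (3 rows x 2 columns) and starts 2 values after the previous one
l = [new[2*i:2*i+6] for i in range(new.shape[0] // 2 - 2)]
l = np.array(l)
df2 = pd.DataFrame(l, columns=['first', 'second','third','fourth','fifth','sixth'])
print(df2)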
'''
Output:
   first  second  third  fourth  fifth  sixth
0      1       2      4       5      7      8
1      4       5      7       8     10     11
2      7       8     10      11     13     14
3     10      11     13      14     16     17
4     13      14     16      17     19     20
'''
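The bound passed to range in this answer follows from a simple count: with n rows and a window of w rows there are n - w + 1 overlapping windows, here 7 - 3 + 1 = 5. A minimal sketch that makes the count explicit is given below; the helper name stack_windows and its parameters are hypothetical, introduced only for illustration, and the sample frame is rebuilt with np.arange so the snippet is self-contained.

```python
import numpy as np
import pandas as pd


def stack_windows(frame, cols, window):
    """Flatten each run of `window` consecutive rows of `cols` into one row."""
    values = frame[cols].to_numpy()
    n_windows = len(values) - window + 1  # n - w + 1 overlapping windows
    if n_windows <= 0:
        raise ValueError("window is longer than the frame")
    # values[i:i + window] has shape (window, len(cols)); ravel reads it row by row.
    return np.array([values[i:i + window].ravel() for i in range(n_windows)])


# Rebuild the sample frame from the question (7 rows, columns A, B, C).
df = pd.DataFrame(np.arange(1, 22).reshape(7, 3), columns=["A", "B", "C"])

rows = stack_windows(df, ["A", "B"], window=3)
df2 = pd.DataFrame(
    rows,
    columns=["first", "second", "third", "fourth", "fifth", "sixth"],
)
print(df2)
```

Writing the window size as a parameter avoids the hard-coded 2 and 6, at the cost of a slightly longer snippet.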
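Answer 2 (score: 1)
A simpler solution is to drop column 'C' and then concatenate three consecutive rows, as lists, to build each row of df2.
The code is as follows: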
df.drop(['C'], axis=1, inplace=True)  # note: this modifies df in place
df2 = pd.DataFrame(columns=['first', 'second','third','fourth','fifth','sixth'])
for i in range(len(df) - 2):
    # rows i, i+1 and i+2 of columns A and B become one row of df2
    df2.loc[i] = list(df.loc[i]) + list(df.loc[i+1]) + list(df.loc[i+2])
print(df2)
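Assigning to df2.loc[i] inside a loop grows the frame one row at a time, which can become slow on large inputs, and inplace=True also changes the original df. One possible loop-free variant is sketched below with pd.concat on three row-shifted slices. The sample frame is rebuilt with np.arange for self-containment, and the variable names (sub, n_out, parts) are illustrative only.

```python
import numpy as np
import pandas as pd

# Rebuild the sample frame from the question (7 rows, columns A, B, C).
df = pd.DataFrame(np.arange(1, 22).reshape(7, 3), columns=["A", "B", "C"])

window = 3
sub = df[["A", "B"]]            # select columns instead of dropping C in place
n_out = len(sub) - window + 1   # number of rows in the result

# Slice k starts k rows further down; resetting the index lines the slices up.
parts = [sub.iloc[k:k + n_out].reset_index(drop=True) for k in range(window)]

# Place the slices side by side, then give the six columns their final names.
df2 = pd.concat(parts, axis=1)
df2.columns = ["first", "second", "third", "fourth", "fifth", "sixth"]
print(df2)
```

Here df keeps its column C, since the columns are selected rather than dropped.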