I have data with this structure in a DataFrame:
D K Factor p
0 -0.483128 -1.240024 -1.214765 -1.002418
1 -0.692334 -1.632132 1.562630 0.997304
2 -1.189383 -1.632132 1.562630 0.997304
3 -1.691841 -1.632132 1.562630 0.997304
4 -2.084926 -1.632132 1.562630 0.997304
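I am trying to reorganize the data into a new structure in which each row contains "period" consecutive rows from the existing data. The window then moves forward one row in the existing data and the next "period" rows are stacked.
The function so far: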
def prepData(seq, period):
    newStacks = pd.DataFrame()
    for pos in range(0, len(seq) - (period+1), 1):
        chunk = (seq[pos:pos + period])
        stack = []
        for sliver in range(0, len(chunk), 1):
            piece = (chunk.iloc[sliver:])
            print(piece)
            stack.append(piece)
        newStacks.append(chunk)
    return newStacks
This is obviously inefficient, and it does not produce the desired structure. With period = 3, the goal is:
0 -0.483128 -1.240024 -1.214765 -1.002418 -0.692334 -1.632132 1.562630 0.997304 -1.189383 -1.632132 1.562630 0.997304
1 -0.692334 -1.632132 1.562630 0.997304 -1.189383 -1.632132 1.562630 0.997304 -1.691841 -1.632132 1.562630 0.997304
2 -1.189383 -1.632132 1.562630 0.997304 -1.691841 -1.632132 1.562630 0.997304 -2.084926 -1.632132 1.562630 0.997304
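A simple way to achieve this would be much appreciated.
Answer 0 (score: 2)
I'm not sure whether you want to create a DataFrame or a list. Here is code that gives you both.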
import pandas as pd

c = ['D', 'K', 'Factor', 'p']
d = [[-0.483128, -1.240024, -1.214765, -1.002418],
     [-0.692334, -1.632132, 1.562630, 0.997304],
     [-1.189383, -1.632132, 1.562630, 0.997304],
     [-1.691841, -1.632132, 1.562630, 0.997304],
     [-2.084926, -1.632132, 1.562630, 0.997304]]
df = pd.DataFrame(d, columns=c)
print(df)

p = 3  # this is the period you wanted; I set it to 3
stack_list = []  # this will store the final stacked list
# note: don't name a variable "stack"; pandas uses that name for stacking
for i in range(len(df) - p + 1):  # iterate through the DataFrame
    # stack p rows and convert them to a flat list
    # (.loc slices include the end label, so i:i+p-1 selects p rows on the default index)
    chunk = df.loc[i:i+p-1].stack().reset_index(level=1, drop=True).tolist()
    stack_list.append(chunk)  # store chunk in stack_list

df1 = pd.DataFrame(stack_list)  # create a DataFrame as requested
# print both stack_list and the DataFrame
print(stack_list)
print(df1)
The output is:
The original DataFrame is:
D K Factor p
0 -0.483128 -1.240024 -1.214765 -1.002418
1 -0.692334 -1.632132 1.562630 0.997304
2 -1.189383 -1.632132 1.562630 0.997304
3 -1.691841 -1.632132 1.562630 0.997304
4 -2.084926 -1.632132 1.562630 0.997304
The stacked list is:
[[-0.483128, -1.240024, -1.214765, -1.002418, -0.692334, -1.632132, 1.56263, 0.997304, -1.189383, -1.632132, 1.56263, 0.997304],
[-0.692334, -1.632132, 1.56263, 0.997304, -1.189383, -1.632132, 1.56263, 0.997304, -1.691841, -1.632132, 1.56263, 0.997304],
[-1.189383, -1.632132, 1.56263, 0.997304, -1.691841, -1.632132, 1.56263, 0.997304, -2.084926, -1.632132, 1.56263, 0.997304]]
The new DataFrame you wanted to create is:
0 1 2 ... 9 10 11
0 -0.483128 -1.240024 -1.214765 ... -1.632132 1.56263 0.997304
1 -0.692334 -1.632132 1.562630 ... -1.632132 1.56263 0.997304
2 -1.189383 -1.632132 1.562630 ... -1.632132 1.56263 0.997304
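The same result can probably be reached without an explicit Python loop over rows. The sketch below is not part of the answer above; it assumes a helper with the hypothetical name `stack_windows_shift`, which places `period` shifted copies of the frame side by side with `DataFrame.shift` and `pd.concat` and then keeps only the rows that have a full window. Note that shifting introduces NaN in the rows that are later dropped, so integer columns would come back as floats.

```python
import pandas as pd


def stack_windows_shift(df, period):
    """Return one row per window of `period` consecutive rows of `df`.

    Hypothetical helper for illustration; the name is not from the thread.
    """
    # Copy k is the frame moved up by k rows, so row i of copy k is source row i + k.
    shifted = [df.shift(-k) for k in range(period)]
    # Only the first len(df) - period + 1 rows have a complete window.
    n_rows = max(len(df) - period + 1, 0)
    wide = pd.concat(shifted, axis=1).iloc[:n_rows]
    # Replace the repeated column names with plain positions 0..period*ncols-1.
    wide.columns = range(wide.shape[1])
    return wide.reset_index(drop=True)
```

With the `df` defined above, `stack_windows_shift(df, 3)` should give three rows of twelve values each, in the same order as `df1`.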
Answer 1 (score: 2)
If all columns are floats, you can improve performance by using strides in numpy, reshaping the resulting 3d array to 2d, and passing it to the DataFrame constructor:
# https://stackoverflow.com/a/44306231/2901002, slightly changed
import numpy as np
import pandas as pd

def strided_lastaxis(a, L):
    s0, s1 = a.strides
    m, n = a.shape
    return np.lib.stride_tricks.as_strided(a, shape=(m-L+1, L, n), strides=(s0, s0, s1))

a = strided_lastaxis(df.to_numpy(), 3)
df1 = pd.DataFrame(a.reshape(a.shape[0], -1))
print(df1)
0 1 2 3 4 5 6 \
0 -0.483128 -1.240024 -1.214765 -1.002418 -0.692334 -1.632132 1.56263
1 -0.692334 -1.632132 1.562630 0.997304 -1.189383 -1.632132 1.56263
2 -1.189383 -1.632132 1.562630 0.997304 -1.691841 -1.632132 1.56263
7 8 9 10 11
0 0.997304 -1.189383 -1.632132 1.56263 0.997304
1 0.997304 -1.691841 -1.632132 1.56263 0.997304
2 0.997304 -2.084926 -1.632132 1.56263 0.997304
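On NumPy 1.20 or newer, `sliding_window_view` may be a safer way to get the same windows, because it computes the shape and strides itself instead of relying on hand-written ones. The following is a sketch rather than the answer's own code; the function name `stack_windows_view` is made up for illustration. The window axis is appended as the last axis, so it has to be moved in front of the column axis before reshaping.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view


def stack_windows_view(df, period):
    """Flatten every window of `period` consecutive rows of `df` into one row.

    Hypothetical helper for illustration; requires NumPy >= 1.20.
    """
    a = df.to_numpy()
    # Shape is (m - period + 1, n, period): the window axis comes last.
    windows = sliding_window_view(a, window_shape=period, axis=0)
    # Reorder to (window, row within window, column) so source rows stay intact.
    windows = windows.transpose(0, 2, 1)
    # reshape copies here, so the result no longer shares memory with `a`.
    return pd.DataFrame(windows.reshape(windows.shape[0], -1))
```

Calling `stack_windows_view(df, 3)` on the example frame should reproduce the `df1` printed above. Unlike `as_strided`, this raises an error when `period` is larger than the number of rows instead of reading out-of-bounds memory.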