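I want to split time series data into X and y by shifting the data. The dummy data frame looks like this:
That is, if the time step equals 2, X and y look like: X = [3,0] -> y = [5]
X = [0,5] -> y = [7] (this should be applied to every sample (row))
I wrote the function below, but when I pass a pandas DataFrame to it, it returns empty matrices.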
import numpy as np

def create_dataset(dataset, time_step=1):
    dataX, dataY = [], []
    for i in range(len(dataset) - time_step - 1):
        a = dataset.iloc[:, i:(i + time_step)]
        dataX.append(a)
        dataY.append(dataset.iloc[:, i + time_step])
    return np.array(dataX), np.array(dataY)
Thanks for any solutions.
Answer 0 (score: 1)
Here is an example that reproduces the sample, if I understand correctly:
import pandas as pd

# function to process each row
def process_row(s):
    assert isinstance(s, pd.Series)
    return pd.concat([
        s.rename('timestep'),
        s.shift(-1).rename('x_1'),
        s.shift(-2).rename('x_2'),
        s.shift(-3).rename('y')
    ], axis=1).dropna(how='any', axis=0).astype(int)

# test case for the example
process_row(pd.Series([2, 3, 0, 5, 6]))

# type in first two rows of the data frame
df = pd.DataFrame(
    {'x-2': [3, 2], 'x-1': [0, 3],
     'x0': [5, 0], 'x1': [7, 5], 'x2': [1, 6]})

# perform the transformation
ts = list()
for idx, row in df.iterrows():
    t = process_row(row)
    t.index = [idx] * t.index.size
    ts.append(t)
print(pd.concat(ts))
# results
   timestep  x_1  x_2  y
0         3    0    5  7
0         0    5    7  1
1         2    3    0  5  <-- first part of expected results
1         3    0    5  6  <-- second part
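As for why the original `create_dataset` returns empty arrays: the loop bound uses `len(dataset)`, which is the number of rows, while `iloc[:, ...]` slices columns. With two rows and `time_step=2`, `range(2 - 2 - 1)` is empty, so nothing is ever appended. Below is a minimal sketch of a column-wise version, assuming each row is an independent series and every window in a row should be used. The bound `n_cols - time_step` (which also keeps the last window) and the row-major output ordering are my own choices, not something stated in the question.

```python
import numpy as np
import pandas as pd

def create_dataset(dataset, time_step=1):
    """Split each row into sliding windows (X) and the value that follows (y)."""
    values = dataset.to_numpy()
    n_cols = values.shape[1]          # number of time points per row
    data_x, data_y = [], []
    # iterate over column positions, not over the number of rows
    for i in range(n_cols - time_step):
        data_x.append(values[:, i:i + time_step])  # shape (n_rows, time_step)
        data_y.append(values[:, i + time_step])    # shape (n_rows,)
    # stack along axis 1 so that windows from the same row stay adjacent
    X = np.stack(data_x, axis=1).reshape(-1, time_step)
    y = np.stack(data_y, axis=1).reshape(-1)
    return X, y

# the two sample rows used in the answer above
df = pd.DataFrame(
    {'x-2': [3, 2], 'x-1': [0, 3],
     'x0': [5, 0], 'x1': [7, 5], 'x2': [1, 6]})

X, y = create_dataset(df, time_step=2)
print(X)
print(y)
```

With `time_step=2`, the first two windows of the first row are `[3, 0] -> 5` and `[0, 5] -> 7`, which matches the mapping described in the question.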
Answer 1 (score: 0)
Do you mean something like this?
df = df.shift(periods=-2, axis='columns')
# you can also pass a fill_value parameter
df = df.shift(periods=-2, axis='columns', fill_value=0)
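To see what this shift does on the sample rows, here is a small sketch. It reuses the two-row frame from the other answer and keeps the shifted result in a separate variable (`shifted`, a name introduced here) so it can be compared with the original. For `time_step = 2`, the value under a given column in `shifted` is the target for the window that starts at that column in `df`.

```python
import pandas as pd

df = pd.DataFrame(
    {'x-2': [3, 2], 'x-1': [0, 3],
     'x0': [5, 0], 'x1': [7, 5], 'x2': [1, 6]})

# move every row two positions to the left; vacated cells become NaN
shifted = df.shift(periods=-2, axis='columns')

# targets for the first row: one per window that has a following value
first_row_targets = shifted.iloc[0].dropna().tolist()

# same shift, but padding the vacated cells with 0 instead of NaN
shifted_zero = df.shift(periods=-2, axis='columns', fill_value=0)

print(shifted)
print(first_row_targets)
print(shifted_zero)
```

One thing to watch with `fill_value=0`: the sample data already contains real zeros, so the padding can no longer be told apart from actual observations. Leaving the default NaN and dropping those cells may be the safer choice here.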