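I have a series of (x, y) points, and I want to use a rolling window of three points. I want to apply a function to each window, essentially mapping over the rolling windows. How can I do this in NumPy?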
Answer 0 (score: 1)
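I think the fastest way to do this is to make three copies of the array, each offset by one relative to the others. Note that np.roll wraps around, so the last two rows mix the end of the array with its start and are dropped with [:-2]. For example: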
In [1]: a = np.arange(12)
In [2]: a
Out[2]: array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])
In [3]: np.vstack((a,np.roll(a,-1),np.roll(a,-2))).T[:-2]
Out[3]:
array([[ 0,  1,  2],
       [ 1,  2,  3],
       [ 2,  3,  4],
       [ 3,  4,  5],
       [ 4,  5,  6],
       [ 5,  6,  7],
       [ 6,  7,  8],
       [ 7,  8,  9],
       [ 8,  9, 10],
       [ 9, 10, 11]])
You can then apply any function that operates along the last axis. For example, to compute a rolling sum:
def window_function(a):
    return np.sum(a, axis=-1)
>>> a = np.arange(12)
>>> list(map(window_function, [a[i:i+3] for i in range(len(a)-2)]))  # list() is needed on Python 3, where map is lazy
[3, 6, 9, 12, 15, 18, 21, 24, 27, 30]
>>> window_function(np.vstack((a,np.roll(a,-1),np.roll(a,-2))).T[:-2])
array([ 3, 6, 9, 12, 15, 18, 21, 24, 27, 30])
This can be generalized into a function:
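import numpy as np
def get_rolling_window(a, size):
    # vstack needs a sequence (not a generator); slicing by length also handles size == 1
    return np.vstack([np.roll(a, -i) for i in range(size)]).T[:len(a) - size + 1]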
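Since the question is about (x, y) points and not a flat array, here is a minimal sketch of the same idea for 2-D data. It is not part of the approach above: it assumes NumPy 1.20 or newer, where `numpy.lib.stride_tricks.sliding_window_view` is available, and the sample `points` array and the centroid function are made up purely for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Hypothetical sample data: six (x, y) points, one point per row.
points = np.array(
    [[0, 0], [1, 2], [2, 4], [3, 6], [4, 8], [5, 10]], dtype=float
)

# Slide a window of three points along axis 0. The window axis is appended
# last, giving shape (n_windows, 2, 3); swapping the last two axes yields
# (n_windows, 3, 2), i.e. three (x, y) rows per window.
windows = sliding_window_view(points, window_shape=3, axis=0).transpose(0, 2, 1)

# Example per-window function: the centroid of each group of three points.
centroids = windows.mean(axis=1)

print(windows.shape)
print(centroids)
```

The result is a read-only view, so no copies of the data are made. Any function that reduces over axis 1 can replace the centroid here; a function that cannot be vectorized can still be applied with a plain loop over `windows`.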
Answer 1 (score: 0)
NumPy has a function called np.convolve. For example, if you need a rolling sum:
n = 3
np.convolve(a, np.ones((n,)), mode='valid')
array([ 3.,  6.,  9., 12., 15., 18., 21., 24., 27., 30.])
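As a rough sketch of how far the convolution trick goes: it only covers window functions that are weighted sums over a 1-D array, such as a sum or a mean. The snippet below assumes the same `a = np.arange(12)` as in the other answer and cross-checks the result against explicit slicing; the variable names are illustrative only.

```python
import numpy as np

a = np.arange(12)
n = 3

# Rolling sum: convolve with a kernel of n ones.
rolling_sum = np.convolve(a, np.ones(n), mode='valid')

# Rolling mean: same kernel, scaled by 1/n.
rolling_mean = np.convolve(a, np.ones(n) / n, mode='valid')

# Cross-check against explicit slicing of each window.
expected_sum = np.array([a[i:i + n].sum() for i in range(len(a) - n + 1)])

print(np.allclose(rolling_sum, expected_sum))
print(rolling_mean)
```

Note that np.convolve reverses the kernel, which makes no difference for a symmetric kernel like this one but matters if the weights are not symmetric. For arbitrary (non-linear) window functions, building the windows explicitly is the more general route.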