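I am new to machine learning (I am not a mathematician) and have been learning ML from videos and books. I have a basic understanding of algorithms such as naive Bayes, SVM and decision trees, and I am using ML to model the daily returns of the stock market. I want to use a nonlinear regression algorithm, so I chose support vector machine regression because it is popular. I use the trading day and the EMA difference as the feature vector (X) and the price change as the label (Y). Here is my code: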
library("quantmod")
#Adding libraries
library("lubridate")
#Makes it easier to work with the dates
library("e1071")
#Gives us access to the svm
stockData <- new.env()
tickers <- 'AAPL'
startDate = as.Date("2015-11-01")
# The beginning of the date range we want to look at
symbol = getSymbols(tickers,from=startDate, auto.assign=F)
# Retrieving Apple’s daily OHLCV from Yahoo Finance
DayofWeek<-wday(symbol, label=TRUE)
#Find the day of the week
Class<- Cl(symbol) - Op(symbol)
#price change
EMA5<-EMA(Cl(symbol),n = 5)
#We are calculating a 5-period EMA off the close price
EMA10<-EMA(Cl(symbol),n = 10)
#Then the 10-period EMA, also off the close price
EMACross <- EMA5 - EMA10
#Positive values correspond to the 5-period EMA being above the 10-period EMA
EMACross<-round(EMACross,2)
DataSet2<-data.frame(DayofWeek,EMACross, Class)
DataSet2<-DataSet2[-c(1:10),]
#We need to remove the instances where the 10-period moving average is still being calculated
m<-nrow(DataSet2)
n<-round((nrow(DataSet2)*2)/3)
TrainingSet<-DataSet2[1:n,]
#We will use ⅔ of the data to train the model
TestSet<-DataSet2[(n+1):m,]
#And ⅓ to test it on unseen data
EMACrossModel<-svm( Cl(symbol) ~ ., data=TrainingSet)
summary(EMACrossModel)
pred<-predict(EMACrossModel,TestSet[,-3])
When I run the code above, I get this error:
> EMACrossModel<-svm( Cl(symbol) ~ ., data=TrainingSet)
Error in model.frame.default(formula = Cl(symbol) ~ ., data = TrainingSet, :
variable lengths differ (found for 'DayofWeek')
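For reference, the EMA5 and EMA10 features above come from EMA() in the TTR package, which library("quantmod") attaches. The function below is a minimal base-R sketch of what it computes, assuming TTR's default (non-Wilder) smoothing: ratio 2/(n + 1), seeded with the simple mean of the first n values. The name ema_sketch is hypothetical and not part of TTR. Under that assumption the first n - 1 values are NA, which is why the leading rows of DataSet2 have to be dropped (with n = 10, rows 1 to 9 are NA, so DataSet2[-c(1:10),] removes one more row than strictly necessary).

```r
# Hypothetical helper: a sketch of an exponential moving average, ASSUMING
# TTR's default (non-Wilder) behaviour of ratio = 2 / (n + 1) and a seed equal
# to the simple mean of the first n observations.
ema_sketch <- function(x, n) {
  out <- rep(NA_real_, length(x))       # first n - 1 values stay NA
  if (length(x) < n) return(out)        # not enough data to seed the average
  ratio <- 2 / (n + 1)
  out[n] <- mean(x[1:n])                # seed: SMA of the first n observations
  if (length(x) > n) {
    for (i in (n + 1):length(x)) {
      # each new value mixes the latest observation with the previous EMA
      out[i] <- ratio * x[i] + (1 - ratio) * out[i - 1]
    }
  }
  out
}

ema_sketch(c(1, 2, 3, 4, 5), n = 3)     # returns c(NA, NA, 2, 3, 4)
```

With n = 3 the ratio is 0.5, so each value is halfway between the new observation and the previous average, which makes the output easy to verify by hand.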
So my questions are (forgive me, but I have more than one):
1) How do I solve the problem above?
2) Can I use both qualitative (e.g. Mon, Tue, Wed) and quantitative (e.g. 1.0, 0.1, 100) data together in SVM regression?
3) How can I plot the results above with the SVM decision boundaries?
EDITED
DataSet2
DayofWeek EMA AAPL.Close
2015-11-16 Mon -2.77 2.800003
2015-11-17 Tues -2.51 -1.229996
2015-11-18 Wed -1.67 1.529999
2015-11-19 Thurs -0.89 1.140000
2015-11-20 Fri -0.32 0.100006
2015-11-23 Mon -0.23 -1.519997
2015-11-24 Tues 0.00 1.549995
2015-11-25 Wed 0.00 -1.180000
2015-11-27 Fri -0.03 -0.480003
2015-11-30 Mon 0.02 0.310005
2015-12-01 Tues -0.09 -1.410004
2015-12-02 Wed -0.31 -1.059997
2015-12-03 Thurs -0.57 -1.350006
2015-12-04 Fri -0.10 3.739998
2015-12-07 Mon 0.05 -0.700004
2015-12-08 Tues 0.12 0.710006
2015-12-09 Wed -0.24 -2.019996
2015-12-10 Thurs -0.35 0.129997
2015-12-11 Fri -0.83 -2.010002
2015-12-14 Mon -1.15 0.300003
2015-12-15 Tues -1.56 -1.450004
2015-12-16 Wed -1.56 0.269996
2015-12-17 Thurs -1.82 -3.039994
2015-12-18 Fri -2.30 -2.880005
2015-12-21 Mon -2.23 0.050003
2015-12-22 Tues -2.07 -0.169999
2015-12-23 Wed -1.64 1.340004
2015-12-24 Thurs -1.40 -0.970001
2015-12-28 Mon -1.37 -0.769996
2015-12-29 Tues -0.98 1.779999
2015-12-30 Wed -0.92 -1.260002
The modified code below runs, but gives a different answer than I expected.
These are the changes:
EMACrossModel<-ksvm( Cl(symbol[1:n]) ~ ., data=TrainingSet,kernel="rbfdot",C=10) # ksvm() comes from the kernlab package
pred<-predict(EMACrossModel,TestSet)
Result:
> EMACrossModel
Support Vector Machine object of class "ksvm"
SV type: eps-svr (regression)
parameter : epsilon = 0.1 cost C = 10
Gaussian Radial Basis kernel function.
Hyperparameter : sigma = 0.294836572886287
Number of Support Vectors : 17
Objective Function Value : -49.1082
Training error : 0.138329
> pred
[,1]
[1,] 119.7267
[2,] 119.9733
[3,] 120.7236
[4,] 121.8324
[5,] 121.5632
[6,] 121.4652
[7,] 119.6438
[8,] 119.6962
[9,] 119.0775
[10,] 116.4956
I expected the predictions to look more like this:
[,1]
-1.327996
1.229939
-1.130000
0.100006
-1.519997
-0.480003
1.310005
-1.410004
-1.059997
1.350006
-2.739998
1.700004
My guess is that my current code treats the stock price, rather than the price change, as Y and uses it to fit EMACrossModel. Am I right? If so, how can I fix this?
Answer (score: 2)
Regarding question one: you form your TrainingSet by dropping some of the data. However, you did not restrict symbol in the same way:
EMACrossModel<-svm( Cl(symbol[1:n]) ~ ., data=TrainingSet)
I just realized that what you most likely want instead is:
EMACrossModel<-svm( AAPL.Close ~ ., data=TrainingSet)
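In general, the left-hand side of a formula such as Cl(symbol[1:n]) ~ . defines what is learned. At the moment that is the closing price taken from symbol. However, I assume you want to predict the AAPL.Close column of DataSet2, which holds your price change (Cl(symbol) - Op(symbol) keeps the AAPL.Close column name). Formulas are a general concept in R (https://stat.ethz.ch/R-manual/R-devel/library/stats/html/formula.html), and it is worth investing a little time to understand them.
EDIT
Based on your comment above, this seems to be confirmed. The result is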
-0.1926745
0.3578645
0.1830046
0.6362871
-0.3760084
-0.1443156
0.2615674
0.2589130
-0.4779677
-0.5928780
END EDIT
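The formula point can be checked without downloading any market data. The sketch below swaps in base R's lm() for svm() only so that it runs without extra packages; both go through model.frame(), which is where the error in the question is raised. The training rows are the first five rows of DataSet2 shown in the question (weekdays as numbers); the vector outside_price is a made-up stand-in for Cl(symbol), which lives outside the data frame and has a different length.

```r
# First five rows of DataSet2 from the question (weekday encoded as a number,
# an assumption: Mon = 2, ..., Fri = 6 as in lubridate's default numbering).
train <- data.frame(
  DayofWeek  = c(2, 3, 4, 5, 6),
  EMA        = c(-2.77, -2.51, -1.67, -0.89, -0.32),
  AAPL.Close = c(2.800003, -1.229996, 1.529999, 1.140000, 0.100006)
)

# Hypothetical stand-in for Cl(symbol): NOT a column of `train`, different length.
outside_price <- c(120, 121, 119, 118, 117, 116, 115)

# Wrong: `.` expands to every column of `train`, but the response comes from
# outside it, so model.frame() finds vectors of different lengths.
err <- tryCatch(lm(outside_price ~ ., data = train),
                error = function(e) conditionMessage(e))
print(err)

# Right: name the label column inside the data frame; `.` then means
# "all the other columns", i.e. DayofWeek and EMA.
fit <- lm(AAPL.Close ~ ., data = train)
print(names(coef(fit)))
```

With e1071 loaded, svm(AAPL.Close ~ ., data = TrainingSet) follows the same rule: the response has to be a column of the data frame passed as data.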
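Regarding question two: it depends on the implementation (and the kernel), but it appears that you can.
Regarding your third question: the e1071 package includes an example: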
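One way to see how a qualitative column can sit next to a numeric one: in R's formula interface, factors are expanded into numeric indicator columns by model.matrix(), and as far as I can tell from its source, e1071's svm() formula method builds its input matrix the same way. The sketch uses the first five rows of DataSet2 from the question and an unordered factor. Note that lubridate's wday(label = TRUE) returns an ordered factor, which R's default contrasts code with polynomial contrasts rather than 0/1 indicators; factor(x, ordered = FALSE) gives the coding shown here.

```r
# First five rows of DataSet2 from the question, weekdays kept as labels.
df <- data.frame(
  DayofWeek = factor(c("Mon", "Tues", "Wed", "Thurs", "Fri"),
                     levels = c("Mon", "Tues", "Wed", "Thurs", "Fri")),
  EMA = c(-2.77, -2.51, -1.67, -0.89, -0.32)
)

# model.matrix() replaces the factor with 0/1 indicator columns (Mon is the
# reference level absorbed by the intercept) and keeps EMA unchanged.
X <- model.matrix(~ DayofWeek + EMA, data = df)
print(X)
```

The resulting matrix is purely numeric, which is what a kernel such as the RBF kernel operates on.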
data(cats, package = "MASS")
m <- svm(Sex~., data = cats)
plot(m, cats)
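EDIT: I just realized that this plot function only works for classifiers, not for regression. However, you can easily build your own plotting function. For simplicity, I first convert the day of the week to numbers: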
DataSet2$DayofWeek <- as.numeric(DataSet2$DayofWeek)
Then rebuild the model. After that, you can plot it with:
### plot the results of the support vector machine by
# first generating a grid covering the data range
#generate a sequence of 100 numbers between the minimum and maximum of DataSet2$EMA
plot.ema.vec <- seq(min(DataSet2$EMA), max(DataSet2$EMA), length.out = 100)
#generate a "grid" of artificial data points 1:7 are the weekdays
# can be replaced by c("Mon",...,"Sun")
datagrid <- expand.grid(1:7,plot.ema.vec)
# set the names of the grid according to the dataset s.t. the classifier can use the data as input
names(datagrid) <- names(DataSet2[,1:2])
#calculate the predictions of the classifier
grid.pred <- predict(EMACrossModel,datagrid)
# normalise the prediction in [0,1] range to use it as colors
cols <- (grid.pred-min(grid.pred))/(max(grid.pred)-min(grid.pred))
# plot the decisions for the data
plot(datagrid$DayofWeek,datagrid$EMA , col=rgb(blue=cols,red=1-cols,green=0))
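The grid-and-colour idea above can be tried without downloading data. In this sketch the model is lm(), swapped in for the SVM purely so it runs with base R, so the colour gradient will be linear; with e1071 loaded, a model from svm(AAPL.Close ~ ., data = train) can be passed to predict() in the same way. The training rows are the first ten rows of DataSet2 from the question, with weekdays numbered assuming lubridate's default week start (Sun = 1, Mon = 2, ..., Sat = 7). Unlike the answer's 1:7, the grid here covers only Mon to Fri, since weekends never occur in trading data, and the rescaling step guards against a constant prediction surface.

```r
# First ten rows of DataSet2 from the question, weekdays already numeric.
train <- data.frame(
  DayofWeek  = c(2, 3, 4, 5, 6, 2, 3, 4, 6, 2),
  EMA        = c(-2.77, -2.51, -1.67, -0.89, -0.32, -0.23, 0.00, 0.00, -0.03, 0.02),
  AAPL.Close = c(2.800003, -1.229996, 1.529999, 1.140000, 0.100006,
                 -1.519997, 1.549995, -1.180000, -0.480003, 0.310005)
)

# Stand-in model (NOT an SVM): ordinary linear regression on the same formula.
fit <- lm(AAPL.Close ~ DayofWeek + EMA, data = train)

# Grid: trading weekdays only, 100 evenly spaced EMA values.
ema.vec   <- seq(min(train$EMA), max(train$EMA), length.out = 100)
datagrid  <- expand.grid(DayofWeek = 2:6, EMA = ema.vec)
grid.pred <- predict(fit, newdata = datagrid)

# Rescale predictions to [0, 1] for use as colours; constant predictions map to 0.5.
rng  <- range(grid.pred)
cols <- if (diff(rng) > 0) (grid.pred - rng[1]) / diff(rng) else rep(0.5, length(grid.pred))

# Red = low predicted price change, blue = high.
plot(datagrid$DayofWeek, datagrid$EMA, pch = 15,
     col = rgb(red = 1 - cols, green = 0, blue = cols),
     xlab = "Day of week (2 = Mon, ..., 6 = Fri)", ylab = "EMA cross")
```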