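I'm just getting into machine learning. I've played with a few datasets, and now I'm trying to tackle some stock data :) I'm having trouble predicting values for multiple stocks.
As a simple example, say I pull the daily high, low, volume, open, and close for 5 stocks over the past five years. Each row is a date and each column is a variable for a specific stock, e.g. AAPL.High, ADI.High, etc. The goal is to predict which stocks will see an increase in volume tomorrow.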
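To make the layout and the target concrete, here is a minimal sketch in base R of what I mean by "one result per symbol per day". The numbers are made up for illustration (not real market data), it uses only two symbols to keep it short, and the helper name `volume_up_tomorrow` is something I invented for this example.

```r
# Toy wide-format data: one row per date, one column per symbol/variable.
# All values are invented for illustration.
toy <- data.frame(
  AAPL.Volume = c(100, 120, 110, 130),
  ADI.Volume  = c(50, 45, 60, 55),
  row.names   = as.character(as.Date("2020-01-01") + 0:3)
)

# Hypothetical helper: 1 if the next row's volume is higher, 0 if not,
# NA for the last row (there is no "tomorrow" to compare against).
volume_up_tomorrow <- function(v) {
  next_day <- c(v[-1], NA)
  as.integer(next_day > v)
}

# One outcome column per symbol, so each date carries one label per stock.
volume_cols <- grep("\\.Volume$", names(toy), value = TRUE)
outcomes <- data.frame(lapply(toy[volume_cols], volume_up_tomorrow))
names(outcomes) <- sub("\\.Volume$", ".outcome", volume_cols)
rownames(outcomes) <- rownames(toy)
print(outcomes)
```

With the real data this would give five outcome columns (one per symbol) instead of the single `outcome` column my full code builds.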
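In the full code below, I believe an outcome is being created for each symbol for each day, so for each day I should get five results. But when you run the prediction, you only get one set of yes/no values per day.
I'm also not sure whether it is looking for connections between the symbols, but that's what I want it to do. For example, it might find that whenever the volume of stock x goes up, the volume of stock y goes up as well.
Summary:
How do I get five separate results per day (one for each stock symbol) when predicting?
Is the data for all symbols used together, or is each symbol looked at independently? (I want the former.)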
The full code is below; if you paste the whole thing into R, it should run just fine.
Any help and insights are greatly appreciated!
library(quantmod)
# get market data for these stock symbols
Nasdaq100_Symbols <- c("AAPL", "ADBE", "ADI", "ADP", "ADSK")
getSymbols(Nasdaq100_Symbols)
# merge them all together
nasdaq100 <- data.frame(as.xts(merge(AAPL, ADBE, ADI, ADP, ADSK)))
# set outcome variable
outcomeSymbol <- 'ADI.Volume'
# shift outcome value to be on same line as predictors
library(xts)
nasdaq100 <- xts(nasdaq100,order.by=as.Date(rownames(nasdaq100)))
nasdaq100 <- as.data.frame(merge(nasdaq100, lm1=lag(nasdaq100[,outcomeSymbol],-1)))
nasdaq100$outcome <- ifelse(nasdaq100[,paste0(outcomeSymbol,'.1')] > nasdaq100[,outcomeSymbol], 1, 0)
# remove the shifted volume field, as we only care about the up/down outcome, not its value
nasdaq100 <- nasdaq100[,!names(nasdaq100) %in% c(paste0(outcomeSymbol,'.1'))]
# recover the date from the row names and sort newest first
nasdaq100$date <- as.Date(row.names(nasdaq100))
nasdaq100 <- nasdaq100[order(nasdaq100$date, decreasing = TRUE),]
# drop most recent entry as we don't have an outcome
nasdaq100 <- nasdaq100[2:nrow(nasdaq100),]
# remove date field and shuffle data frame
nasdaq100 <- subset(nasdaq100, select=-c(date))
nasdaq100 <- nasdaq100[sample(nrow(nasdaq100)),]
# model training
library(caret)
predictorNames <- names(nasdaq100)[names(nasdaq100) != 'outcome']
set.seed(1234)
split <- sample(nrow(nasdaq100), floor(0.7*nrow(nasdaq100)))
train <- nasdaq100[split,]
test <- nasdaq100[-split,]
train$outcome <- ifelse(train$outcome==1,'yes','nope')
# create caret trainControl object to control the number of cross-validations performed
objControl <- trainControl(method='cv', number=5, returnResamp='none', summaryFunction = twoClassSummary, classProbs = TRUE)
# run model
bst <- train(train[,predictorNames], as.factor(train$outcome),
             method='gbm',
             trControl=objControl,
             metric = "ROC",
             tuneGrid = expand.grid(n.trees = 50, interaction.depth = 3, shrinkage = 0.1, n.minobsinnode = 10)
)
predictions <- predict(object=bst, test[,predictorNames], type='prob')
library(pROC)
# the 'yes' column holds the predicted probability that volume rises the next day
auc_score <- auc(test$outcome, predictions[['yes']])
print(paste('AUC score:', auc_score))