Question

R predict.lm function gives output of the wrong size.

stocks = read.csv("some-file.csv", header = TRUE)

## 75% of the sample size
smp_size <- floor(0.75 * nrow(stocks))

## set the seed to make your partition reproducible
set.seed(123)
train_ind <- sample(seq_len(nrow(stocks)), size = smp_size)

train <- stocks[train_ind, ]
test <- stocks[-train_ind, ]

model = lm ( train$Open ~ train$Close, data=train)
model
predicted<-predict.lm(model, test$Open)
length(test$Open)
length(predicted)
length(test$Close)

> length(test$Open)
[1] 16994
> length(predicted)
[1] 50867
> length(test$Close)
[1] 16994

Why is this happening? The length of the predict output should equal the length of test$Open, right?

Answer 1

I can't say exactly how lm interprets the formula and the data argument in your call, but I can say that your train$Open is the problem. That tells you where lm is getting its data from and why the output does not have the length you expect. What you want is train$Close.
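A minimal sketch of the effect, assuming a synthetic data frame in place of the stocks file (the values, sizes and the names bad_model and good_model are made up for illustration). When the formula is written with train$..., the term is literally named "train$Close", so predict looks up the train object again instead of reading the Close column of the new data:

```r
# Synthetic stand-in for the stocks data (hypothetical values)
set.seed(123)
stocks <- data.frame(Close = rnorm(200, mean = 100, sd = 10))
stocks$Open <- stocks$Close + rnorm(200)

train_ind <- sample(seq_len(nrow(stocks)), size = 150)
train <- stocks[train_ind, ]
test <- stocks[-train_ind, ]

# Formula written with train$..., as in the question
bad_model <- lm(train$Open ~ train$Close, data = train)
# R warns here that 'newdata' has fewer rows than the variables it found;
# the warning is suppressed only to keep the output short
bad_pred <- suppressWarnings(predict(bad_model, newdata = test))
length(bad_pred) == nrow(train)   # TRUE: predictions for the training rows

# Formula written with bare column names, resolved inside `data` / `newdata`
good_model <- lm(Open ~ Close, data = train)
good_pred <- predict(good_model, newdata = test)
length(good_pred) == nrow(test)   # TRUE: one prediction per test row
```

With bare column names the same fitted model can be applied to any data frame that has a Close column.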

Answer 2

The problem is predicted<-predict.lm(model, test$Open); it should be

 predicted<-predict.lm(model, test)

In any case, the response is removed inside predict.lm at

 line 15:       Terms <- delete.response(tt)

In fact, for your model it should be test$Close anyway.

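You get the results for the training set because you are effectively not supplying any data (after the code has removed the response). An example using iris: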

# split iris into 100 training rows and 50 test rows
train_ind <- sample(seq_len(nrow(iris)), size = 100)
train <- iris[train_ind, ]
test <- iris[-train_ind, ]
model <- lm(Sepal.Length ~ Sepal.Width, data = train)
model
# predict from the full test data frame (the response column is ignored)
predicted1 <- predict.lm(model, test)
length(predicted1)
# predict from a data frame that holds only the predictor column
predicted2 <- predict.lm(model, data.frame(Sepal.Width = test$Sepal.Width))
length(predicted2)
predicted1 - predicted2

Output of the last few lines:

> length(predicted1)
[1] 50
> predicted2 <- predict.lm(model, data.frame(Sepal.Width = test$Sepal.Width))
> length(predicted2)
[1] 50
> predicted1-predicted2
  4   5   9  10  12  17  19  25  26  32  33  36  37  40  41  47  49  53  61  67  68  69  74  76  78  79  81  83  84  85  87 
  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0 
 92  94  98 105 110 112 113 114 122 125 127 128 132 133 137 140 141 142 145 
  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
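A short sketch of the delete.response step mentioned above, fitted on the full iris data for brevity (the names tt, new_rows, pred and by_hand are illustrative only). It shows that after the response is dropped only the predictor is needed, so the new data frame does not have to contain Sepal.Length at all:

```r
model <- lm(Sepal.Length ~ Sepal.Width, data = iris)

tt <- terms(model)
all.vars(tt)                    # response and predictor
all.vars(delete.response(tt))   # predictor only

# New data with just the predictor column (hypothetical values)
new_rows <- data.frame(Sepal.Width = c(3.0, 3.5))
pred <- predict(model, newdata = new_rows)

# The same predictions computed by hand from the coefficients
by_hand <- coef(model)[["(Intercept)"]] +
  coef(model)[["Sepal.Width"]] * new_rows$Sepal.Width
all.equal(unname(pred), by_hand)
```

Calling the generic predict with a named newdata argument, rather than predict.lm with a positional one, makes it harder to pass a bare vector by accident.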
