Question

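I have a dataset. My task is to apply different classifiers and predict the class of new data.

My question is how to predict the credit rating.

The value next to each variable name is the score that the person should have for that particular variable.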

To be clear, I know how to train and test on the whole dataset. For example, I know how to fit a decision tree and use it to predict.

Now, how do I predict the credit rating of this customer? Any suggestions would be highly appreciated. Thanks in advance.

Answer 1

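The way to predict for the median unit is to create a new data frame containing a single unit that has the median value of every variable, and pass it to predict(). An example with linear regression: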

set.seed(2018)

## Let's make some example data.
df <- data.frame(
  x1 = rnorm(1000, 1),
  x2 = rnorm(1000),
  x3 = rnorm(1000, -1)
)
df$y <- 0.4 * df$x1 - 0.2 * df$x2 + 0.1 * df$x3 + rnorm(1000)

## ... and fit a simple linear model.
fit <- lm(y ~ x1 + x2 + x3, data = df)
summary(fit)

#> Call:
#> lm(formula = y ~ x1 + x2 + x3, data = df)
#> 
#> Residuals:
#>      Min       1Q   Median       3Q      Max 
#> -3.13203 -0.66952 -0.05941  0.67924  2.85789 
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)    
#> (Intercept) -0.10350    0.05593  -1.850   0.0646 .  
#> x1           0.43968    0.03123  14.077  < 2e-16 ***
#> x2          -0.18725    0.03179  -5.891 5.26e-09 ***
#> x3           0.01585    0.03219   0.492   0.6226    
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> Residual standard error: 1.001 on 996 degrees of freedom
#> Multiple R-squared:  0.1914, Adjusted R-squared:  0.189 
#> F-statistic:  78.6 on 3 and 996 DF,  p-value: < 2.2e-16


## To get the median unit, just make a unit which has the median value
## of each variable.
new_data <- data.frame(
  x1 = median(df$x1),
  x2 = median(df$x2),
  x3 = median(df$x3)
)

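## You can also do this more compactly. Here is an example if all your
## variables are numeric. The result also contains a median for y, which
## predict() ignores because only the predictors are looked up.
new_data <- as.data.frame(lapply(df, median))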

## Give this new data frame to `predict()` to predict y for the median
## unit.
predict(fit, newdata = new_data)

#>        1 
#> 0.3407412 


## Let's compare to the mean of y.
mean(df$y)

#> [1] 0.3295454
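
Since the question is about classifiers rather than linear regression, here is a rough sketch of the same idea with a decision tree. Everything in it is made up for illustration: the `credit` data, its column names, and the `typical_value()` helper are assumptions, not taken from the question. It also assumes every non-numeric column is a factor, and it uses the most frequent level for those columns because a median is not defined for them.

```r
library(rpart)  # recommended package, normally installed alongside R

set.seed(2018)

## Hypothetical credit data: two numeric predictors, one factor predictor,
## and a factor outcome. None of these names come from the question.
n <- 500
credit <- data.frame(
  income  = rnorm(n, 50, 10),
  debt    = rnorm(n, 20, 5),
  housing = factor(sample(c("own", "rent"), n, replace = TRUE))
)
credit$rating <- factor(
  ifelse(credit$income - credit$debt + rnorm(n, 0, 5) > 30, "good", "bad")
)

## Fit a classification tree.
tree_fit <- rpart(rating ~ income + debt + housing,
                  data = credit, method = "class")

## Typical value of one column: the median for numeric columns, the most
## frequent level for factor columns (assumes non-numeric means factor).
typical_value <- function(x) {
  if (is.numeric(x)) {
    median(x)
  } else {
    ## Keep the original levels so predict() sees the same factor coding.
    factor(names(which.max(table(x))), levels = levels(x))
  }
}

## Build the one-row "typical customer" from the predictors only.
predictors   <- credit[, c("income", "debt", "housing")]
new_customer <- as.data.frame(lapply(predictors, typical_value))

## type = "class" gives the predicted label, type = "prob" the class
## probabilities.
pred_class <- predict(tree_fit, newdata = new_customer, type = "class")
pred_prob  <- predict(tree_fit, newdata = new_customer, type = "prob")
```

If you already have the actual values for a specific customer, you can skip `typical_value()` and type them into a one-row data frame instead, as long as the column names and factor levels match the training data.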
