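I have a data set. My task is to apply different classifiers and predict the class of new data.
My question is how to predict the credit rating.
The value next to each variable name is the score that the person should have for that particular variable.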
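I know how to train and test on the whole data set. For example, I would fit a decision tree on the training data and then predict on the test data.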
Now, how do I predict the credit rating of that customer? Any suggestions would be highly appreciated. Thanks in advance.
Answer 0 (score: 1)
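To predict for the median unit, create a new data frame whose single row holds the median value of every variable, and pass it to predict(). An example with linear regression: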
set.seed(2018)
## Let's make some example data.
df <- data.frame(
  x1 = rnorm(1000, 1),
  x2 = rnorm(1000),
  x3 = rnorm(1000, -1)
)
df$y = .4 * df$x1 + -.2 * df$x2 + .1 * df$x3 + rnorm(1000)
## ... and fit a simple linear model.
fit <- lm(y ~ x1 + x2 + x3, data = df)
summary(fit)
#> Call:
#> lm(formula = y ~ x1 + x2 + x3, data = df)
#>
#> Residuals:
#>      Min       1Q   Median       3Q      Max
#> -3.13203 -0.66952 -0.05941  0.67924  2.85789
#>
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)
#> (Intercept) -0.10350    0.05593  -1.850   0.0646 .
#> x1           0.43968    0.03123  14.077  < 2e-16 ***
#> x2          -0.18725    0.03179  -5.891 5.26e-09 ***
#> x3           0.01585    0.03219   0.492   0.6226
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
#> Residual standard error: 1.001 on 996 degrees of freedom
#> Multiple R-squared: 0.1914, Adjusted R-squared: 0.189
#> F-statistic: 78.6 on 3 and 996 DF, p-value: < 2.2e-16
## To get the median unit, just make a unit which has the median value
## on each variable.
new_data <- data.frame(
  x1 = median(df$x1),
  x2 = median(df$x2),
  x3 = median(df$x3)
)
## You can also do this much more concisely. Here is an example if
## all your variables are numeric. The result also contains a `y`
## column, which `predict()` simply ignores.
new_data <- as.data.frame(lapply(df, median))
## Give this new data frame to `predict()` to predict y for the median
## unit.
predict(fit, newdata = new_data)
#> 1
#> 0.3407412
## Let's compare to the mean of y.
mean(df$y)
#> [1] 0.3295454
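
Since the question is about classifiers rather than regression, the same idea can be sketched with a decision tree. This is only a minimal sketch under stated assumptions: the data are simulated, the column names (`income`, `debt`, `age`, `rating`) and the rule that generates `rating` are invented for illustration, and `rpart` is used as one possible decision tree implementation, not necessarily the one from the question.

```r
library(rpart)  # recommended package shipped with standard R installations

set.seed(2018)

## Hypothetical customer data. The column names and the rule that
## generates `rating` are assumptions made up for this illustration.
n <- 500
customers <- data.frame(
  income = rnorm(n, mean = 50, sd = 15),
  debt   = rnorm(n, mean = 20, sd = 8),
  age    = round(runif(n, min = 20, max = 70))
)
score <- customers$income - 1.5 * customers$debt + rnorm(n, sd = 5)
customers$rating <- factor(ifelse(score > 20, "good", "bad"))

## Fit a classification tree on the whole data set.
tree <- rpart(rating ~ income + debt + age, data = customers,
              method = "class")

## Build a one-row data frame holding the median of every predictor.
predictors <- c("income", "debt", "age")
median_customer <- as.data.frame(lapply(customers[predictors], median))

## Predicted class and class probabilities for that single customer.
predicted_class <- predict(tree, newdata = median_customer, type = "class")
predicted_prob  <- predict(tree, newdata = median_customer, type = "prob")
predicted_class
predicted_prob
```

To score a specific customer instead of the median one, replace `median_customer` with a one-row data frame that uses the same column names and holds that customer's values.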