Question

I have a data.frame with 24 entries and 4 columns: 3 of them are values, and the fourth is a category, 1 or 2.

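I have 4 other elements for which I know the 3 values but not the category, and I would like my program to try to put each of them into one of the categories, based on my data.frame.

How can I do this? I have read about functions and predict, which I think might help, but I cannot work out how to integrate my data.frame into them.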

Answer 1

I assume you want to do this in R.

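Assuming your first 3 columns can predict the 4th (that is, the observed category follows some pattern related to one or more of those 3 columns), you should be able to use something like multiple logistic regression to predict the 4th column from the first 3. Without knowing your data it is hard to give an exact answer, so this is just a simple way to get what you are looking for.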

First, I would look into the glm() function to fit a logistic model, for example:

# column4 stands for your category column (1 or 2); factor() makes it a two-level response
fit <- glm(factor(column4) ~ column1 + column2 + column3, data = data.frame, family = "binomial")
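To make the call above concrete, here is a minimal sketch on simulated data. The data frame `train`, its column names, and the rule that generates `category` are all made up for illustration; they only mimic the shape described in the question (24 rows, three numeric columns, category 1 or 2).

```r
# Simulated stand-in for the asker's data (hypothetical values, not real data).
set.seed(42)
train <- data.frame(
  column1 = rnorm(24),
  column2 = rnorm(24),
  column3 = rnorm(24)
)
# Made-up rule: category 2 becomes more likely as column1 grows.
train$category <- ifelse(train$column1 + rnorm(24) > 0, 2, 1)

# factor() turns the 1/2 codes into a two-level response;
# glm then models the probability of the second level ("2").
fit <- glm(factor(category) ~ column1 + column2 + column3,
           data = train, family = "binomial")
coef(fit)  # intercept plus one coefficient per predictor
```

With only 24 rows, glm may warn about fitted probabilities of 0 or 1 if the categories happen to be perfectly separable, so treat the coefficients with some caution.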

With this, you can use the Anova and or_glm functions to get a good summary of the model:

library(oddsratio)
library(car)
Anova(fit)  # shows which variables (columns) are significant
# Shows odds ratios for each variable, which can make interpretation easy.
# The incr values depend on your variables, so read the or_glm documentation;
# an increment of 1 per numeric predictor is only a placeholder here.
or_glm(data = data.frame, model = fit, incr = list(column1 = 1, column2 = 1, column3 = 1))
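If you would rather not install extra packages, base R gives a rougher version of the same summary. This is only a sketch on the built-in mtcars data, used here as a stand-in with `am` as the two-level category; `fit_demo` is a name chosen for this example.

```r
# Stand-in model: predict transmission type (am, coded 0/1) from weight and horsepower.
fit_demo <- glm(am ~ wt + hp, data = mtcars, family = "binomial")

summary(fit_demo)$coefficients  # estimates, standard errors, z values, p-values
exp(coef(fit_demo))             # odds ratios for a one-unit increase in each predictor
```

Exponentiating a logistic coefficient gives the multiplicative change in the odds for a one-unit increase, which is the same quantity or_glm reports when the increment is 1.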

To refine your model, I suggest reading up on machine learning (https://www.datacamp.com/community/tutorials/machine-learning-in-r may be a good resource).

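Once you have a good model, you can use the predict function to predict new data. In this case, I suggest creating a new data frame with your new data and passing it to the predict.glm() function: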

new.data <- data.frame[25:28, 1:3]  # assumes the 4 unknown rows are stored as rows 25-28; keeps only the 3 value columns
predict.glm(fit, newdata = new.data, type = "response")  # predicted probability for each new row (of the second category level when the response is a two-level factor)
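predict only returns probabilities, so you still need a rule to turn them into a category. Here is a minimal sketch, again on mtcars as a stand-in: `am` is recoded to categories 1 and 2, the last 4 rows are treated as if their category were unknown, and the 0.5 cutoff is an assumption you may want to tune.

```r
# Stand-in data: recode am (0/1) to categories 1 and 2.
cars <- mtcars
cars$category <- factor(cars$am + 1)        # levels "1" and "2"
known   <- cars[1:28, ]                      # rows with a known category
unknown <- cars[29:32, c("wt", "hp")]        # rows to classify (category withheld)

fit_demo <- glm(category ~ wt + hp, data = known, family = "binomial")

# With a factor response, type = "response" gives P(category == "2").
p2 <- predict(fit_demo, newdata = unknown, type = "response")
predicted <- ifelse(p2 > 0.5, 2, 1)          # 0.5 cutoff is an assumption
data.frame(unknown, prob_2 = round(p2, 3), predicted)
```

The column names in newdata must match the predictor names used in the model formula, otherwise predict cannot find them.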

Hope that helps. If you want a more detailed answer, please provide some sample code showing what you have tried and the nature of your data.

Predicting values for new records in a data frame
