How can I run a GLM on a data.frame whose column names differ from those the GLM was trained on?

Time: 2019-05-02 09:13:19

Tags: r glm names

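I have a legacy GLM that was trained on weather data many years ago. The column names in the modern weather data differ from those in the original data.

Context: I am predicting events based on the weather.

Example of a column name change: the newer data.frame has different column names. In the original data.frame (the one the GLM was fitted/trained on) there is a column called "rainfall"; in the new data.frame it is called "RAIN_M40".

Question: Is it possible to use the existing GLM without renaming the new column ("RAIN_M40") to the old name ("rainfall")?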

# 1) Generate some random example data

original_data <- data.frame(events = rbinom(500, 1, 0.1), rainfall = runif(500))

current_data <- data.frame(RAIN_M40 = original_data$rainfall)



# 2) build the model on the original data

model_name <- glm(events ~ rainfall, family = poisson(), data = original_data)



# 3) predict the model using the original data (this works fine)


original_data$predicted_events <- predict(model_name, newdata = original_data, type = "response")



# 4) predict the model using the current data.frame  (does not work)

current_data$predicted_events <- predict(model_name, newdata = current_data, type = "response")

# returns the following error as column "rainfall" is now called "RAIN_M40": Error in eval(predvars, data, env) : object 'rainfall' not found


## CURRENT WORK AROUND ---------------

# 5) duplicating the column, giving it the old name so the model runs (but I don't want to have to do this!)

current_data$rainfall <- current_data$RAIN_M40

# 6) predict the model using the current data.frame  (this works fine, but the data is twice as big)

current_data$predicted_events <- predict(model_name, newdata = current_data, type = "response")


# ===================
 # My brother's solution:

library(data.table)
library(magrittr)

fr1 <- data.table(x = 1:3, y = 1:3)
fr2 <- data.table(X = 1:3, Y = 1:3)

fit <- lm(y ~ x, data = fr1)

predict(fit, newdata = within(fr2, x <- X))

predict(fit, newdata = copy(fr2) %>% setnames("X", "x"))

# =========================
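A possible variation on the data.table approach above, for when even the `copy()` is too expensive: rename by reference just for the duration of the `predict()` call and restore the name afterwards. This is only a sketch; `predict_renamed` is a hypothetical helper name, and it assumes the table is not being used elsewhere while its columns are temporarily renamed.

```r
library(data.table)

fr1 <- data.table(x = 1:3, y = 1:3)
fr2 <- data.table(X = 1:3, Y = 1:3)
fit <- lm(y ~ x, data = fr1)

# Hypothetical helper: rename in place, predict, then restore the original name.
predict_renamed <- function(model, dt, old, new, ...) {
  setnames(dt, old, new)                       # modifies dt by reference, no copy
  on.exit(setnames(dt, new, old), add = TRUE)  # restore the name even if predict() fails
  predict(model, newdata = dt, ...)
}

pred <- predict_renamed(fit, fr2, old = "X", new = "x")
names(fr2)  # the column names are back to "X" and "Y" afterwards
```

Because `setnames()` works by reference, no column data should be copied here, at the cost of briefly mutating the input table.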

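Current solution: my current workaround is to duplicate the column so that there is one column for each name, but this is ugly and, because the data is very large, inefficient. For reporting reasons, changing or retraining the model is not allowed.

Any suggestions would be greatly appreciated. Many thanks indeed.

Edit: thanks to my brother, several workarounds are now included in the code (see "My brother's solution"). Some of them use data.table. I have not yet tested them for speed and so on, but I am including them here as a reference for others with the same problem!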
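One way to avoid both the duplicated column and any change to the model is to hand `predict()` a temporary, renamed one-column data.frame. The sketch below reuses the names from the example in the question (`renamed_view` is a name introduced here for illustration). In recent versions of R, renaming columns like this should copy only the list structure and not the underlying vector, but that is an assumption worth checking with `tracemem()` on the real data.

```r
# Example data and model as in the question
set.seed(1)
original_data <- data.frame(events = rbinom(500, 1, 0.1), rainfall = runif(500))
current_data  <- data.frame(RAIN_M40 = original_data$rainfall)
model_name    <- glm(events ~ rainfall, family = poisson(), data = original_data)

# Build a one-column data.frame carrying the old name, used only for predict();
# current_data itself keeps its new column name and gains no extra column.
renamed_view <- setNames(current_data["RAIN_M40"], "rainfall")

current_data$predicted_events <- predict(model_name, newdata = renamed_view, type = "response")
```

After this, `current_data` contains only `RAIN_M40` and `predicted_events`.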
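If several columns have been renamed over the years, the same idea can be driven by a lookup table from new names to old names. The following is a rough sketch rather than a tested solution: `predict_with_map` and `name_map` are hypothetical names, and the helper assumes `newdata` is a plain data.frame (column selection with `[` behaves differently on a data.table).

```r
# Hypothetical helper: name_map is a named character vector, new name -> old name.
predict_with_map <- function(model, newdata, name_map, ...) {
  missing_cols <- setdiff(names(name_map), names(newdata))
  if (length(missing_cols) > 0) {
    stop("Columns not found in newdata: ", paste(missing_cols, collapse = ", "))
  }
  view <- newdata[names(name_map)]   # keep only the columns the model needs
  names(view) <- unname(name_map)    # relabel them with the training-time names
  predict(model, newdata = view, ...)
}

# Usage with the example from the question
set.seed(1)
original_data <- data.frame(events = rbinom(500, 1, 0.1), rainfall = runif(500))
current_data  <- data.frame(RAIN_M40 = original_data$rainfall)
model_name    <- glm(events ~ rainfall, family = poisson(), data = original_data)

name_map <- c(RAIN_M40 = "rainfall")
predicted <- predict_with_map(model_name, current_data, name_map, type = "response")
```

Keeping the mapping in one vector means the legacy model and the incoming data both stay untouched; only the mapping needs updating when a column is renamed again.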

0 answers:

No answers