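I have a legacy GLM that was trained on weather data many years ago. Modern weather data uses different column names from the original data.
Context: I am predicting events based on the weather.
Example of a column name change: the newer data.frame has different column names. In the original data.frame (on which the GLM was fitted/trained) there is a column called "rainfall"; in the new data.frame it is called "RAIN_M40".
Question: Is it possible to use the existing GLM without renaming the new column ("RAIN_M40") to the old name ("rainfall")?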
# 1) Generate some random example data (seed fixed so the example is reproducible)
set.seed(1)
original_data <- data.frame(events = rbinom(500, 1, 0.1), rainfall = runif(500))
current_data <- data.frame(RAIN_M40 = original_data$rainfall)
# 2) build the model on the original data
model_name <- glm(events ~ rainfall, family = poisson(), data = original_data)
# 3) predict the model using the original data (this works fine)
original_data$predicted_events <- predict((model_name), newdata = original_data, type = "response")
# 4) predict the model using the current data.frame (does not work)
# wrapped in try() so the rest of the script still runs after the error
try(current_data$predicted_events <- predict(model_name, newdata = current_data, type = "response"))
# returns the following error as column "rainfall" is now called "RAIN_M40": Error in eval(predvars, data, env) : object 'rainfall' not found
## CURRENT WORKAROUND ---------------
# 5) duplicating the column under the old name so the model runs (but I don't want to have to do this!)
current_data$rainfall <- current_data$RAIN_M40
# 6) predict the model using the current data.frame (this works fine, but the data is twice as big)
current_data$predicted_events <- predict(model_name, newdata = current_data, type = "response")
# ===================
# My brother's solution:
library(data.table)
library(magrittr)
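# toy data: fr1 has the original (lowercase) names, fr2 the new (uppercase) ones
fr1 <- data.table(x = 1:3, y = 1:3)
fr2 <- data.table(X = 1:3, Y = 1:3)
fit <- lm(y ~ x, data = fr1)
# option A: add a column x to a temporary copy; fr2 itself is left unchanged
predict(fit, newdata = within(fr2, x <- X))
# option B: rename X to x on an explicit copy; fr2 itself is left unchanged
predict(fit, newdata = copy(fr2) %>% setnames("X", "x"))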
# =========================
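Current solution: my current workaround is to duplicate the column so that the data contains one column per name, but this is ugly and, because the data is very large, inefficient. For reporting reasons, I am not allowed to change or retrain the model.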
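One way to avoid adding a duplicate column to the large data.frame could be to hand `predict()` a small temporary frame that contains only the predictors, renamed to what the model expects. The sketch below assumes a base data.frame; the helper `predict_with_map()` and its `name_map` argument are hypothetical names made up for illustration, not part of any package.

```r
# Hypothetical helper: map the model's original predictor names to the
# column names used in the new data, then predict on a small renamed view.
# name_map is a named character vector: names = old (model) names,
# values = new (current data) names.
predict_with_map <- function(model, newdata, name_map, ...) {
  missing_cols <- setdiff(name_map, names(newdata))
  if (length(missing_cols) > 0) {
    stop("Columns not found in newdata: ", paste(missing_cols, collapse = ", "))
  }
  view <- newdata[, unname(name_map), drop = FALSE]  # keep only the needed columns
  names(view) <- names(name_map)                     # rename on the view only
  predict(model, newdata = view, ...)
}

# Same toy setup as in the question
set.seed(1)
original_data <- data.frame(events = rbinom(500, 1, 0.1), rainfall = runif(500))
current_data <- data.frame(RAIN_M40 = original_data$rainfall)
model_name <- glm(events ~ rainfall, family = poisson(), data = original_data)

# current_data keeps its new column names; no "rainfall" column is added to it
current_data$predicted_events <- predict_with_map(
  model_name, current_data, c(rainfall = "RAIN_M40"), type = "response"
)
```

The model object is untouched, and the mapping lives in one place if more columns have been renamed. Recent versions of R generally share column vectors until they are modified, so the temporary view should not duplicate the underlying values, but that is worth verifying on the real data.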
I would be grateful for any suggestions. Thank you very much indeed.
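Edit: thanks to my brother, who in the end provided several workarounds. Some of them use data.table. I have not yet tested them for speed etc., but I have included them in this post as a reference for others with the same problem!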
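Building on the data.table route, a further variant might skip the `copy()` altogether by renaming the columns by reference, predicting, and then restoring the names. This is only a sketch: `predict_renamed_inplace()` and `name_map` are hypothetical names, and it assumes the table has no existing column that already uses one of the old names.

```r
library(data.table)

# Hypothetical helper: temporarily rename columns by reference, predict,
# then restore the new-style names even if predict() fails.
# name_map: names = old (model) names, values = new (current data) names.
predict_renamed_inplace <- function(model, dt, name_map, ...) {
  stopifnot(is.data.table(dt))
  setnames(dt, old = unname(name_map), new = names(name_map))
  on.exit(setnames(dt, old = names(name_map), new = unname(name_map)), add = TRUE)
  predict(model, newdata = dt, ...)
}

# Same toy setup as in the data.table example
fr1 <- data.table(x = 1:3, y = 1:3)
fr2 <- data.table(X = 1:3, Y = 1:3)
fit <- lm(y ~ x, data = fr1)

preds <- predict_renamed_inplace(fit, fr2, c(x = "X"))
names(fr2)  # the new-style names are restored after the call
```

Wrapping each variant in `system.time()` on a realistically sized table would be one simple way to compare their speed.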