我有一个名为bias_correc的数据框,观察结果为x列,预测值为y列。下面显示的数据框包含不同区域的观测和预测,因此列的名称相似,我想在每个时间步计算每个位置(即海岸)的预测偏差。
我知道如何通过每组位置列创建一个简单减法的新列来手动执行此操作
bias_correc$Coast <- bias_correc$Coast.y- bias_correct$Coast.x
但我希望通过应用函数或循环执行此操作(如果可能),以便计算每组位置列并将其转储到此数据框或新数据框中。
我熟悉seq函数并且过去曾使用它,但我不确定如何将它包装到apply函数或循环中,以便计算每个列的位置差异。
非常感谢任何帮助。
bias_correc <-
structure(list(Forecast_day = c(8, 8, 8, 8, 8, 8), Forecast_date = structure(c(17555,
17556, 17557, 17558, 17559, 17560), class = "Date"), DeliveryDate = structure(c(17563,
17564, 17565, 17566, 17567, 17568), class = "Date"), HourEnding = c(1L,
1L, 1L, 1L, 1L, 1L), Coast.x = c(60.8, 62.6, 50.5, 56.8, 58.9,
59.4), Coast.y = c(58.5, 51, 46.7, 49.7, 49.3, 48.2), East.x = c(56,
52, 43, 47, 43.5, 52.5), East.y = c(56.5, 43.5, 41.5, 43.5, 43,
43), FarWest.x = c(50, 41, 45.5, 49.5, 35.5, 49.5), FarWest.y = c(46.5,
34.5, 36.5, 38, 41.5, 39), North.x = c(49, 34.5, 34.5, 39.5,
24.5, 34.5), North.y = c(49.5, 32, 33, 38, 38.5, 34.5), NorthCentral.x = c(57.5,
44.75, 45.5, 52.75, 35.75, 38.5), NorthCentral.y = c(54, 37.5,
39.75, 42, 42.5, 40), SouthCentral.x = c(56.5, 53.5, 51.5, 48.5,
53.5, 56), SouthCentral.y = c(56, 43.5, 43, 45, 45, 45), Southern.x = c(60.4,
63.6, 55, 61.8, 64, 65.6), Southern.y = c(58.4, 52.8, 50.4, 54,
54.4, 53.6), West.x = c(57.6, 42, 43.4, 51.8, 32.6, 45.2), West.y = c(49.6,
34.6, 36.8, 38.6, 40.4, 36.2)), class = "data.frame", row.names = c(NA,
-6L), .Names = c("Forecast_day", "Forecast_date", "DeliveryDate",
"HourEnding", "Coast.x", "Coast.y", "East.x", "East.y", "FarWest.x",
"FarWest.y", "North.x", "North.y", "NorthCentral.x", "NorthCentral.y",
"SouthCentral.x", "SouthCentral.y", "Southern.x", "Southern.y",
"West.x", "West.y"))
答案 0 :(得分:2)
如果我们对列名做一些字符串操作,那应该相当简单。
# find column names ending in ".x"
var_names <- names(bias_correc)[grepl(pattern = ".x",
x = names(bias_correc),
fixed = TRUE)]
# replace ".x" with "" (blank)
var_names <- gsub(pattern = ".x", replacement = "", x = var_names, fixed = TRUE)
# subtract y and x
(diff_table <- bias_correc[paste0(var_names, ".y")] - bias_correc[paste0(var_names, ".x")])
Coast.y East.y FarWest.y North.y NorthCentral.y SouthCentral.y Southern.y West.y
1 -2.3 0.5 -3.5 0.5 -3.50 -0.5 -2.0 -8.0
2 -11.6 -8.5 -6.5 -2.5 -7.25 -10.0 -10.8 -7.4
3 -3.8 -1.5 -9.0 -1.5 -5.75 -8.5 -4.6 -6.6
4 -7.1 -3.5 -11.5 -1.5 -10.75 -3.5 -7.8 -13.2
5 -9.6 -0.5 6.0 14.0 6.75 -8.5 -9.6 7.8
6 -11.2 -9.5 -10.5 0.0 1.50 -11.0 -12.0 -9.0
cbind(bias_correc, setNames(diff_table, var_names)) # bind back to original table
答案 1 :(得分:2)
您可以使用grep
提取特定列,并使用apply
系列迭代data.frame执行减法。
代码:
cNamesID <- grep("\\.[x|y]$", colnames(bias_correc))
cNames <- unique(sub("\\.[x|y]$", "", colnames(bias_correc)[cNamesID]))
cbind(bias_correc[, -cNamesID],
sapply(cNames, function(x)
bias_correc[, paste0(x, ".y")] - bias_correc[, paste0(x, ".x")]))
结果:
Forecast_day Forecast_date DeliveryDate HourEnding Coast East FarWest North NorthCentral SouthCentral Southern West 1 8 2018-01-24 2018-02-01 1 -2.3 0.5 -3.5 0.5 -3.50 -0.5 -2.0 -8.0 2 8 2018-01-25 2018-02-02 1 -11.6 -8.5 -6.5 -2.5 -7.25 -10.0 -10.8 -7.4 3 8 2018-01-26 2018-02-03 1 -3.8 -1.5 -9.0 -1.5 -5.75 -8.5 -4.6 -6.6 4 8 2018-01-27 2018-02-04 1 -7.1 -3.5 -11.5 -1.5 -10.75 -3.5 -7.8 -13.2 5 8 2018-01-28 2018-02-05 1 -9.6 -0.5 6.0 14.0 6.75 -8.5 -9.6 7.8 6 8 2018-01-29 2018-02-06 1 -11.2 -9.5 -10.5 0.0 1.50 -11.0 -12.0 -9.0
答案 2 :(得分:1)
鉴于数据框的名称与属性名称的.x
和.y
变体一致,我们可以在没有循环的情况下执行此操作:
deviation <- cbind(bias_correc[1:4], bias_correc[grep('\\.y', names(bias_correc))] -
bias_correc[grep('\\.x', names(bias_correc))])
这给出了:
Forecast_day Forecast_date DeliveryDate HourEnding Coast.y East.y FarWest.y North.y NorthCentral.y SouthCentral.y Southern.y West.y
1 8 2018-01-24 2018-02-01 1 -2.3 0.5 -3.5 0.5 -3.50 -0.5 -2.0 -8.0
2 8 2018-01-25 2018-02-02 1 -11.6 -8.5 -6.5 -2.5 -7.25 -10.0 -10.8 -7.4
3 8 2018-01-26 2018-02-03 1 -3.8 -1.5 -9.0 -1.5 -5.75 -8.5 -4.6 -6.6
4 8 2018-01-27 2018-02-04 1 -7.1 -3.5 -11.5 -1.5 -10.75 -3.5 -7.8 -13.2
5 8 2018-01-28 2018-02-05 1 -9.6 -0.5 6.0 14.0 6.75 -8.5 -9.6 7.8
6 8 2018-01-29 2018-02-06 1 -11.2 -9.5 -10.5 0.0 1.50 -11.0 -12.0 -9.0
答案 3 :(得分:0)
整齐的方法:
library(rlang)
library(purrr)
library(dplyr)
nms <- bias_correc %>%
names() %>%
sort() %>%
syms()
nms_x <- nms[ends_with(".x", vars = nms)]
nms_y <- nms[ends_with(".y", vars = nms)]
fn <- function (x, y) {
quo(`-`(!!y, !!x))
}
diffs <- nms_x %>%
map2(nms_y, fn) %>%
set_names(nms_y)
mutate(bias_correc, !!!diffs)
返回:
Forecast_day Forecast_date DeliveryDate HourEnding Coast.x Coast.y East.x East.y FarWest.x FarWest.y North.x North.y
1 8 2018-01-24 2018-02-01 1 60.8 -2.3 56.0 0.5 50.0 -3.5 49.0 0.5
2 8 2018-01-25 2018-02-02 1 62.6 -11.6 52.0 -8.5 41.0 -6.5 34.5 -2.5
3 8 2018-01-26 2018-02-03 1 50.5 -3.8 43.0 -1.5 45.5 -9.0 34.5 -1.5
4 8 2018-01-27 2018-02-04 1 56.8 -7.1 47.0 -3.5 49.5 -11.5 39.5 -1.5
5 8 2018-01-28 2018-02-05 1 58.9 -9.6 43.5 -0.5 35.5 6.0 24.5 14.0
6 8 2018-01-29 2018-02-06 1 59.4 -11.2 52.5 -9.5 49.5 -10.5 34.5 0.0
NorthCentral.x NorthCentral.y SouthCentral.x SouthCentral.y Southern.x Southern.y West.x West.y
1 57.50 -3.50 56.5 -0.5 60.4 -2.0 57.6 -8.0
2 44.75 -7.25 53.5 -10.0 63.6 -10.8 42.0 -7.4
3 45.50 -5.75 51.5 -8.5 55.0 -4.6 43.4 -6.6
4 52.75 -10.75 48.5 -3.5 61.8 -7.8 51.8 -13.2
5 35.75 6.75 53.5 -8.5 64.0 -9.6 32.6 7.8
6 38.50 1.50 56.0 -11.0 65.6 -12.0 45.2 -9.0