计算每两列之间的差异

时间:2018-02-08 20:29:03

标签: r

我有一个名为bias_correc的数据框,观察结果为x列,预测值为y列。下面显示的数据框包含不同区域的观测和预测,因此列的名称相似,我想在每个时间步计算每个位置(即海岸)的预测偏差。

我知道如何通过每组位置列创建一个简单减法的新列来手动执行此操作

bias_correc$Coast <- bias_correc$Coast.y- bias_correct$Coast.x

但我希望通过应用函数或循环执行此操作(如果可能),以便计算每组位置列并将其转储到此数据框或新数据框中。

我熟悉seq函数并且过去曾使用它,但我不确定如何将它包装到apply函数或循环中,以便计算每个列的位置差异。

非常感谢任何帮助。

bias_correc <-
structure(list(Forecast_day = c(8, 8, 8, 8, 8, 8), Forecast_date = structure(c(17555, 
17556, 17557, 17558, 17559, 17560), class = "Date"), DeliveryDate = structure(c(17563, 
17564, 17565, 17566, 17567, 17568), class = "Date"), HourEnding = c(1L, 
1L, 1L, 1L, 1L, 1L), Coast.x = c(60.8, 62.6, 50.5, 56.8, 58.9, 
59.4), Coast.y = c(58.5, 51, 46.7, 49.7, 49.3, 48.2), East.x = c(56, 
52, 43, 47, 43.5, 52.5), East.y = c(56.5, 43.5, 41.5, 43.5, 43, 
43), FarWest.x = c(50, 41, 45.5, 49.5, 35.5, 49.5), FarWest.y = c(46.5, 
34.5, 36.5, 38, 41.5, 39), North.x = c(49, 34.5, 34.5, 39.5, 
24.5, 34.5), North.y = c(49.5, 32, 33, 38, 38.5, 34.5), NorthCentral.x = c(57.5, 
44.75, 45.5, 52.75, 35.75, 38.5), NorthCentral.y = c(54, 37.5, 
39.75, 42, 42.5, 40), SouthCentral.x = c(56.5, 53.5, 51.5, 48.5, 
53.5, 56), SouthCentral.y = c(56, 43.5, 43, 45, 45, 45), Southern.x = c(60.4, 
63.6, 55, 61.8, 64, 65.6), Southern.y = c(58.4, 52.8, 50.4, 54, 
54.4, 53.6), West.x = c(57.6, 42, 43.4, 51.8, 32.6, 45.2), West.y = c(49.6, 
34.6, 36.8, 38.6, 40.4, 36.2)), class = "data.frame", row.names = c(NA, 
-6L), .Names = c("Forecast_day", "Forecast_date", "DeliveryDate", 
"HourEnding", "Coast.x", "Coast.y", "East.x", "East.y", "FarWest.x", 
"FarWest.y", "North.x", "North.y", "NorthCentral.x", "NorthCentral.y", 
"SouthCentral.x", "SouthCentral.y", "Southern.x", "Southern.y", 
"West.x", "West.y"))

4 个答案:

答案 0 :(得分:2)

如果我们对列名做一些字符串操作,那应该相当简单。

# find column names ending in ".x"
var_names <- names(bias_correc)[grepl(pattern = ".x",
                                      x = names(bias_correc),
                                      fixed = TRUE)]
# replace ".x" with "" (blank)
var_names <- gsub(pattern = ".x", replacement = "", x = var_names, fixed = TRUE)
# subtract y and x
(diff_table <- bias_correc[paste0(var_names, ".y")] - bias_correc[paste0(var_names, ".x")])

  Coast.y East.y FarWest.y North.y NorthCentral.y SouthCentral.y Southern.y West.y
1    -2.3    0.5      -3.5     0.5          -3.50           -0.5       -2.0   -8.0
2   -11.6   -8.5      -6.5    -2.5          -7.25          -10.0      -10.8   -7.4
3    -3.8   -1.5      -9.0    -1.5          -5.75           -8.5       -4.6   -6.6
4    -7.1   -3.5     -11.5    -1.5         -10.75           -3.5       -7.8  -13.2
5    -9.6   -0.5       6.0    14.0           6.75           -8.5       -9.6    7.8
6   -11.2   -9.5     -10.5     0.0           1.50          -11.0      -12.0   -9.0

cbind(bias_correc, setNames(diff_table, var_names)) # bind back to original table

答案 1 :(得分:2)

您可以使用grep提取特定列,并使用apply系列迭代data.frame执行减法。

代码:

cNamesID <- grep("\\.[x|y]$", colnames(bias_correc))
cNames <- unique(sub("\\.[x|y]$", "", colnames(bias_correc)[cNamesID]))
cbind(bias_correc[, -cNamesID],
      sapply(cNames, function(x) 
             bias_correc[, paste0(x, ".y")] - bias_correc[, paste0(x, ".x")]))

结果:

Forecast_day Forecast_date DeliveryDate HourEnding Coast East FarWest North NorthCentral SouthCentral Southern  West
1            8    2018-01-24   2018-02-01          1  -2.3  0.5    -3.5   0.5        -3.50         -0.5     -2.0  -8.0
2            8    2018-01-25   2018-02-02          1 -11.6 -8.5    -6.5  -2.5        -7.25        -10.0    -10.8  -7.4
3            8    2018-01-26   2018-02-03          1  -3.8 -1.5    -9.0  -1.5        -5.75         -8.5     -4.6  -6.6
4            8    2018-01-27   2018-02-04          1  -7.1 -3.5   -11.5  -1.5       -10.75         -3.5     -7.8 -13.2
5            8    2018-01-28   2018-02-05          1  -9.6 -0.5     6.0  14.0         6.75         -8.5     -9.6   7.8
6            8    2018-01-29   2018-02-06          1 -11.2 -9.5   -10.5   0.0         1.50        -11.0    -12.0  -9.0

答案 2 :(得分:1)

鉴于数据框的名称与属性名称的.x.y变体一致,我们可以在没有循环的情况下执行此操作:

deviation <- cbind(bias_correc[1:4], bias_correc[grep('\\.y', names(bias_correc))] -
                                     bias_correc[grep('\\.x', names(bias_correc))])

这给出了:

  Forecast_day Forecast_date DeliveryDate HourEnding Coast.y East.y FarWest.y North.y NorthCentral.y SouthCentral.y Southern.y West.y
1            8    2018-01-24   2018-02-01          1    -2.3    0.5      -3.5     0.5          -3.50           -0.5       -2.0   -8.0
2            8    2018-01-25   2018-02-02          1   -11.6   -8.5      -6.5    -2.5          -7.25          -10.0      -10.8   -7.4
3            8    2018-01-26   2018-02-03          1    -3.8   -1.5      -9.0    -1.5          -5.75           -8.5       -4.6   -6.6
4            8    2018-01-27   2018-02-04          1    -7.1   -3.5     -11.5    -1.5         -10.75           -3.5       -7.8  -13.2
5            8    2018-01-28   2018-02-05          1    -9.6   -0.5       6.0    14.0           6.75           -8.5       -9.6    7.8
6            8    2018-01-29   2018-02-06          1   -11.2   -9.5     -10.5     0.0           1.50          -11.0      -12.0   -9.0

答案 3 :(得分:0)

整齐的方法:

library(rlang)
library(purrr)
library(dplyr)

nms <- bias_correc %>%
  names() %>%
  sort() %>%
  syms()

nms_x <- nms[ends_with(".x", vars = nms)]
nms_y <- nms[ends_with(".y", vars = nms)]

fn <- function (x, y) {
  quo(`-`(!!y, !!x))
}

diffs <- nms_x %>%
  map2(nms_y, fn) %>%
  set_names(nms_y)

mutate(bias_correc, !!!diffs)

返回:

  Forecast_day Forecast_date DeliveryDate HourEnding Coast.x Coast.y East.x East.y FarWest.x FarWest.y North.x North.y
1            8    2018-01-24   2018-02-01          1    60.8    -2.3   56.0    0.5      50.0      -3.5    49.0     0.5
2            8    2018-01-25   2018-02-02          1    62.6   -11.6   52.0   -8.5      41.0      -6.5    34.5    -2.5
3            8    2018-01-26   2018-02-03          1    50.5    -3.8   43.0   -1.5      45.5      -9.0    34.5    -1.5
4            8    2018-01-27   2018-02-04          1    56.8    -7.1   47.0   -3.5      49.5     -11.5    39.5    -1.5
5            8    2018-01-28   2018-02-05          1    58.9    -9.6   43.5   -0.5      35.5       6.0    24.5    14.0
6            8    2018-01-29   2018-02-06          1    59.4   -11.2   52.5   -9.5      49.5     -10.5    34.5     0.0
  NorthCentral.x NorthCentral.y SouthCentral.x SouthCentral.y Southern.x Southern.y West.x West.y
1          57.50          -3.50           56.5           -0.5       60.4       -2.0   57.6   -8.0
2          44.75          -7.25           53.5          -10.0       63.6      -10.8   42.0   -7.4
3          45.50          -5.75           51.5           -8.5       55.0       -4.6   43.4   -6.6
4          52.75         -10.75           48.5           -3.5       61.8       -7.8   51.8  -13.2
5          35.75           6.75           53.5           -8.5       64.0       -9.6   32.6    7.8
6          38.50           1.50           56.0          -11.0       65.6      -12.0   45.2   -9.0