Question

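I have two data.frames that I am using to create a new variable C (a standardized distance measure). Each data.frame holds the following information: coordinates, season, and variables. For each unique coordinate-season combination in df.a (i.e. each XX, YY), I want to calculate C between df.a and each X, Y pair in df.b, by season. To do this, I merged the two data.frames into df.new in preparation for calculating C.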

Here is how I currently do this:

# for example, for season = SUM
# V1 and VV1 are the same variable from the different dataframes, SEA = Season, 
# X,Y and XX, YY are coordinates 
df.new.SUM <- subset(df.new, SEA == "SUM") # Summer
attach(df.new.SUM)
df.new.SUM$C_V1 <- (V1-VV1)^2/sd(V1)^2 # almost wouldn't need to subset except that the denominator here should only be for one season
df.new.SUM$C_V2 <- (V2-VV2)^2/sd(V2)^2
df.new.SUM$C <- sqrt(rowSums(df.new.SUM[,c("C_V1","C_V2")]))
# continue for other seasons and then rbind

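However, this seems clunky. Is there a way to calculate C for each season-coordinate group without subsetting into a new data.frame and then rbinding the seasons together? How can I work with just one season without subsetting into a new data.frame? Or, better yet, how can I do this for every season in a vectorized way? Which packages should I explore?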

df.a <- structure(list(XX = c(10L, 10L, 11L, 11L, 12L, 12L, 13L, 13L, 
14L, 14L), YY = c(20L, 20L, 21L, 21L, 22L, 22L, 23L, 23L, 15L, 
15L), SEA = c("SUM", "WIN", "SUM", "WIN", "SUM", "WIN", "SUM", 
"WIN", "SUM", "WIN"), VV1 = c(10.5, 15, 8, 8.5, 8, 7.5, 11, 13, 
15, 10), VV2 = c(13, 3, 3.5, 6, 3.5, 3, 5, 4, 5, 5)), .Names = c("XX", 
"YY", "SEA", "VV1", "VV2"), row.names = c(NA, -10L), class = "data.frame")
#
df.b <- structure(list(X = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), Y = c(1L, 1L, 2L, 2L, 
3L, 3L, 4L, 4L, 5L, 5L, 1L, 1L, 2L, 2L, 3L, 3L, 4L, 4L, 5L, 5L
), SEA = c("SUM", "WIN", "SUM", "WIN", "SUM", "WIN", "SUM", "WIN", 
"SUM", "WIN", "SUM", "WIN", "SUM", "WIN", "SUM", "WIN", "SUM", 
"WIN", "SUM", "WIN"), V1 = c(10, 12, 10, 9.5, 10, 14.5, 10.5, 
13, 11.5, 14, 12.5, 8.5, 10, 7.5, 11, 7, 11, 8, 11, 14.5), V2 = c(3.5, 
3, 3.5, 2.5, 3, 5, 5.5, 4, 2, 2.5, 3.5, 2, 3.5, 4.5, 5.5, 3.5, 
5, 6, 6, 5)), .Names = c("X", "Y", "SEA", "V1", "V2"), row.names = c(NA, 
-20L), class = "data.frame")
#
df.new <- merge(df.a, df.b, by = c("SEA"), all.x = TRUE, allow.cartesian=TRUE)
#
# EDIT ## solution based on suggestions below
df.out <- data.frame()
seasons <- unique(df.new$SEA)
for (s in seasons){
  data <- subset(df.new, SEA == s)
  data$C <- sqrt(with(data, (V1-VV1)^2/sd(V1)^2 +(V2-VV2)^2/sd(V2)^2 ))
  df.out <- rbind(df.out,data)

}

Answer 1

Wrap those steps together, and please do not use attach in the future:

df.new.SUM$C <- sqrt( with(df.new.SUM, (V1-VV1)^2/sd(V1)^2 +(V2-VV2)^2/sd(V2)^2 ) )

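The with function is safer. However, this may not be what you want. Because merge produces a cross product, the merged dataset contains 50 "combinations" with SEA == "SUM", and these are not what your English description specifies.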
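As for the "vectorized for every season" part of the question, one possible sketch uses base R's ave(), which computes a statistic within each group and returns it aligned with the original rows. The small data.frame below is a made-up stand-in for df.new (the values are illustrative, not the question's data), and sd1 and sd2 are helper names introduced here. Under those assumptions it should give the same C as the subset-and-rbind loop, since the standard deviation is still taken over each season's rows of the merged data:

```r
# Hypothetical stand-in for the merged data; column names follow the question
df.new <- data.frame(
  SEA = c("SUM", "SUM", "SUM", "WIN", "WIN", "WIN"),
  V1  = c(10, 12, 14, 8, 9, 13),
  VV1 = c(11, 11, 11, 10, 10, 10),
  V2  = c(3, 4, 5, 2, 4, 6),
  VV2 = c(3.5, 3.5, 3.5, 5, 5, 5)
)

# ave() applies sd within each level of SEA and returns one value per row,
# so no subsetting or rbind is needed
sd1 <- ave(df.new$V1, df.new$SEA, FUN = sd)
sd2 <- ave(df.new$V2, df.new$SEA, FUN = sd)

# Standardized distance for every row, using its own season's sd
df.new$C <- with(df.new, sqrt((V1 - VV1)^2 / sd1^2 + (V2 - VV2)^2 / sd2^2))
```

Note that this keeps whatever row pairing merge produced; if the cross product is not the pairing you intend, the merge step itself needs to change first.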

Calculating on a subset of data without first saving the subset as a new data.frame
