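I have two data.frames that I am using to create a new variable, C (a standardized distance measure). Each data.frame holds the same kind of information: coordinates, season, and variables. For each unique coordinate-season combination (that is, each XX, YY pair within a season), I want to calculate C between df.a and the X, Y pairs in df.b for the same season. To do this, I merge the two data.frames into df.new in preparation for calculating C.
Here is how I am currently doing it: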
# for example, for season = SUM
# V1 and VV1 are the same variable from the different dataframes, SEA = Season,
# X,Y and XX, YY are coordinates
df.new.SUM <- subset(df.new, SEA == "SUM") # Summer
attach(df.new.SUM)
df.new.SUM$C_V1 <- (V1-VV1)^2/sd(V1)^2 # almost wouldn't need to subset except that the denominator here should only be for one season
df.new.SUM$C_V2 <- (V2-VV2)^2/sd(V2)^2
df.new.SUM$C <- sqrt(rowSums(df.new.SUM[,c("C_V1","C_V2")]))
# continue for other seasons and then rbind
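However, this seems clunky. Is there a way to calculate C for each season-coordinate group without subsetting into a new data.frame and then rbinding each season? How can I work on just one season without subsetting into a new data.frame? Or, better yet, how can I do this in a vectorized way for every season at once? Which packages should I explore?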
df.a <- structure(list(XX = c(10L, 10L, 11L, 11L, 12L, 12L, 13L, 13L,
14L, 14L), YY = c(20L, 20L, 21L, 21L, 22L, 22L, 23L, 23L, 15L,
15L), SEA = c("SUM", "WIN", "SUM", "WIN", "SUM", "WIN", "SUM",
"WIN", "SUM", "WIN"), VV1 = c(10.5, 15, 8, 8.5, 8, 7.5, 11, 13,
15, 10), VV2 = c(13, 3, 3.5, 6, 3.5, 3, 5, 4, 5, 5)), .Names = c("XX",
"YY", "SEA", "VV1", "VV2"), row.names = c(NA, -10L), class = "data.frame")
#
df.b <- structure(list(X = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), Y = c(1L, 1L, 2L, 2L,
3L, 3L, 4L, 4L, 5L, 5L, 1L, 1L, 2L, 2L, 3L, 3L, 4L, 4L, 5L, 5L
), SEA = c("SUM", "WIN", "SUM", "WIN", "SUM", "WIN", "SUM", "WIN",
"SUM", "WIN", "SUM", "WIN", "SUM", "WIN", "SUM", "WIN", "SUM",
"WIN", "SUM", "WIN"), V1 = c(10, 12, 10, 9.5, 10, 14.5, 10.5,
13, 11.5, 14, 12.5, 8.5, 10, 7.5, 11, 7, 11, 8, 11, 14.5), V2 = c(3.5,
3, 3.5, 2.5, 3, 5, 5.5, 4, 2, 2.5, 3.5, 2, 3.5, 4.5, 5.5, 3.5,
5, 6, 6, 5)), .Names = c("X", "Y", "SEA", "V1", "V2"), row.names = c(NA,
-20L), class = "data.frame")
#
df.new <- merge(df.a, df.b, by = c("SEA"), all.x = TRUE, allow.cartesian=TRUE)
#
# EDIT ## solution based on suggestions below
df.out <- data.frame()
seasons <- unique(df.new$SEA)
for (s in seasons) {
  data <- subset(df.new, SEA == s)
  data$C <- sqrt(with(data, (V1 - VV1)^2 / sd(V1)^2 + (V2 - VV2)^2 / sd(V2)^2))
  df.out <- rbind(df.out, data)
}
Answer 0 (score: 1)
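Wrap these steps together, and please do not use attach in the future:
df.new.SUM$C <- sqrt( with(df.new.SUM, (V1-VV1)^2/sd(V1)^2 +(V2-VV2)^2/sd(V2)^2 ) )
The with function is much safer. However, maybe this is not what you want. In the cross product produced by merge, the merged dataset contains 50 "combinations" with SEA == "SUM", but those are not what your English description specifies.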
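As for the vectorized, per-season part of the question, one possible base R sketch uses ave() to give every row the standard deviation of its own season, so no subsetting or rbind is needed. The small df.new below is a made-up stand-in with the same column names as the merged data, and sd1 and sd2 are hypothetical helper names. Like the loop in the question, this takes the standard deviation over the merged rows of each season, so it inherits whatever the merge produced.

```r
# Toy stand-in for the merged data: same column names as df.new, made-up values
df.new <- data.frame(
  SEA = rep(c("SUM", "WIN"), each = 4),
  V1  = c(10, 12, 11, 13, 8, 9, 7, 10),
  VV1 = c(10.5, 8, 11, 15, 15, 8.5, 7.5, 13),
  V2  = c(3.5, 3, 5.5, 2, 3, 2.5, 5, 4),
  VV2 = c(13, 3.5, 5, 5, 3, 6, 3, 4)
)

# ave() returns a vector as long as its input; each element holds the
# value of FUN computed over that row's group (here, its season)
sd1 <- ave(df.new$V1, df.new$SEA, FUN = sd)
sd2 <- ave(df.new$V2, df.new$SEA, FUN = sd)

# Same formula as in the question, with the per-season denominators
df.new$C <- with(df.new, sqrt((V1 - VV1)^2 / sd1^2 + (V2 - VV2)^2 / sd2^2))
```

Because the season is handled inside ave(), the same three lines cover any number of seasons. Whether the merged rows are the right pairs to compare is a separate matter, as noted above.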