I have a dataset containing origin-destination data and some associated variables. It looks like this:
"Origin","Destination","distance","volume"
"A01","A01",0.0,10
"A02","A01",1.2,9
"A03","A01",1.4,15
"A01","A02",1.2,16
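Then, for each origin-destination pair, I want to compute additional variables based on the data in that row and in selected other rows. For example, the number of other origin zones sending traffic to the same destination whose volume is greater than that of the focal pair. In this example, I would end up with the following for destination A01:
"Origin","Destination","distance","volume","greater_flow"
"A01","A01",0.0,10,1
"A02","A01",1.2,9,2
"A03","A01",1.4,15,0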
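I have been trying to work this out with group_by and apply, but I can't figure out how to a) 'fix' the data I want to use as the reference (the data from A01 to A01), b) restrict the comparison to rows with the same destination (A01), and c) repeat this for all origin-destination pairs.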
Answer 0 (score: 1)
Here is an answer using base R (using apply):
d <- data.frame(Origin = c("A01", "A02", "A03", "A01"), Destination = c("A01", "A01", "A01", "A02"), distance = c(0.0, 1.2, 1.4, 1.2), volume = c(10, 9, 15, 16))
# extracting entries with destination = A01
d2 <- d[d[, "Destination"] == "A01", ]
# counting, for each row, the rows with strictly greater volume
# (apply() passes each row as a character vector here, so volume is converted back to numeric)
greater_flow <- apply(d2, 1, FUN = function(x) sum(as.numeric(x['volume']) < d2[, 'volume']))
# sticking things back together
data.frame(d2, greater_flow)
#   Origin Destination distance volume greater_flow
# 1    A01         A01      0.0     10            1
# 2    A02         A01      1.2      9            2
# 3    A03         A01      1.4     15            0
If you need to do the calculation for all possible destinations, you can loop through unique(d[, "Destination"]):
lapply(unique(d[, "Destination"]), FUN = function(dest){
  d2 <- d[d[, "Destination"] == dest, ]
  greater_flow <- apply(d2, 1, FUN = function(x) sum(as.numeric(x['volume']) < d2[, 'volume']))
  data.frame(d2, greater_flow)
})
Then, if needed, you can bind the outputs together into a single data frame.
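One possible way to combine the per-destination pieces is do.call(rbind, ...). The following is only a minimal sketch, not necessarily what the answer's author had in mind: it rebuilds the example data d so the block runs on its own, counts on the numeric volume column directly with sapply, and uses the names per_dest and result purely for illustration.

```r
d <- data.frame(Origin = c("A01", "A02", "A03", "A01"),
                Destination = c("A01", "A01", "A01", "A02"),
                distance = c(0.0, 1.2, 1.4, 1.2),
                volume = c(10, 9, 15, 16))

# one data frame per destination, each with its own greater_flow column
per_dest <- lapply(unique(d[, "Destination"]), FUN = function(dest) {
  d2 <- d[d[, "Destination"] == dest, ]
  # for each row, count rows in the same destination group with strictly greater volume
  greater_flow <- sapply(d2[, "volume"], function(v) sum(d2[, "volume"] > v))
  data.frame(d2, greater_flow)
})

# stack the per-destination pieces into a single data frame
result <- do.call(rbind, per_dest)
result
```

Because the comparison is strict, a row never counts itself, and rows with tied volumes do not count each other.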
Answer 1 (score: 0)
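library(plyr)
# d is the data frame with columns Origin, Destination, distance, volume
# within each destination, sort by volume (largest first); a row's position minus 1
# is then the number of rows with greater volume (assumes no tied volumes)
Fun <- function(x) { x <- x[order(x$volume, decreasing = TRUE), ]; x$greater_flow <- seq_len(nrow(x)) - 1; x }
ddply(d, ~ Destination, .fun = Fun)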
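If adding a package dependency or reordering the rows is undesirable, the same grouped count can probably be done in base R with ave(). This is a hedged sketch rather than part of the original answer; it rebuilds the example data d so that it is self-contained, and it counts strictly greater volumes, so tied volumes are not counted.

```r
d <- data.frame(Origin = c("A01", "A02", "A03", "A01"),
                Destination = c("A01", "A01", "A01", "A02"),
                distance = c(0.0, 1.2, 1.4, 1.2),
                volume = c(10, 9, 15, 16))

# ave() applies FUN to the volumes of each Destination group and
# returns the results in the original row order
d$greater_flow <- ave(d$volume, d$Destination,
                      FUN = function(v) sapply(v, function(x) sum(v > x)))
d
```

Here the rows keep their original order, whereas the ddply version above returns each group sorted by volume.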