Row-wise comparison of data in R

Time: 2015-10-20 13:53:48

Tags: r dplyr

I have a dataset containing origin-destination data and some associated variables. It looks like this:

    "Origin","Destination","distance","volume"
    "A01"     "A01"          0.0        10
    "A02"     "A01"          1.2         9
    "A03"     "A01"          1.4        15 
    "A01"     "A02"          1.2        16

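Then, for each origin-destination pair, I would like to compute other variables based on the data in that row and in selected other rows. For example, the number of other origin zones whose traffic volume to that destination is greater than that of the focal pair. In this example, I would end up with the following for destination A01.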

    "Origin","Destination","distance","volume","greater_flow"
    "A01"    "A01"            0.0        10         1
    "A02"    "A01"            1.2         9         2
    "A03"    "A01"            1.4        15         0

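I have been trying to work something out with group_by and apply, but cannot figure out how to a) 'fix' the data I want to use as the reference (for example, the row from A01 to A01), b) restrict the comparison to rows with the same destination (A01), and c) repeat this for all origin-destination pairs.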

2 answers:

Answer 0: (score: 1)

Here is an answer using base R (using apply):

    d <- data.frame(Origin = c("A01", "A02", "A03", "A01"),
                    Destination = c("A01", "A01", "A01", "A02"),
                    distance = c(0.0, 1.2, 1.4, 1.2),
                    volume = c(10, 9, 15, 16))

    # extracting entries with destination = A01
    d2 <- d[d[, "Destination"] == "A01", ]

    # calculating number of rows satisfying your condition
    # (apply() passes each row as a character vector, so volume is converted back to numeric)
    greater_flow <- apply(d2, 1, FUN = function(x)
      sum(as.numeric(x["volume"]) < d2[, "volume"]))

    # sticking things back together
    data.frame(d2, greater_flow)
    #   Origin Destination distance volume greater_flow
    # 1    A01         A01      0.0     10            1
    # 2    A02         A01      1.2      9            2
    # 3    A03         A01      1.4     15            0

If you need to do the calculation for all possible destinations, you can loop over unique(d[, "Destination"]):

    lapply(unique(d[, "Destination"]), FUN = function(dest) {
      d2 <- d[d[, "Destination"] == dest, ]
      greater_flow <- apply(d2, 1, FUN = function(x)
        sum(as.numeric(x["volume"]) < d2[, "volume"]))
      data.frame(d2, greater_flow)
    })

If needed, the resulting list of data frames can then be glued back together.
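As a rough sketch of the gluing step: base R's do.call(rbind, ...) stacks a list of data frames that share the same columns. The example below is only an illustration, not the answerer's code. It counts larger volumes with sapply over the volume column instead of apply, and the names per_dest and result are made up for this sketch.

```r
# Example data from the question
d <- data.frame(
  Origin      = c("A01", "A02", "A03", "A01"),
  Destination = c("A01", "A01", "A01", "A02"),
  distance    = c(0.0, 1.2, 1.4, 1.2),
  volume      = c(10, 9, 15, 16)
)

# One data frame per destination, each with its own greater_flow column
# (per_dest is a hypothetical name used only in this sketch)
per_dest <- lapply(unique(d$Destination), function(dest) {
  d2 <- d[d$Destination == dest, ]
  # for every row, count the rows in the same group with a strictly larger volume
  d2$greater_flow <- sapply(d2$volume, function(v) sum(d2$volume > v))
  d2
})

# Stack the per-destination pieces back into a single data frame
result <- do.call(rbind, per_dest)
result
```

Working on the numeric volume column directly avoids the character coercion that apply performs on data frame rows.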

Answer 1: (score: 0)

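    library(plyr)
    # sort each group by descending volume: position - 1 = number of larger flows (assumes no tied volumes)
    Fun <- function(x) { x <- x[order(x$volume, decreasing = TRUE), ]; x$greater_flow <- seq_len(nrow(x)) - 1; x }
    ddply(d, ~ Destination, .fun = Fun)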
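Since the question mentions group_by, the same per-destination count could probably also be written with dplyr. The following is a minimal sketch, assuming dplyr is installed. It relies on min_rank(desc(volume)), which within each group returns one plus the number of strictly larger volumes, so tied volumes get the same count. The name res is made up for this sketch.

```r
library(dplyr)

# Example data from the question
d <- data.frame(
  Origin      = c("A01", "A02", "A03", "A01"),
  Destination = c("A01", "A01", "A01", "A02"),
  distance    = c(0.0, 1.2, 1.4, 1.2),
  volume      = c(10, 9, 15, 16)
)

res <- d %>%
  group_by(Destination) %>%
  # rank by descending volume within each destination; subtracting 1
  # leaves the number of rows with a strictly larger volume
  mutate(greater_flow = min_rank(desc(volume)) - 1L) %>%
  ungroup()

res
```

Unlike the sort-based approach, this keeps the original row order.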