Question

I have a dataset containing origin-destination data and some associated variables. It looks like this:

    "Origin","Destination","distance","volume"
    "A01"     "A01"          0.0        10
    "A02"     "A01"          1.2         9
    "A03"     "A01"          1.4        15 
    "A01"     "A02"          1.2        16

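Then, for each origin-destination pair, I would like to calculate additional variables based on the data in that row and in selected other rows. For example, the number of other origin areas travelling to that destination whose traffic volume is greater than that of the focal pair. In this example, I would end up with the following for destination A01.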

    "Origin","Destination","distance","volume","greater_flow"
    "A01"    "A01"            0.0        10         1
    "A02"    "A01"            1.2         9         2
    "A03"    "A01"            1.4        15         0

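I have been trying to work something out with group_by and apply, but cannot figure out how to a) 'fix' the data I want to use as the reference (the volume from A01 to A01), b) restrict the comparison to data with the same destination (A01), and c) repeat this for all origin-destination pairs.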

Answer 1

Here is an answer using base R (using apply):

    d <- data.frame(Origin = c("A01", "A02", "A03", "A01"),
                    Destination = c("A01", "A01", "A01", "A02"),
                    distance = c(0.0, 1.2, 1.4, 1.2),
                    volume = c(10, 9, 15, 16))

    # extracting entries with destination = A01
    d2 <- d[d[, "Destination"] == "A01", ]

    # calculating number of rows satisfying your condition
    # (apply() hands each row over as a character vector, so volume is
    # converted back to numeric before comparing)
    greater_flow <- apply(d2, 1, FUN = function(x)
      sum(as.numeric(x["volume"]) < d2[, "volume"]))

    # sticking things back together
    data.frame(d2, greater_flow)

    #   Origin Destination distance volume greater_flow
    # 1    A01         A01      0.0     10            1
    # 2    A02         A01      1.2      9            2
    # 3    A03         A01      1.4     15            0

If you need to do the calculation for all possible destinations, you can loop over unique(d[, "Destination"]):

    lapply(unique(d[, "Destination"]), FUN = function(dest) {
      d2 <- d[d[, "Destination"] == dest, ]
      greater_flow <- apply(d2, 1, FUN = function(x)
        sum(as.numeric(x["volume"]) < d2[, "volume"]))
      data.frame(d2, greater_flow)
    })

Then, if needed, the outputs can be glued together.
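The gluing step is not spelled out in the answer. A minimal sketch, assuming the intent is to stack the per-destination data frames into one, is to pass the list to do.call with rbind. The names per_dest and result are hypothetical, chosen only for this illustration.

```r
# Same example data as in the question.
d <- data.frame(Origin = c("A01", "A02", "A03", "A01"),
                Destination = c("A01", "A01", "A01", "A02"),
                distance = c(0.0, 1.2, 1.4, 1.2),
                volume = c(10, 9, 15, 16))

# One data frame per destination, each with its own greater_flow column.
# per_dest is a hypothetical name for the list returned by lapply().
per_dest <- lapply(unique(d[, "Destination"]), FUN = function(dest) {
  d2 <- d[d[, "Destination"] == dest, ]
  greater_flow <- apply(d2, 1, FUN = function(x)
    sum(as.numeric(x["volume"]) < d2[, "volume"]))
  data.frame(d2, greater_flow)
})

# Stack the list elements row-wise into a single data frame
# (assumed to be what "gluing together" means here).
result <- do.call(rbind, per_dest)

result$greater_flow
# 1, 2, 0 for destination A01 and 0 for destination A02
```

Because every element of the list has the same columns, rbind can combine them without further preparation.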

Answer 2

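    library(plyr)
    # Largest volume first, so row position - 1 counts the greater flows (assumes no ties)
    Fun <- function(x) {
      x <- x[order(x$volume, decreasing = TRUE), ]
      x$greater_flow <- seq_len(nrow(x)) - 1
      x
    }
    ddply(d, ~ Destination, .fun = Fun)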
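The ranking idea can also be written in base R without reordering the rows. The following is a minimal sketch, not part of either answer, using ave and rank. The choice of ties.method = "min" is an assumption about how equal volumes should be treated: rows with the same volume are not counted as greater than one another.

```r
# Same example data as in the question.
d <- data.frame(Origin = c("A01", "A02", "A03", "A01"),
                Destination = c("A01", "A01", "A01", "A02"),
                distance = c(0.0, 1.2, 1.4, 1.2),
                volume = c(10, 9, 15, 16))

# Within each destination, rank volumes from largest to smallest.
# ties.method = "min" gives tied volumes the same rank, so rank - 1 is the
# number of rows with a strictly greater volume (an assumption about how
# ties should be handled).
d$greater_flow <- ave(d$volume, d$Destination,
                      FUN = function(v) rank(-v, ties.method = "min") - 1)

d$greater_flow
# 1 2 0 0
```

Since ave returns its result in the original row order, the new column lines up with the input data and no re-sorting or merging is needed.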

Row-wise comparison of data in R