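I have two tables of traffic data. I am trying to (eventually) combine them into a plot of traffic along a linear route, by milepost. For example: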
mileposts <- structure(list(city = c("city1", "city2", "city3", "city4"),
milepost = c(0L, 50L, 120L, 250L)), .Names = c("city", "milepost"
), class = "data.frame", row.names = c("1", "2", "3", "4"))
city milepost
1 city1 0
2 city2 50
3 city3 120
4 city4 250
traffic <- structure(list(citypair = c("city1-city2", "city2-city4", "city1-city3",
"city1-city4", "city3-city4"), traffic = c(610L, 23L, 139L, 88L,
17L), origmp = c(0L, 50L, 0L, 0L, 120L), destmp = c(50L, 250L,
120L, 250L, 250L)), .Names = c("citypair", "traffic", "origmp",
"destmp"), class = "data.frame", row.names = c("1", "2", "3",
"4", "5"))
citypair traffic origmp destmp
1 city1-city2 610 0 50
2 city2-city4 23 50 250
3 city1-city3 139 0 120
4 city1-city4 88 0 250
5 city3-city4 17 120 250
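What I want is to add a column "vol" to the "mileposts" table that lists all the traffic that starts at or passes through each city (the cities lie in the order 1-2-3-4). For example, the volume for city3 would be the sum of the values from traffic[c(2,4,5),2].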
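How would I go about doing this? I know it has to be some sort of for loop. I tried a loop that adds the value in traffic$traffic to mileposts$vol on the conditions traffic$origmp[i] >= mileposts$milepost and traffic$destmp[i] <= mileposts$milepost, but I get the error "the condition has length > 1 and only the first element will be used". However, if I wrap the whole thing in a second loop over the [j] dimension of mileposts$milepost, the whole thing becomes extremely slow. Any suggestions on how to speed this up or code it efficiently?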
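More generally, I suppose I am asking how to perform conditional operations using data from two data frames in an efficient way (i.e., without looping over every row of both data frames). Thanks!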
Answer 0 (score: 1)
This is a bit convoluted, but it works:
cityorder <- c("city1","city2","city3","city4")
through <- lapply(strsplit(traffic$citypair,"-"),match,cityorder)
through <- lapply(through,function(x) seq(x[1],x[2]-1))
citymatch <- sapply(mileposts$city, grep, cityorder)
sum.ids <- lapply(citymatch, function(x) sapply(through, function(y) x %in% y) )
mileposts$traffic <- sapply(sum.ids, function(x) sum(traffic$traffic[x]) )
# city milepost traffic
#1 city1 0 837
#2 city2 50 250
#3 city3 120 128
#4 city4 250 0
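The result checks out against the expected value from the question, "the volume for city3 would be the sum of the values from traffic[c(2,4,5),2]":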
sum(traffic[c(2, 4, 5),2])
#[1] 128
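As a hedged alternative sketch (not part of the answer above), the same totals can be computed from the milepost columns directly, without parsing city names. It assumes that a pair contributes to a city when origmp <= milepost < destmp, which matches the city3 example in the question. The comparison inside the function is vectorized over the rows of traffic, so only one loop over mileposts is needed. The data frames are rebuilt here so the block runs on its own.

```r
# Rebuild the example data from the question.
mileposts <- data.frame(
  city = c("city1", "city2", "city3", "city4"),
  milepost = c(0L, 50L, 120L, 250L),
  stringsAsFactors = FALSE
)
traffic <- data.frame(
  citypair = c("city1-city2", "city2-city4", "city1-city3",
               "city1-city4", "city3-city4"),
  traffic = c(610L, 23L, 139L, 88L, 17L),
  origmp = c(0L, 50L, 0L, 0L, 120L),
  destmp = c(50L, 250L, 120L, 250L, 250L),
  stringsAsFactors = FALSE
)

# For each milepost, sum the traffic of every pair that starts at it or
# passes through it. Assumed rule: origmp <= milepost < destmp.
mileposts$vol <- sapply(mileposts$milepost, function(mp) {
  sum(traffic$traffic[traffic$origmp <= mp & mp < traffic$destmp])
})

print(mileposts)
```

Because each comparison yields one logical value per row of traffic, no `if` statement ever sees a condition longer than one element, which is what triggered the message quoted in the question.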
Answer 1 (score: 0)
With your two tables, mileposts and traffic, already in memory, I can get the result you want using the code below:
library(data.table)
# building index of which route traffic is to be associated with which city
uniquecities <- unique(mileposts$milepost)
uniqueCityCombns <- data.table(expand.grid(uniquecities,uniquecities,uniquecities))
setnames(uniqueCityCombns, c('origmp','destmp','milepost'))
uniqueCityCombns <- uniqueCityCombns[origmp < destmp & milepost < destmp]
uniqueCityCombns <- uniqueCityCombns[origmp <= milepost]
# calculating traffic passing through the city
uniqueCityCombnsTrf <- merge(uniqueCityCombns,traffic, by = c('origmp','destmp'))
uniqueCityCombnsTrf <- uniqueCityCombnsTrf [,list(traffic = sum(traffic)), by = 'milepost']
uniqueCityCombnsTrf <- merge(uniqueCityCombnsTrf , mileposts, by = 'milepost')
Output:
> uniqueCityCombnsTrf
milepost traffic city
1: 0 837 city1
2: 50 250 city2
3: 120 128 city3
Answer 2 (score: 0)
traffic$start <- as.numeric(gsub("city|-city.+$", "", traffic$citypair) )
traffic$end <- as.numeric(gsub("city[[:digit:]]*|-city", "", traffic$citypair) )
sapply(mileposts$city, function(cit) {n=as.numeric(sub("city","",cit))
sum(traffic$traffic*( (n >= traffic$start) & n < traffic$end) )} )
#---------
city1 city2 city3 city4
837 250 128 0
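For the more general question of combining two data frames without explicit loops, one possible sketch uses outer() to build a logical matrix with one row per milepost and one column per city pair, then takes a matrix product to sum the traffic. This is an illustration under the assumption that a pair counts for a city when origmp <= milepost < destmp. The matrix has nrow(mileposts) * nrow(traffic) cells, so it may not suit very large tables.

```r
# Rebuild the example data from the question.
mileposts <- data.frame(
  city = c("city1", "city2", "city3", "city4"),
  milepost = c(0L, 50L, 120L, 250L),
  stringsAsFactors = FALSE
)
traffic <- data.frame(
  citypair = c("city1-city2", "city2-city4", "city1-city3",
               "city1-city4", "city3-city4"),
  traffic = c(610L, 23L, 139L, 88L, 17L),
  origmp = c(0L, 50L, 0L, 0L, 120L),
  destmp = c(50L, 250L, 120L, 250L, 250L),
  stringsAsFactors = FALSE
)

# covers[i, j] is TRUE when city pair j starts at or passes milepost i.
covers <- outer(mileposts$milepost, traffic$origmp, ">=") &
  outer(mileposts$milepost, traffic$destmp, "<")

# Multiplying by 1 turns the logical matrix into 0/1 values, so the
# matrix product adds up the traffic of the covering pairs for each row.
mileposts$vol <- as.vector((covers * 1) %*% traffic$traffic)

print(mileposts)
```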