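Thanks in advance. I have a data frame of trips that gives the start location, the finish location, and the distance between each pair of locations. Like this: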
Start = c("Johns House", "Mikes House", "Franks House")
Finish = c("Mikes House", "Johns House", "Lisas House")
Distance = c(1000,1000,500)
myDF = data.frame(Start, Finish, Distance)
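I want to return a new data frame that handles the two John/Mike trips but treats them as a single unique combination. Specifically, I want to return the total number of trips for each combination and the distance between the locations, so the output would be: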
newStart = c("Johns House", "Franks House")
newFinish = c("Mikes House", "Lisas House")
newDistance = c(1000,500)
Count = c(2,1)
newDF = data.frame(newStart, newFinish, newDistance, Count)
Thanks again.
Answer 0 (score: 2)
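I think this is easier with SQL. Install an SQL package for R, such as "sqldf".
You can then count how many times each start/finish tuple occurs, in either direction: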
library(sqldf)
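# min()/max() with two arguments put each pair in alphabetical order, so
# A -> B and B -> A fall into the same group. Grouping on the ordered pair
# (rather than on a.Start alone) keeps trips from one start to different
# destinations apart and makes the self-join unnecessary.
sqldf("select
min(a.Start, a.Finish) Start,
max(a.Start, a.Finish) Finish,
a.Distance,
count(*) Count
from myDF a
group by min(a.Start, a.Finish), max(a.Start, a.Finish)")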
#          Start      Finish Distance Count
# 1 Franks House Lisas House      500     1
# 2  Johns House Mikes House     1000     2
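The same idea can be sketched in base R without an SQL package. This is only a minimal sketch: it assumes the location columns are character vectors (hence `stringsAsFactors = FALSE`) and that the distance is identical in both directions; the helper data frame `pairs` is a name introduced here for illustration.

```r
# Example data from the question, with the locations kept as character
# so that pmin()/pmax() can compare them.
myDF <- data.frame(
  Start    = c("Johns House", "Mikes House", "Franks House"),
  Finish   = c("Mikes House", "Johns House", "Lisas House"),
  Distance = c(1000, 1000, 500),
  stringsAsFactors = FALSE
)

# Put each pair into alphabetical order so that A -> B and B -> A
# become the same (Start, Finish) combination. Count = 1 is a per-row
# counter that is summed below.
pairs <- data.frame(
  Start    = pmin(myDF$Start, myDF$Finish),
  Finish   = pmax(myDF$Start, myDF$Finish),
  Distance = myDF$Distance,
  Count    = 1,
  stringsAsFactors = FALSE
)

# Sum the counter within each combination of locations and distance.
newDF <- aggregate(Count ~ Start + Finish + Distance, data = pairs, FUN = sum)
newDF
```

Because `Distance` is part of the grouping, two trips between the same houses with different recorded distances would show up as separate rows, which may or may not be what you want.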
Answer 1 (score: 1)
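library(data.table)
myDT <- data.table(myDF)
# Build a direction-independent key: join the two locations, split them
# again, sort them alphabetically, and paste them back together.
x <- paste(myDT$Start, myDT$Finish, sep = "|")
myDT$v <- vapply(x, function(xi) paste(sort(strsplit(xi, "[|]")[[1]]), collapse=''), '')
# Count the trips per key, then keep only the first row of each key.
myDT[, Count := length(Distance), by = v]
myDT <- myDT[!duplicated(v), ]
myDT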
#           Start      Finish Distance                       v Count
# 1:  Johns House Mikes House     1000  Johns HouseMikes House     2
# 2: Franks House Lisas House      500 Franks HouseLisas House     1
I used @Tommy's answer to "How to sort letters in a string?" to sort the strings.
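One caveat with a pasted key: with `collapse=''`, different pairs can in principle produce the same string (for example, "ab" + "c" and "a" + "bc" both give "abc"). A possible variant, sketched here under the assumption that both columns are character, groups on the ordered pair itself using `pmin()`/`pmax()`, so no string key is needed. The name `res` is introduced here for illustration only.

```r
library(data.table)

# Example data from the question; data.table keeps strings as character.
myDT <- data.table(
  Start    = c("Johns House", "Mikes House", "Franks House"),
  Finish   = c("Mikes House", "Johns House", "Lisas House"),
  Distance = c(1000, 1000, 500)
)

# Group on the alphabetically ordered pair. .N is the number of rows in
# each group; Distance[1] takes the first distance recorded for the pair.
res <- myDT[, .(Distance = Distance[1], Count = .N),
            by = .(Start = pmin(Start, Finish), Finish = pmax(Start, Finish))]
res
```

Taking `Distance[1]` assumes the distance is the same for every trip between a given pair of locations, as it is in the example.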