My data.frame (Networks) contains the following:
Location <- c("Farm", "Supermarket", "Farm", "Conference",
"Supermarket", "Supermarket")
Instructor <- c("Bob", "Bob", "Louise", "Sally", "Lee", "Jeff")
Operator <- c("Lee", "Lee", "Julie", "Louise", "Bob", "Louise")
Networks <- data.frame(Location, Instructor, Operator, stringsAsFactors=FALSE)
My question
I would like to create a new data.frame, Transactions, with a new column Transactions$Count that sums the exchanges between each Instructor and each Operator at each Location.
Expected output
Location <- c("Farm", "Supermarket", "Farm", "Conference", "Supermarket")
# One entry per row of the expected output (5 rows)
Person1 <- c("Bob", "Bob", "Louise", "Sally", "Jeff")
Person2 <- c("Lee", "Lee", "Julie", "Louise", "Louise")
Count <- c(1, 2, 1, 1, 1)
Transactions <- data.frame(Location, Person1, Person2, Count,
stringsAsFactors=FALSE)
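For example, Bob and Lee have a total of 2 exchanges at the Supermarket. It does not matter whether a person is the Instructor or the Operator; I am only interested in the exchanges between them. In the expected output, the two Supermarket exchanges between Bob and Lee are counted together. Every other Location/pair combination has exactly one exchange.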
What I have tried
I think grepl might be useful, but I want to iterate over 1300 rows of this data, so it could be computationally expensive.
Thanks.
Answer 0 (score: 4)
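You could consider using "data.table" with pmin and pmax in the "by" argument. Because pmin and pmax return the element-wise smaller and larger of the two names, a pair such as Bob/Lee ends up in the same group regardless of who was the Instructor and who was the Operator.
Example: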
Networks <- data.frame(Location, Instructor, Operator, stringsAsFactors = FALSE)
library(data.table)
as.data.table(Networks)[
, TransCount := .N,
by = list(Location,
pmin(Instructor, Operator),
pmax(Instructor, Operator))][]
# Location Instructor Operator TransCount
# 1: Farm Bob Lee 1
# 2: Supermarket Bob Lee 2
# 3: Farm Louise Julie 1
# 4: Conference Sally Louise 1
# 5: Supermarket Lee Bob 2
# 6: Supermarket Jeff Louise 1
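To see why grouping on pmin and pmax treats Bob/Lee and Lee/Bob as the same pair, here is a minimal base R sketch. The two-element vectors and the names `first` and `second` are made up for illustration and are not part of the original data.

```r
# Two exchanges between the same people, with the roles swapped
Instructor <- c("Bob", "Lee")
Operator   <- c("Lee", "Bob")

# pmin/pmax compare element-wise; for character vectors they pick the
# alphabetically smaller/larger name in each position
first  <- pmin(Instructor, Operator)
second <- pmax(Instructor, Operator)

# Both rows now share one canonical (first, second) ordering
print(data.frame(first, second, stringsAsFactors = FALSE))
```

Both rows end up with the same `first`/`second` values, which is why they fall into a single group in the "by" argument.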
Based on your update, it sounds like this might be a better fit for you:
as.data.table(Networks)[
, c("Person1", "Person2") := list(
pmin(Instructor, Operator),
pmax(Instructor, Operator)),
by = 1:nrow(Networks)
][
, list(TransCount = .N),
by = .(Location, Person1, Person2)
]
# Location Person1 Person2 TransCount
# 1: Farm Bob Lee 1
# 2: Supermarket Bob Lee 2
# 3: Farm Julie Louise 1
# 4: Conference Louise Sally 1
# 5: Supermarket Jeff Louise 1
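If adding a dependency on "data.table" is not an option, the same idea can be sketched in base R with aggregate. This is an alternative technique rather than the approach shown above, and the intermediate name `exchanges` is hypothetical. Note that aggregate may return the rows in a different order than the data.table result.

```r
Location   <- c("Farm", "Supermarket", "Farm", "Conference",
                "Supermarket", "Supermarket")
Instructor <- c("Bob", "Bob", "Louise", "Sally", "Lee", "Jeff")
Operator   <- c("Lee", "Lee", "Julie", "Louise", "Bob", "Louise")
Networks <- data.frame(Location, Instructor, Operator, stringsAsFactors = FALSE)

# Put each pair into a canonical order and give every exchange a weight of 1
exchanges <- data.frame(
  Location = Networks$Location,
  Person1  = pmin(Networks$Instructor, Networks$Operator),
  Person2  = pmax(Networks$Instructor, Networks$Operator),
  Count    = 1L,
  stringsAsFactors = FALSE
)

# Sum the weights within each Location/Person1/Person2 combination
Transactions <- aggregate(Count ~ Location + Person1 + Person2,
                          data = exchanges, FUN = sum)
print(Transactions)
```

For 1300 rows either approach should be fast, since pmin, pmax and the grouping step are all vectorized and no row-by-row grepl matching is needed.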