I'm struggling to come up with a vectorized solution to the following problem. I have two data frames:
> people <- data.frame(name = c('Fred', 'Bob'), profession = c('Builder', 'Baker'))
> people
name profession
1 Fred Builder
2 Bob Baker
> allowed <- data.frame(name = c('Fred', 'Fred', 'Bob', 'Bob'), profession = c('Builder', 'Baker', 'Barman', 'Biker'))
> allowed
name profession
1 Fred Builder
2 Fred Baker
3 Bob Barman
4 Bob Biker
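That is, I want to check that each person has an allowed profession, and return anyone who doesn't.
For example, Fred can be a Builder or a Baker, so he's fine. Bob, however, can be a Barman or a Biker, but not a Baker. (Note: in my use case each person only ever has two allowed professions.)
I'd like to return a data frame of the names that don't have an allowed profession: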
name profession permitted
1 Bob Baker Biker
2 Bob Baker Barman
Thanks for your help.
Answer 0 (score: 1)
A simple base R solution. I'm sure someone can come up with something better.
out <- allowed[!allowed$name %in% merge(people, allowed)$name, ]
This gives you the people you want along with their allowed professions. If you also want their actual profession:
names(out)[2] <- "permitted"
out <- merge(people, out, all.y=TRUE)
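The following runs the same approach on the question's data so that the intermediate step is visible. It assumes character columns (`stringsAsFactors = FALSE`, the default since R 4.0.0). The name `ok` is only for illustration.

```r
people <- data.frame(name = c("Fred", "Bob"),
                     profession = c("Builder", "Baker"),
                     stringsAsFactors = FALSE)
allowed <- data.frame(name = c("Fred", "Fred", "Bob", "Bob"),
                      profession = c("Builder", "Baker", "Barman", "Biker"),
                      stringsAsFactors = FALSE)

# With no 'by', merge() joins on every shared column (name AND profession),
# so only people whose current profession is allowed survive the join
ok <- merge(people, allowed)
ok$name
# [1] "Fred"

# allowed rows for everyone else, relabelled and joined back to people by name
out <- allowed[!allowed$name %in% ok$name, ]
names(out)[2] <- "permitted"
out <- merge(people, out, all.y = TRUE)
out
```

In the final `merge()`, the only column that `people` and `out` share is `name`. Each remaining person's actual profession is therefore repeated next to every permitted profession.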
Answer 1 (score: 1)
Here is a slightly more readable data.table solution. If you find it readable, you can also do the last step on the same line, making it a one-liner.
# load library, convert people to a data.table and set a key
library(data.table)
people = data.table(people, key = "name,profession")
# compute
result = data.table(allowed, key = "name")[people[!allowed]]
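# recent data.table versions prefix the joined-in column from i with "i."
# (older versions named it "profession.1"); i.profession is the person's actual profession
setnames(result, c("profession", "i.profession"), c("permitted", "profession"))
setcolorder(result, c("name", "profession", "permitted"))
result
#    name profession permitted
# 1:  Bob      Baker    Barman
# 2:  Bob      Baker     Biker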
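Answer 2 (score: 0)
There may be another way, but this should work. I added a third person with a non-permitted profession to show how the function can be applied to the whole dataset.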
currentprof <-structure(list(name = structure(c(2L, 1L, 3L), .Label = c("Bob",
"Fred", "Jan"), class = "factor"), profession = structure(c(3L,
2L, 1L), .Label = c("Analyst", "Baker", "Builder"), class = "factor")), .Names = c("name",
"profession"), class = "data.frame", row.names = c(NA, -3L))
allowed <- structure(list(name = structure(c(2L, 2L, 1L, 1L, 3L, 3L), .Label = c("Bob",
"Fred", "Jan"), class = "factor"), profession = structure(c(4L,
1L, 2L, 3L, 6L, 5L), .Label = c("Baker", "Barman", "Biker", "Builder",
"Driver", "Teacher"), class = "factor")), .Names = c("name",
"profession"), class = "data.frame", row.names = c(NA, -6L))
checkprof <- function(name){
  # rows for this person in each table
  allowedn <- allowed[allowed$name == name, ]
  currentprofn <- currentprof[currentprof$name == name, ]
  if (!currentprofn$profession %in% allowedn$profession) {
    # profession not permitted: pair it with every permitted profession
    result <- merge(currentprofn, allowedn, by = "name", all.x = TRUE)
  } else {
    # profession permitted: return an empty data frame
    result <- data.frame(col1 = character(),
                         col2 = character(),
                         col3 = character(),
                         stringsAsFactors = FALSE)
  }
  colnames(result) <- c("name", "profession", "permitted")
  return(result)
}
do.call(rbind, lapply(levels(allowed$name), checkprof))
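The question asks for a vectorized approach, so here is a loop-free base R sketch on the same three-person data. It is not the answerer's code. For simplicity the data is rebuilt with character columns instead of the factors above, and `pairs`, `ok` and `not_allowed` are illustrative names. The sketch pairs each person with all of their permitted professions. It then drops anyone whose current profession appears among those pairs.

```r
currentprof <- data.frame(name = c("Fred", "Bob", "Jan"),
                          profession = c("Builder", "Baker", "Analyst"),
                          stringsAsFactors = FALSE)
allowed <- data.frame(name = c("Fred", "Fred", "Bob", "Bob", "Jan", "Jan"),
                      profession = c("Builder", "Baker", "Barman", "Biker",
                                     "Teacher", "Driver"),
                      stringsAsFactors = FALSE)

# one row per (person, permitted profession) pair
pairs <- merge(currentprof, allowed, by = "name")
names(pairs) <- c("name", "profession", "permitted")

# per person: TRUE if any permitted profession equals the current one
ok <- ave(pairs$profession == pairs$permitted, pairs$name, FUN = any)

# keep everyone whose current profession is not permitted
not_allowed <- pairs[!ok, ]
not_allowed
```

`ave()` applies `any()` within each name and broadcasts that per-person decision back to every row for that person. A single logical subset then does the filtering.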
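Answer 3 (score: 0)
Here's my take. It probably needs more testing, and I'm open to suggestions myself. It works for your example, but I'm not sure it generalizes.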
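# compare (name, profession) pairs so each person is checked only against their own allowed rows
people$check <- ifelse(paste(people$name, people$profession) %in% paste(allowed$name, allowed$profession), TRUE, FALSE)
# keep only the people whose profession is not permitted
people_select <- people[!people$check, ]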
EDIT: Just to clarify, before anyone downvotes: ifelse is vectorized and runs very fast.
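The answerer was unsure whether this generalizes. A direct comparison such as `allowed$name == people$name` is fragile because it compares a length-4 vector with a length-2 vector, so R silently recycles the shorter one. The sketch below uses hypothetical data in which Bob's profession is changed to Builder. With that data, the recycled check wrongly accepts him, while comparing (name, profession) pairs does not.

```r
people <- data.frame(name = c("Fred", "Bob"),
                     profession = c("Builder", "Builder"),  # hypothetical: Bob set to Builder
                     stringsAsFactors = FALSE)
allowed <- data.frame(name = c("Fred", "Fred", "Bob", "Bob"),
                      profession = c("Builder", "Baker", "Barman", "Biker"),
                      stringsAsFactors = FALSE)

# length-4 vs length-2 comparison: people$name is recycled to Fred, Bob, Fred, Bob
idx <- which(allowed$name == people$name)
idx
# [1] 1 4

# one pooled set of professions ("Builder", "Biker") is used for everybody
pooled <- allowed[idx, "profession"]
naive <- people$profession %in% pooled
naive
# [1] TRUE TRUE   (Bob wrongly passes)

# comparing (name, profession) pairs checks each person against their own rows
paired <- paste(people$name, people$profession) %in%
  paste(allowed$name, allowed$profession)
paired
# [1]  TRUE FALSE
```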