Question

I have a small problem. I have this data:

df <- data.frame(Equip = c(1,1,1,1,1,2,2,2,2,2),
                 Notif = c(1,1,1,2,2,3,3,3,3,4),
                 Component = c("Dichtung","Motor","Getriebe","Service","Motor","Lüftung","Dichtring","Motor","Getriebe","Dichtring"),
                 rank= c(1 , 1 , 1 , 2 , 2 , 1 , 1 , 1 , 1 , 2))

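Now I want to compare, looking at one Equip at a time, whether the Components in the first rank are the same as the Components in the second rank (only within the same Equip):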

There are two ways:

First: are all components the same?

Second: is any (at least one) component the same?

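I need a highly automated solution, because my data set has more than 150k rows.

The desired answer could be a vector containing only booleans, TRUE and FALSE.

So for the example above,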

answer <- c(TRUE,TRUE)

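because Equip 1, rank 1 contains the component Motor AND Equip 1, rank 2 also contains the component Motor (an example of the desired "any" check).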
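Because the data set has more than 150k rows, a join-based variant may be worth trying instead of calling a function once per Equip. The sketch below is not from the original thread; it assumes the comparison is always rank 1 vs. rank 2 and that a component appearing twice within one rank counts once. It keeps the distinct (Equip, Component) pairs of each rank, joins them with `merge()`, and counts per Equip with `tabulate()`. The names `r1`, `r2`, `shared`, `anySame` and `allSame` are hypothetical.

```r
df <- data.frame(Equip = c(1,1,1,1,1,2,2,2,2,2),
                 Notif = c(1,1,1,2,2,3,3,3,3,4),
                 Component = c("Dichtung","Motor","Getriebe","Service","Motor",
                               "Lüftung","Dichtring","Motor","Getriebe","Dichtring"),
                 rank = c(1, 1, 1, 2, 2, 1, 1, 1, 1, 2),
                 stringsAsFactors = FALSE)

# distinct (Equip, Component) pairs for rank 1 and for rank 2
r1 <- unique(df[df$rank == 1, c("Equip", "Component")])
r2 <- unique(df[df$rank == 2, c("Equip", "Component")])

# inner join: one row per component present in both ranks of the same Equip
shared <- merge(r1, r2, by = c("Equip", "Component"))

# per-Equip counts, in the order of sort(unique(df$Equip))
equips <- sort(unique(df$Equip))
n1 <- tabulate(match(r1$Equip, equips), length(equips))      # distinct comps, rank 1
n2 <- tabulate(match(r2$Equip, equips), length(equips))      # distinct comps, rank 2
ns <- tabulate(match(shared$Equip, equips), length(equips))  # comps in both

anySame <- ns > 0                    # at least one component in common
allSame <- ns == n1 & ns == n2       # both ranks have exactly the same component set

print(anySame)
print(allSame)
```

For the sample data this agrees with the expected answer c(TRUE, TRUE) for the "any" check.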

Thanks a lot for your help =)

I tried the comment function, but I couldn't show the problem there because I want to show code.

Sorry...

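The original data has more than 2 ranks, and I want to compare rank x with rank x + 1 for every x in one step. For this I used a for loop inside the function, but it doesn't work: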



a <- lapply(split(df,df$Equips),function(x){
  for(i in 1:8){
    ll <- split(x,x$rank)
    if(length(ll)>i )
      ii <- intersect(ll[[i]]$Comps,ll[[i+1]]$Comps )
    else ii <- NA
    c(length(ii)> 0 && !is.na(ii),ii)
  }
})
b <- unlist(a)
c <- table(b,b)
rowSums(c)

Any ideas what I can do? (The main idea is to get the results for 1-2, 2-3, 3-4, etc. in one step!)
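One way to get the 1-2, 2-3, 3-4, ... comparisons in a single pass is to split each Equip's components by rank and compare neighbouring elements of that list, instead of looping over a fixed 1:8. This is an illustrative sketch, not part of the original thread; `compareConsecutive` is a hypothetical helper, and it assumes "rank x + 1" means the next rank actually present for that Equip (a gap such as 1, 3 is compared as 1 vs. 3). The sample data only has ranks 1 and 2, so each Equip yields one row.

```r
df <- data.frame(Equip = c(1,1,1,1,1,2,2,2,2,2),
                 Notif = c(1,1,1,2,2,3,3,3,3,4),
                 Component = c("Dichtung","Motor","Getriebe","Service","Motor",
                               "Lüftung","Dichtring","Motor","Getriebe","Dichtring"),
                 rank = c(1, 1, 1, 2, 2, 1, 1, 1, 1, 2),
                 stringsAsFactors = FALSE)

# For one Equip: compare the components of each rank with those of the next rank
compareConsecutive <- function(x) {
  comps <- split(x$Component, x$rank)   # one component vector per rank, ranks ascending
  if (length(comps) < 2) return(NULL)   # a single rank: nothing to compare
  ranks <- as.numeric(names(comps))
  i <- seq_len(length(comps) - 1)       # indices of the pairs (1-2, 2-3, ...)
  data.frame(
    Equip   = x$Equip[1],
    from    = ranks[i],
    to      = ranks[i + 1],
    anySame = vapply(i, function(k) length(intersect(comps[[k]], comps[[k + 1]])) > 0,
                     logical(1)),
    allSame = vapply(i, function(k) setequal(comps[[k]], comps[[k + 1]]),
                     logical(1))
  )
}

# one row per (Equip, rank pair); Equips with a single rank are dropped by rbind
res <- do.call(rbind, lapply(split(df, df$Equip), compareConsecutive))
print(res)
```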

Answer 1

Here is a possible solution:

df <- data.frame(Equip = c(1,1,1,1,1,2,2,2,2,2),
                 Notif = c(1,1,1,2,2,3,3,3,3,4),
                 Component = c("Dichtung","Motor","Getriebe","Service","Motor","Lüftung","Dichtring","Motor","Getriebe","Dichtring"),
                 rank= c(1 , 1 , 1 , 2 , 2 , 1 , 1 , 1 , 1 , 2))


allComponents <- function(subDf){
  setequal(subDf[subDf$rank==1,'Component'],subDf[subDf$rank==2,'Component'])
}

anyComponents <- function(subDf){
  length(intersect(subDf[subDf$rank==1,'Component'],subDf[subDf$rank==2,'Component'])) > 0
}

# all components are equal
res1 <- by(df,INDICES=df$Equip,FUN=allComponents)
# at least one component equal
res2 <- by(df,INDICES=df$Equip,FUN=anyComponents)

as.vector(res1)
[1] FALSE FALSE

as.vector(res2)
[1] TRUE TRUE
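If a plain named logical vector is all that is needed, `vapply()` over `split()` is a possible alternative to `by()` followed by `as.vector()`: it returns one named TRUE/FALSE per Equip and checks that each call yields exactly one logical. This is a sketch that reuses the two functions above unchanged; `groups`, `resAll` and `resAny` are illustrative names.

```r
df <- data.frame(Equip = c(1,1,1,1,1,2,2,2,2,2),
                 Notif = c(1,1,1,2,2,3,3,3,3,4),
                 Component = c("Dichtung","Motor","Getriebe","Service","Motor",
                               "Lüftung","Dichtring","Motor","Getriebe","Dichtring"),
                 rank = c(1, 1, 1, 2, 2, 1, 1, 1, 1, 2),
                 stringsAsFactors = FALSE)

allComponents <- function(subDf){
  setequal(subDf[subDf$rank==1,'Component'], subDf[subDf$rank==2,'Component'])
}

anyComponents <- function(subDf){
  length(intersect(subDf[subDf$rank==1,'Component'], subDf[subDf$rank==2,'Component'])) > 0
}

groups <- split(df, df$Equip)                        # one data frame per Equip, named "1", "2"
resAll <- vapply(groups, allComponents, logical(1))  # all components equal?
resAny <- vapply(groups, anyComponents, logical(1))  # at least one component equal?

print(resAll)
print(resAny)
```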

Answer 2

The plyr package is well suited for group operations:

library(plyr)

dat.r <- dlply(df, .(Equip), function(x){      # I split by Equip
  ll <- split(x, x$rank)                       # I split by rank

  if(length(ll) > 1)
    ii <- intersect(ll[[1]]$Component, ll[[2]]$Component) ## test intersection
  else
    ii <- NA
  c(length(ii) > 0 && !anyNA(ii), ii)          ## the result: flag + shared components
})

Here I get both the comparison result and the names of the shared components:

dat.r
$`1`
[1] "TRUE"  "Motor"

EDIT: here is the same thing with base R only (no internet):

lapply(split(df, df$Equip), function(x){       # I split by Equip
  ll <- split(x, x$rank)                       # I split by rank
  if(length(ll) > 1)
    ii <- intersect(ll[[1]]$Component, ll[[2]]$Component) ## test intersection
  else
    ii <- NA
  c(length(ii) > 0 && !anyNA(ii), ii)          ## the result
})

$`1`
[1] "TRUE"  "Motor"

$`2`
[1] "TRUE"      "Dichtring"
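The question asks for a boolean vector, whereas each list element above is a character vector (c() coerces the flag to "TRUE"/"FALSE" because it is combined with component names). A minimal sketch for pulling the flags back out as logicals, assuming the list is stored in a variable; `res` and `found` are hypothetical names:

```r
df <- data.frame(Equip = c(1,1,1,1,1,2,2,2,2,2),
                 Notif = c(1,1,1,2,2,3,3,3,3,4),
                 Component = c("Dichtung","Motor","Getriebe","Service","Motor",
                               "Lüftung","Dichtring","Motor","Getriebe","Dichtring"),
                 rank = c(1, 1, 1, 2, 2, 1, 1, 1, 1, 2),
                 stringsAsFactors = FALSE)

res <- lapply(split(df, df$Equip), function(x){
  ll <- split(x, x$rank)
  ii <- if (length(ll) > 1) intersect(ll[[1]]$Component, ll[[2]]$Component) else NA
  c(length(ii) > 0 && !anyNA(ii), ii)
})

# first element of each result is the flag; as.logical() turns "TRUE"/"FALSE" back
# into logicals and also accepts a logical FALSE (the single-rank case)
found <- vapply(res, function(r) as.logical(r[[1]]), logical(1))
print(found)
```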

Comparison under 2 conditions
