Question

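Following up on my previous question, I have a new assignment problem. I edited my post there, asked again and waited a week, so I would like to try once more here.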

This time with a better example:

Equip<- c(1,1,1,1,2,2,2,3,3,3,4,4,4,5,5,5,6,6,6,6,6,6)
Notif <-c(1,1,3,4,2,2,2,5,6,7,9,9,15,10,11,12,13,14,16,17,18,19)
rank <- c(1,1,2,3,1,1,1,1,2,3,1,1,2,1,2,3,1,2,3,4,5,6)
Component <- c("Ventil","Motor","Ventil","Ventil","Vergaser","Vergaser","Bremse",
"Lichtmaschine","Bremse","Lichtmaschine","Bremse","Motor","Lichtmaschine",
"Bremse","Bremse","Motor","Vergaser","Motor","Vergaser","Motor",
"Vergaser","Motor")    

df <- data.frame(Equip,Notif,rank,Component)

Equip is my subject, and rank is the actual visit number. Component is what has to be looked for across the visits.

I would like to get output like this:

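If an Equip (subject) was visited 2 times (rank 1 and 2), then for all Equips with rank 1 & 2: if the same Component occurs in both visits, this counts as the first and second occurrence.

In the same way, if an Equip (subject) was visited 3 times (rank 1, 2 and 3), the Component has to appear in all three visits, e.g. Equip 1, rank 1, Motor; Equip 1, rank 2, Motor; Equip 1, rank 3, Motor.

The output should name the Component, e.g. TRUE "Motor".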
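As a minimal base-R sketch of the rule above (assuming visits are numbered 1, 2, 3, ... within each Equip without gaps), the Components seen in every one of the visits 1..k of a single Equip are the intersection of the per-visit Component sets. The helper name `common_in_first_k` is hypothetical and not part of the question's code.

```r
# Example data from the question
Equip <- c(1,1,1,1,2,2,2,3,3,3,4,4,4,5,5,5,6,6,6,6,6,6)
rank  <- c(1,1,2,3,1,1,1,1,2,3,1,1,2,1,2,3,1,2,3,4,5,6)
Component <- c("Ventil","Motor","Ventil","Ventil","Vergaser","Vergaser","Bremse",
               "Lichtmaschine","Bremse","Lichtmaschine","Bremse","Motor","Lichtmaschine",
               "Bremse","Bremse","Motor","Vergaser","Motor","Vergaser","Motor",
               "Vergaser","Motor")
df <- data.frame(Equip, rank, Component, stringsAsFactors = FALSE)

# Hypothetical helper: Components that occur in every visit 1..k of one Equip
common_in_first_k <- function(d, k) {
  by_rank <- split(d$Component, d$rank)   # one Component vector per visit
  wanted  <- as.character(seq_len(k))     # visits 1..k, as list names
  if (!all(wanted %in% names(by_rank))) return(character(0))  # fewer than k visits
  unique(Reduce(intersect, by_rank[wanted]))  # what every visit has in common
}

common_in_first_k(df[df$Equip == 1, ], 2)   # [1] "Ventil"
common_in_first_k(df[df$Equip == 4, ], 2)   # character(0)
```

Using `Reduce(intersect, ...)` generalises a pairwise `intersect` of rank 1 and rank 2 to any number of visits.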

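I have some code, but with it I can only compare visits 1 and 2, or 2 and 3 together, and so on; I cannot separate the results by number of ranks (e.g. Equips with 2 ranks, Equips with 3 ranks, etc.).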

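The code is like this:

a <- lapply(split(df, df$Equip), function(x) {
  ll <- split(x, x$rank)            # one data.frame per visit (rank)
  if (length(ll) > 1)
    ii <- intersect(ll[[1]]$Component, ll[[2]]$Component)  ## test intersection of rank 1 and rank 2
  else
    ii <- NA
  c(length(ii) > 0 && !anyNA(ii), ii)   # flag, followed by the shared Component(s)
})
b <- unlist(a)
c <- table(b, b)
rowSums(c)

I hope you can help me. Please ask if anything is unclear.

Regarding your question about the output, and based on your solution: something like the following, although Equip and idx are not necessary if it is easier without them.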

Equip, rank 2:

     Equip Component    V1 idx
1:     1    Ventil  TRUE   3
2:     2        NA FALSE   1
3:     3        NA FALSE   3
4:     4        NA FALSE   2
5:     5        NA FALSE   3
6:     6        NA FALSE   6

Equip, rank 3:

TRUE          FALSE
  0             1

Equip, rank 6:

TRUE          FALSE
 1              2
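One possible reading of the first table above is: for each Equip, whether some Component occurs in every one of its visits, with idx being that Equip's number of visits. A minimal base-R sketch under that assumption (and assuming ranks run 1..idx without gaps) follows; on the example data it gives the same Equip, Component, V1 and idx values as that table. The names `per_equip` and `res` are hypothetical.

```r
# Example data from the question
Equip <- c(1,1,1,1,2,2,2,3,3,3,4,4,4,5,5,5,6,6,6,6,6,6)
rank  <- c(1,1,2,3,1,1,1,1,2,3,1,1,2,1,2,3,1,2,3,4,5,6)
Component <- c("Ventil","Motor","Ventil","Ventil","Vergaser","Vergaser","Bremse",
               "Lichtmaschine","Bremse","Lichtmaschine","Bremse","Motor","Lichtmaschine",
               "Bremse","Bremse","Motor","Vergaser","Motor","Vergaser","Motor",
               "Vergaser","Motor")
df <- data.frame(Equip, rank, Component, stringsAsFactors = FALSE)

# For each Equip: is some Component present in every one of its visits?
per_equip <- lapply(split(df, df$Equip), function(d) {
  idx <- max(d$rank)                      # number of visits of this Equip
  common <- character(0)                  # a single visit has nothing to compare
  if (idx >= 2) common <- Reduce(intersect, split(d$Component, d$rank))
  data.frame(Equip = d$Equip[1],
             Component = if (length(common) > 0) paste(common, collapse = ",") else NA_character_,
             V1 = length(common) > 0,
             idx = idx,
             stringsAsFactors = FALSE)
})
res <- do.call(rbind, per_equip)
res

# TRUE/FALSE counts per number of visits
table(res$idx, res$V1)
```

Equip 2 is visited only once, so it is reported as FALSE with Component NA.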

Answer 1

Here is the output I think you are interested in. It uses data.table.

First, we create a data.table from the data.frame df with keys Equip and Component, as follows:

require(data.table) # load package
# then create the data.table with keys as specified above
# Keying sorts dt by both columns, so they need not be sorted beforehand
dt <- data.table(df, key=c("Equip", "Component"))

Second, we create a function that gives the desired output for a given rank query (2, 3, and so on):

this.check <- function(idx) {
    chk <- seq(1, idx)
    o <- subset(dt[, all(chk %in% rank), by=c("Equip", "Component")], V1 == TRUE)
    if (nrow(o) > 0) o[, idx:=idx]
}

What does this do? Let's run it for rank = 1, 2:

> this.check(2)
# output
   Equip Component   V1 idx
1:     1    Ventil TRUE   2
2:     5    Bremse TRUE   2

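This tells you that for Equip = 1 and 5, the Components Ventil and Bremse, respectively, occur at rank = 1 and 2 (indicated by idx = 2). You also get a column V1 = TRUE, as @Carl pointed out; after the subset it is always TRUE. If needed, you can rename the columns of this output with setnames.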
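A minimal sketch of that clean-up with data.table, using a hand-built stand-in `o2` (hypothetical) for the `this.check(2)` result shown above; the new column name `ranks_checked` is only an illustration.

```r
library(data.table)

# Hypothetical stand-in for the this.check(2) output shown above
o2 <- data.table(Equip = c(1, 5), Component = c("Ventil", "Bremse"),
                 V1 = TRUE, idx = 2L)

o2[, V1 := NULL]                        # drop the always-TRUE flag column, by reference
setnames(o2, "idx", "ranks_checked")    # rename a column, also by reference
o2
```

Both `:=` and `setnames` modify the table in place, so no reassignment is needed.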

Third, we use this function to query ranks = 1,2, then ranks = 1,2,3, and so on. This can be done with a simple lapply, as follows:

# Let's run the function for idx = 2 to 6. 
# This will check from rank = 1,2 until rank=1,2,3,4,5,6
o <- lapply(2:6, function(idx) {
    this.check(idx)
})
> o
[[1]]
   Equip Component   V1 idx
1:     1    Ventil TRUE   2
2:     5    Bremse TRUE   2

[[2]]
   Equip Component   V1 idx
1:     1    Ventil TRUE   3

[[3]]
NULL

[[4]]
NULL

[[5]]
NULL

This shows that for rank = 1,2 and rank = 1,2,3 you have some Components. For the others there is nothing (NULL).

Finally, we can bind all of these into a single data.table with rbind, as follows:

o <- do.call(rbind, o)
> o
   Equip Component   V1 idx
1:     1    Ventil TRUE   2
2:     5    Bremse TRUE   2
3:     1    Ventil TRUE   3

Here, the Components with idx = 2 satisfy rank = 1,2, and the Component with idx = 3 satisfies rank = 1,2,3.

Putting it all together:

require(data.table)
dt <- data.table(df, key=c("Equip", "Component"))

this.check <- function(idx) {
    chk <- seq(1, idx)
    o <- subset(dt[, all(chk %in% rank), by=c("Equip", "Component")], V1 == TRUE)
    if (nrow(o) > 0) o[, idx:=idx]
}

o <- do.call(rbind, lapply(2:6, function(idx) {
    this.check(idx)
}))
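If data.table is not available, the same check can be sketched in base R with `aggregate`. This is an alternative technique, not the answer's method; `check_base` is a hypothetical name, and rows are re-sorted by Equip so the order matches the data.table output above.

```r
# Example data from the question
Equip <- c(1,1,1,1,2,2,2,3,3,3,4,4,4,5,5,5,6,6,6,6,6,6)
rank  <- c(1,1,2,3,1,1,1,1,2,3,1,1,2,1,2,3,1,2,3,4,5,6)
Component <- c("Ventil","Motor","Ventil","Ventil","Vergaser","Vergaser","Bremse",
               "Lichtmaschine","Bremse","Lichtmaschine","Bremse","Motor","Lichtmaschine",
               "Bremse","Bremse","Motor","Vergaser","Motor","Vergaser","Motor",
               "Vergaser","Motor")
df <- data.frame(Equip, rank, Component, stringsAsFactors = FALSE)

# Hypothetical base-R counterpart of this.check(): (Equip, Component) pairs
# whose ranks include every value 1..idx
check_base <- function(df, idx) {
  agg <- aggregate(rank ~ Equip + Component, data = df,
                   FUN = function(r) all(seq_len(idx) %in% r))
  hits <- agg[agg$rank, c("Equip", "Component")]      # pairs where the check is TRUE
  if (nrow(hits) == 0) return(NULL)                   # mirror this.check() returning NULL
  hits <- hits[order(hits$Equip, hits$Component), ]   # same row order as above
  hits$idx <- idx
  hits
}

# Run for rank = 1,2 up to rank = 1,...,6 and stack the non-NULL results
res <- do.call(rbind, lapply(2:6, function(idx) check_base(df, idx)))
res
```

`rbind` on data frames skips the NULL entries, just as the data.table version does.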

I hope this helps.

Edit: (After a series of exchanges in the comments, here is the new solution I came up with. I hope this is what you are after.)

require(data.table)
dt <- data.table(df, key=c("Equip", "Component"))
dt[, `:=`(e.max=max(rank)), by=Equip]
dt[, `:=`(ec.max=max(rank)), by=c("Equip", "Component")]
setkey(dt, "e.max", "ec.max")
this.check <- function(idx) {
    t1 <- dt[J(idx,idx)]
    t2 <- t1[, identical(as.numeric(seq_len(idx)), as.numeric(rank)), 
              by=c("Equip", "Component")]
    # Fixed factor levels give both FALSE and TRUE counts, even when one of them is 0
    o <- table(factor(t2$V1, levels = c(FALSE, TRUE)))
    c("idx" = idx, o)
}
o <- do.call(rbind, lapply(2:6, function(idx) this.check(idx)))

> o
#      idx FALSE TRUE
# [1,]   2     1    0
# [2,]   3     2    1
# [3,]   4     1    0
# [4,]   5     1    0
# [5,]   6     1    0

Answer 2

If I create the data array column-wise, as in

foo<-cbind(Equip,Notif, rank, Component)
eqp<-1 # later, loop over all values
foo[c( which(  foo[,1]==eqp & (foo[,3]==1 | foo[,3]==2) ) ),4]
[1] "Ventil" "Motor"  "Ventil"

feed these results to table, and extract the items with count == 2.

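Apparently, any item that appears twice is what you want. This is not the answer I would recommend using, since tools like ddply and aggregate can do the job more cleanly, but I wanted to make sure this is the answer you are after, assuming a loop over the Equip values for eqp.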
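The loop over all Equip values that this answer assumes could look like the following sketch (`pairs` is a hypothetical name); it keeps the answer's character matrix and its count == 2 rule.

```r
# Example data from the question
Equip <- c(1,1,1,1,2,2,2,3,3,3,4,4,4,5,5,5,6,6,6,6,6,6)
Notif <- c(1,1,3,4,2,2,2,5,6,7,9,9,15,10,11,12,13,14,16,17,18,19)
rank  <- c(1,1,2,3,1,1,1,1,2,3,1,1,2,1,2,3,1,2,3,4,5,6)
Component <- c("Ventil","Motor","Ventil","Ventil","Vergaser","Vergaser","Bremse",
               "Lichtmaschine","Bremse","Lichtmaschine","Bremse","Motor","Lichtmaschine",
               "Bremse","Bremse","Motor","Vergaser","Motor","Vergaser","Motor",
               "Vergaser","Motor")
foo <- cbind(Equip, Notif, rank, Component)   # character matrix, as in the answer

# For every Equip: Components listed exactly twice among its rank 1 and rank 2 rows
pairs <- lapply(unique(Equip), function(eqp) {
  hits <- foo[foo[, 1] == eqp & (foo[, 3] == 1 | foo[, 3] == 2), 4]
  counts <- table(hits)
  names(counts)[counts == 2]
})
names(pairs) <- unique(Equip)
pairs
```

On the example data this returns Ventil for Equip 1 and Bremse for Equip 5, but also Vergaser for Equip 2, whose two Vergaser rows both belong to rank 1. So count == 2 alone does not guarantee one occurrence per visit.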
