Question

Sorry about the title; I hope it isn't too misleading. I have the following data frame, df1:

 id1     clas1    clas2    clas3
 512     ns       abx      NA
 512     ns       or       NA
 512     abx      dm       sup
 845     or       NA       NA
 1265    dd       ivf      NA
 1265    ns       ivf      pts
 9453    col      ns       ivf
 9453    abx      ns       or     
 95635   ns       abx      or

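Then I have "df2", which contains the following information. Some values in df1$id1 also appear in df2$id2, and vice versa; df2 comes from another dataset and has a different length than the first one.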

 id2      clas0
 102      ns
 512      ns
 915      ns
 1265     ns
 9453     ns
 10485    ns
 95639    ns
 100348   ns

What I want to do is count how many "id1" values have, in any of the clas columns, the value that the matching id2 has in clas0 (i.e. "ns").

So I tried this:

 x<-as.numeric(levels(factor(df2$id2)))
 clas<-ls()
 for(i in 1:x){
   for(j in 1:length(df1$id1)){
     if(df1$id1==i){clas[[i]]=append(clas[[i]],c(df1$clas1[j],df1$clas2[j],df1$clas3[j]))}
   }
 }

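What I'm trying to do here is build a list that collects all the clas1, clas2, and clas3 values for each repeated id1, so that later I can check whether the value in clas0 appears somewhere in that list. But I keep getting the following warning: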

    In if (id1$id1 == i) { ... :
      the condition has length > 1 and only the first element will be used

I'm stuck. Can anyone point me in the right direction? Many thanks, Marco
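The warning comes from `if (df1$id1 == i)`: without an index, `df1$id1 == i` compares the whole column and yields a logical vector, while `if` needs a single TRUE/FALSE (R 4.2.0 and later raise an error here instead of a warning). The loop has two further problems: `1:x` uses only the first element of `x` (R warns about this too), so it runs over 1..102 rather than over the ids, and `ls()` returns the names of objects in the workspace, not an empty list. Below is a minimal sketch of the same list-building idea with those points fixed, assuming the data are entered as shown; `key` and `hits` are illustrative names, not from the question. Note that it counts each shared id once, so it reports ids rather than rows.

```r
# Data from the question, entered directly so the block runs on its own
df1 <- data.frame(
  id1   = c(512, 512, 512, 845, 1265, 1265, 9453, 9453, 95635),
  clas1 = c("ns", "ns", "abx", "or", "dd", "ns", "col", "abx", "ns"),
  clas2 = c("abx", "or", "dm", NA, "ivf", "ivf", "ns", "ns", "abx"),
  clas3 = c(NA, NA, "sup", NA, NA, "pts", "ivf", "or", "or"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  id2   = c(102, 512, 915, 1265, 9453, 10485, 95639, 100348),
  clas0 = rep("ns", 8),
  stringsAsFactors = FALSE
)

# One list element per id that occurs in df1, holding all its clas values
clas <- list()                               # an empty list, not ls()
for (i in unique(df2$id2)) {                 # loop over the id values, not 1:x
  key <- as.character(i)
  for (j in seq_len(nrow(df1))) {
    if (df1$id1[j] == i) {                   # [j] makes the condition length 1
      clas[[key]] <- c(clas[[key]], df1$clas1[j], df1$clas2[j], df1$clas3[j])
    }
  }
}

# For each collected id, is its clas0 value among the collected clas values?
hits <- vapply(names(clas), function(key) {
  df2$clas0[df2$id2 == as.numeric(key)] %in% clas[[key]]
}, logical(1))
sum(hits)
#[1] 3
```

Nested loops over rows get slow on large data; a merge followed by a vectorised comparison avoids them.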

Answer 1

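What I want to do is count how many "id1" have a value in common (i.e. "ns") with id2 (i.e. "ns") in any of the clas columns.

Merge the two data frames on the id columns, then count the rows where clas0 appears in clas1, clas2, or clas3: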

df1 <- read.table(text="id1     clas1    clas2    clas3
 512     ns       abx      NA
 512     ns       or       NA
 512     abx      dm       sup
 845     or       NA       NA
 1265    dd       ivf      NA
 1265    ns       ivf      pts
 9453    col      ns       ivf
 9453    abx      ns       or     
 95635   ns       abx      or", header=TRUE)

df2 <- read.table(text=" id2      clas0
 102      ns
 512      ns
 915      ns
 1265     ns
 9453     ns
 10485    ns
 95639    ns
 100348   ns", header=TRUE)

df <- merge(df1, df2, by.x="id1", by.y="id2")
sum(apply(df$clas0 == df[, c("clas1", "clas2", "clas3")], 1, any, na.rm = TRUE))
#[1] 5
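The 5 above counts rows of the merged data: id 512 matches on two rows, 1265 on one, and 9453 on two. If the question means the number of distinct id1 values instead, a small variation reuses the same per-row test and counts unique ids. A sketch, assuming the question's data (re-entered with data.frame so the block is self-contained; `row_hit`, `n_rows`, and `n_ids` are illustrative names):

```r
df1 <- data.frame(
  id1   = c(512, 512, 512, 845, 1265, 1265, 9453, 9453, 95635),
  clas1 = c("ns", "ns", "abx", "or", "dd", "ns", "col", "abx", "ns"),
  clas2 = c("abx", "or", "dm", NA, "ivf", "ivf", "ns", "ns", "abx"),
  clas3 = c(NA, NA, "sup", NA, NA, "pts", "ivf", "or", "or"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  id2   = c(102, 512, 915, 1265, 9453, 10485, 95639, 100348),
  clas0 = rep("ns", 8),
  stringsAsFactors = FALSE
)

# Inner join: keeps only ids present in both data frames
df <- merge(df1, df2, by.x = "id1", by.y = "id2")

# TRUE for each merged row whose clas0 value appears in any clas column
row_hit <- apply(df$clas0 == df[, c("clas1", "clas2", "clas3")], 1, any, na.rm = TRUE)

n_rows <- sum(row_hit)                     # matching rows
n_ids  <- length(unique(df$id1[row_hit]))  # distinct id1 values with a match
n_rows
#[1] 5
n_ids
#[1] 3
```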
