Question

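I am currently working with two different data frames, one of which is very long (long). What I need to do is select all rows of long whose id_type appears at least once in the other (smaller) data frame.

Suppose the two data frames are: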

long <- read.table(text = "
  id_type   x1   x2

   1       0     0  
   1       0     1
   1       1     0
   1       1     1
   2       0     0
   2       0     1
   2       1     0
   2       1     1
   3       0     0  
   3       0     1
   3       1     0
   3       1     1
   4       0     0  
   4       0     1
   4       1     0
   4       1     1", 
header=TRUE)

and

short <- read.table(text = "
  id_type   y1   y2    

   1       5     6    
   1       5     5    
   2       7     9", 
     header=TRUE)

In practice, what I would like to obtain is:

 id_type   x1   x2    

  1       0     0  
  1       0     1
  1       1     0
  1       1     1
  2       0     0  
  2       0     1
  2       1     0
  2       1     1

I tried out <- long[long[,"id_type"]==short[,"id_type"], ], but that is clearly wrong. How would you do it? Thanks.

Answer 1

Just use %in%:

out <- long[long$id_type %in% short$id_type, ]

See ?"%in%".
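
To illustrate why the == attempt in the question goes wrong while %in% works, here is a minimal sketch. The vectors long_ids and short_ids are hypothetical stand-ins for long$id_type and short$id_type, not part of the original post. With ==, R compares element by element and recycles the shorter vector, so the result depends on position rather than membership.

```r
# Hypothetical stand-ins for long$id_type and short$id_type
long_ids  <- rep(1:4, each = 4)
short_ids <- c(1, 1, 2)

# `==` compares position by position, recycling short_ids along long_ids.
# The lengths (16 and 3) are not multiples, so R also emits a warning,
# which is suppressed here only to keep the sketch quiet.
eq_mask <- suppressWarnings(long_ids == short_ids)

# `%in%` asks, for each element of long_ids, whether it occurs
# anywhere in short_ids, regardless of position.
in_mask <- long_ids %in% short_ids

# Count how many rows each mask would keep
sum(eq_mask)
sum(in_mask)
```

Only in_mask keeps every row whose id is 1 or 2; eq_mask keeps only the positions where the recycled values happen to line up.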

Answer 2

What you are missing is %in%:

> long[long$id_type %in% unique(short$id_type),]
  id_type x1 x2
1       1  0  0
2       1  0  1
3       1  1  0
4       1  1  1
5       2  0  0
6       2  0  1
7       2  1  0
8       2  1  1
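
The two answers differ only in the call to unique(). As a rough check, the sketch below rebuilds the question's data with data.frame() (an assumption made for brevity, in place of the read.table calls) and compares the two results. Since %in% only tests membership, duplicates in short$id_type should not change which rows are kept.

```r
# Rebuild the question's data frames (constructed with data.frame() here
# instead of read.table(); the values mirror the question)
long <- data.frame(
  id_type = rep(1:4, each = 4),
  x1      = rep(c(0, 0, 1, 1), times = 4),
  x2      = rep(c(0, 1), times = 8)
)
short <- data.frame(
  id_type = c(1, 1, 2),
  y1      = c(5, 5, 7),
  y2      = c(6, 5, 9)
)

# Answer 1: membership test against the raw column
out_plain  <- long[long$id_type %in% short$id_type, ]

# Answer 2: membership test against the de-duplicated ids
out_unique <- long[long$id_type %in% unique(short$id_type), ]

# Both subsets contain the same rows
identical(out_plain, out_unique)
```

Calling unique() first does no harm and makes the intent explicit, but it is not required for a correct result.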

Selecting rows of a data frame based on the correspondence between the levels of two covariates
