Question

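I have a data frame made up of a large number of Serial_numbers. Each Serial_number is measured before and after irradiation, denoted by "0" and "1" in the Irradiated column.

I want to check whether every Serial_number in my data frame actually provides both data sets. This is probably a simple request, but I have not found a working solution yet.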

   Serial_number Irradiated Amplification Voltage
1      912009913          0      1.002520 24.9681
2      912009913          0      1.004520 29.9591
3      912009913          0      1.005370 34.9494
4      912009913          1      1.005600 44.9372
5      912009913          1      1.006830 49.9329
6      912009913          1      1.006900 54.9625
7      912009897          0      1.004537 26.4681
8      912009897          0      1.007240 28.9191
9      912009897          0      1.008167 29.4183
10     912009897          1      1.009153 33.1763
11     912009897          1      1.010291 36.1843
12     912009897          1      1.021757 41.4690

...

Answer 1

You can match the unique serial numbers with Irradiated == 0 against those with Irradiated == 1, and use all() to check whether every match is TRUE.

> df
   Serial_number Irradiated Amplification Voltage
1      912009913          0      1.002520 24.9681
2      912009913          0      1.004520 29.9591
3      912009913          0      1.005370 34.9494
4      912009913          1      1.005600 44.9372
5      912009913          1      1.006830 49.9329
6      912009913          1      1.006900 54.9625
7      912009897          0      1.004537 26.4681
8      912009897          0      1.007240 28.9191
9      912009897          0      1.008167 29.4183
10     912009897          1      1.009153 33.1763
11     912009897          1      1.010291 36.1843
12     912009897          1      1.021757 41.4690

> all(unique(df$Serial_number[df$Irradiated == 0]) %in% unique(df$Serial_number[df$Irradiated == 1]))
[1] TRUE
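Note that the all() check above only confirms that every serial number measured with Irradiated == 0 also appears with Irradiated == 1; a serial number that occurs only with Irradiated == 1 would pass unnoticed. As a minimal sketch of a per-serial check in both directions, assuming the two states are coded exactly as 0 and 1, one could use tapply(). The sample data and the names has_both and incomplete are made up for illustration:

```r
# Hypothetical sample data with the question's column names;
# serial number 912009900 has no Irradiated == 1 measurement
df <- data.frame(
  Serial_number = c(912009913, 912009913, 912009897, 912009897, 912009900),
  Irradiated    = c(0, 1, 0, 1, 0)
)

# For each serial number, test whether both states 0 and 1 occur
has_both <- tapply(df$Irradiated, df$Serial_number,
                   function(v) all(c(0, 1) %in% v))

# Serial numbers (as character labels) that lack one of the two states
incomplete <- names(has_both)[!has_both]
print(incomplete)

# TRUE only if every serial number provides both data sets
print(all(has_both))
```

Here incomplete holds the labels of the serial numbers that need a closer look, and all(has_both) gives the overall yes/no answer.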

To see which serial numbers matched, use %in% to index the vector unique(df$Serial_number[df$Irradiated == 0]) and extract those values:

> unique(df$Serial_number[df$Irradiated == 0])[unique(df$Serial_number[df$Irradiated == 0]) %in% unique(df$Serial_number[df$Irradiated == 1])]
[1] 912009913 912009897

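If the all() check above returns TRUE, and no serial number occurs only with Irradiated == 1, this result should contain the same values as unique(df$Serial_number).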
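The same extraction can probably be written more compactly with the base R set functions intersect() and setequal(). The following is only a sketch on made-up sample data; the variable names pre, post, matched and complete are illustrative:

```r
# Hypothetical sample data with the question's column names
df <- data.frame(
  Serial_number = c(912009913, 912009913, 912009897, 912009897),
  Irradiated    = c(0, 1, 0, 1)
)

# Distinct serial numbers measured before and after irradiation
pre  <- unique(df$Serial_number[df$Irradiated == 0])
post <- unique(df$Serial_number[df$Irradiated == 1])

# Serial numbers present in both states
matched <- intersect(pre, post)

# TRUE if the matched set covers every serial number in the data frame
complete <- setequal(matched, unique(df$Serial_number))

print(matched)
print(complete)
```

Because setequal() ignores order and duplicates, it compares the two sets of serial numbers rather than the vectors element by element.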

New approach. Here is a function that should do all the work:

FOO <- function(x, y){
  # Compare distinct values, so repeated serial numbers cannot hide a mismatch
  only_x <- setdiff(x, y)  # values that occur in x but not in y
  only_y <- setdiff(y, x)  # values that occur in y but not in x
  if(length(only_x) == 0 && length(only_y) == 0){
    print("All items matched.")
  }else{
    print(paste("Only in x: ", only_x))
    print(paste("Only in y: ", only_y))
  }
}

Just call it with FOO(df$Serial_number[df$Irradiated == 0], df$Serial_number[df$Irradiated == 1]) and it will report which numbers occur in only one of the two vectors.

Quick example:

> FOO(c(1, 2), c(1, 2))
[1] "All items matched."

> FOO(c(1, 2), c(1, 2, 3))
[1] "Only in x:  "
[1] "Only in y:  3"
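If the differences are needed for further processing rather than just printed, a variant that returns them may be more convenient. This is a sketch only; compare_serials is a hypothetical name, not part of the answer above:

```r
# Hypothetical helper: return the values unique to each vector as a list
compare_serials <- function(x, y) {
  list(
    only_in_x = setdiff(x, y),  # values in x that never occur in y
    only_in_y = setdiff(y, x)   # values in y that never occur in x
  )
}

# Duplicates in the input are ignored by setdiff()
res <- compare_serials(c(1, 2, 2), c(2, 3))
print(res$only_in_x)
print(res$only_in_y)

# Both components are empty when the two vectors hold the same values
all_matched <- length(res$only_in_x) == 0 && length(res$only_in_y) == 0
print(all_matched)
```

The returned list could then be used, for example, to subset the data frame to the rows of the incomplete serial numbers.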
