Check whether every level provides more levels

Time: 2018-02-21 11:24:22

Tags: r dataframe

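I have a data frame consisting of a large number of Serial_numbers. Each Serial_number is measured before and after irradiation, indicated by "0" and "1" in the Irradiated column.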

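I would like to check whether every Serial_number in my data frame really provides both data sets. This is probably a simple request, but I have not found a practical solution yet.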

   Serial_number Irradiated Amplification Voltage
1      912009913          0      1.002520 24.9681
2      912009913          0      1.004520 29.9591
3      912009913          0      1.005370 34.9494
4      912009913          1      1.005600 44.9372
5      912009913          1      1.006830 49.9329
6      912009913          1      1.006900 54.9625
7      912009897          0      1.004537 26.4681
8      912009897          0      1.007240 28.9191
9      912009897          0      1.008167 29.4183
10     912009897          1      1.009153 33.1763
11     912009897          1      1.010291 36.1843
12     912009897          1      1.021757 41.4690

...

1 answer:

Answer 0 (score: 1)

You can match the unique serial numbers with Irradiated == 0 against those with Irradiated == 1, and use all() to check whether every match is TRUE:

> df
   Serial_number Irradiated Amplification Voltage
1      912009913          0      1.002520 24.9681
2      912009913          0      1.004520 29.9591
3      912009913          0      1.005370 34.9494
4      912009913          1      1.005600 44.9372
5      912009913          1      1.006830 49.9329
6      912009913          1      1.006900 54.9625
7      912009897          0      1.004537 26.4681
8      912009897          0      1.007240 28.9191
9      912009897          0      1.008167 29.4183
10     912009897          1      1.009153 33.1763
11     912009897          1      1.010291 36.1843
12     912009897          1      1.021757 41.4690

> all(unique(df$Serial_number[df$Irradiated == 0]) %in% unique(df$Serial_number[df$Irradiated == 1]))
[1] TRUE

Then use %in% to locate the matching serial numbers within the vector unique(df$Serial_number[df$Irradiated == 0]) and extract those values with:

> unique(df$Serial_number[df$Irradiated == 0])[unique(df$Serial_number[df$Irradiated == 0]) %in% unique(df$Serial_number[df$Irradiated == 1])]
[1] 912009913 912009897

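If the all() check above returned TRUE, this result should be the same as unique(df$Serial_number), provided no serial number occurs only with Irradiated == 1.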
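Note that all(... %in% ...) as written only tests one direction: it confirms that every serial number measured before irradiation was also measured afterwards, but not the reverse. A minimal sketch of a two-way check with base R's setequal() and setdiff() follows. The small data frame and the variable names before, after, both_complete, only_before and only_after are made up for illustration and are not part of the original data.

```r
# Hypothetical sample: serial 3 was only measured before irradiation,
# serial 4 only after irradiation.
df <- data.frame(
  Serial_number = c(1, 1, 2, 2, 3, 4),
  Irradiated    = c(0, 1, 0, 1, 0, 1)
)

before <- unique(df$Serial_number[df$Irradiated == 0])
after  <- unique(df$Serial_number[df$Irradiated == 1])

# TRUE only if both sets contain exactly the same serial numbers
both_complete <- setequal(before, after)

# Serial numbers that are missing one of the two measurements
only_before <- setdiff(before, after)
only_after  <- setdiff(after, before)
```

With this sample, both_complete is FALSE, and only_before and only_after hold the serial numbers that lack a counterpart.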

New approach. Here is a function that should do the whole job:

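FOO <- function(x, y){
  # Compare distinct values only, so repeated rows per serial number do not matter
  x <- unique(x)
  y <- unique(y)
  # Check both directions; comparing lengths alone can hide values that occur only in y
  if(all(x %in% y) && all(y %in% x)){
    print("All items matched.")
  }else{
    # paste() is vectorised: one string per unmatched value
    print(paste("Only in x: ", x[!x %in% y]))
    print(paste("Only in y: ", y[!y %in% x]))
  }
}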

Just call it with FOO(df$Serial_number[df$Irradiated == 0], df$Serial_number[df$Irradiated == 1]) and it reports which numbers occur in only one of the two vectors.

Quick example:

> FOO(c(1, 2), c(1, 2))
[1] "All items matched."

> FOO(c(1, 2), c(1, 2, 3))
[1] "Only in x:  "
[1] "Only in y:  3"
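
As an alternative to comparing two vectors, the check can also be run per serial number. The following is only a sketch using base R's tapply(), assuming the Irradiated column holds exactly the values 0 and 1. The sample data frame and the names has_both and incomplete are hypothetical.

```r
# Hypothetical sample: serial 3 has no measurement with Irradiated == 0
df <- data.frame(
  Serial_number = c(1, 1, 1, 2, 2, 3),
  Irradiated    = c(0, 0, 1, 0, 1, 1)
)

# For each serial number: TRUE if both states 0 and 1 occur
has_both <- tapply(df$Irradiated, df$Serial_number,
                   function(v) all(c(0, 1) %in% v))

# Serial numbers lacking one of the two states
# (tapply() names its result by group, so these come back as character strings)
incomplete <- names(has_both)[!has_both]
```

Here all(has_both) answers the original yes/no question, while incomplete lists the serial numbers that need a closer look.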