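I have a data frame consisting of a large number of Serial_numbers. Each Serial_number is measured before and after irradiation, indicated by "0" and "1" in the Irradiated column.
I would like to check whether every Serial_number in my data frame really provides both data sets. This is probably a simple request, but I have not found a practical solution yet...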
Serial_number Irradiated Amplification Voltage
1 912009913 0 1.002520 24.9681
2 912009913 0 1.004520 29.9591
3 912009913 0 1.005370 34.9494
4 912009913 1 1.005600 44.9372
5 912009913 1 1.006830 49.9329
6 912009913 1 1.006900 54.9625
7 912009897 0 1.004537 26.4681
8 912009897 0 1.007240 28.9191
9 912009897 0 1.008167 29.4183
10 912009897 1 1.009153 33.1763
11 912009897 1 1.010291 36.1843
12 912009897 1 1.021757 41.4690
...
Answer 0 (score: 1)
You can match the unique serial numbers with Irradiated == 0 against the serial numbers with Irradiated == 1 and use all() to check whether every match is TRUE.
> df
Serial_number Irradiated Amplification Voltage
1 912009913 0 1.002520 24.9681
2 912009913 0 1.004520 29.9591
3 912009913 0 1.005370 34.9494
4 912009913 1 1.005600 44.9372
5 912009913 1 1.006830 49.9329
6 912009913 1 1.006900 54.9625
7 912009897 0 1.004537 26.4681
8 912009897 0 1.007240 28.9191
9 912009897 0 1.008167 29.4183
10 912009897 1 1.009153 33.1763
11 912009897 1 1.010291 36.1843
12 912009897 1 1.021757 41.4690
> all(unique(df$Serial_number[df$Irradiated == 0]) %in% unique(df$Serial_number[df$Irradiated == 1]))
[1] TRUE
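Note that this check only runs in one direction: it confirms that every serial number measured before irradiation also appears after irradiation, but not the reverse. The following is a minimal sketch of a symmetric variant using base R's setequal(). The data frame demo and its values are made up for illustration; only the column names are taken from the question.

```r
# Made-up example data: serial 3 was only measured after irradiation
demo <- data.frame(
  Serial_number = c(1, 1, 2, 2, 3),
  Irradiated    = c(0, 1, 0, 1, 1)
)

before <- unique(demo$Serial_number[demo$Irradiated == 0])
after  <- unique(demo$Serial_number[demo$Irradiated == 1])

# One-directional check: passes although serial 3 has no "0" rows
before_in_after <- all(before %in% after)

# Symmetric check: both sets of serial numbers must be the same
both_complete <- setequal(before, after)

print(before_in_after)
print(both_complete)
```

Here the one-directional check passes while the symmetric one does not, because serial 3 has no measurement before irradiation.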
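Then use %in% to find the positions of the matching serial numbers in the vector unique(df$Serial_number[df$Irradiated == 0]) and use them to extract those serial numbers:
> unique(df$Serial_number[df$Irradiated == 0])[unique(df$Serial_number[df$Irradiated == 0]) %in% unique(df$Serial_number[df$Irradiated == 1])]
[1] 912009913 912009897
If the all() check above returns TRUE, this should be identical to unique(df$Serial_number), provided no serial number occurs only with Irradiated == 1.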
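As an alternative way to see which serial numbers are incomplete, a cross-tabulation with table() can count the rows per serial number and state. This is only a sketch on made-up data (demo and its values are assumptions, not the asker's data). The factor() call with explicit levels is there so that both the "0" and "1" columns exist even if one state never occurs.

```r
# Made-up example data: serial 103 has no measurement after irradiation
demo <- data.frame(
  Serial_number = c(101, 101, 102, 102, 103),
  Irradiated    = c(0, 1, 0, 1, 0)
)

# Rows: serial numbers, columns: Irradiated states, cells: number of rows
counts <- table(demo$Serial_number,
                factor(demo$Irradiated, levels = c(0, 1)))

# TRUE for serial numbers that have at least one row in both states
complete <- apply(counts > 0, 1, all)

# Serial numbers missing one of the two states (as character labels)
incomplete_serials <- names(complete)[!complete]
print(incomplete_serials)
```

Because the names come from the table's row labels, incomplete_serials is a character vector; convert it with as.numeric() if numeric serial numbers are needed.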
New approach. Here is a function that should do all the work:
FOO <- function(x, y){
  # setdiff() compares unique values, so repeated serial numbers
  # cannot hide an item that occurs in only one of the vectors
  only_x <- setdiff(x, y)
  only_y <- setdiff(y, x)
  if(length(only_x) == 0 && length(only_y) == 0){
    print("All items matched.")
  }else{
    print(paste("Only in x: ", only_x))
    print(paste("Only in y: ", only_y))
  }
}
Simply call it with FOO(df$Serial_number[df$Irradiated == 0], df$Serial_number[df$Irradiated == 1]) and it will report which numbers occur in only one of the two vectors.
Quick example:
> FOO(c(1, 2), c(1, 2))
[1] "All items matched."
> FOO(c(1, 2), c(1, 2, 3))
[1] "Only in x: "
[1] "Only in y: 3"
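If the unmatched values are needed for further processing rather than just printed, a variant that returns them may be more convenient. The helper below is hypothetical (the name unmatched and its return structure are assumptions, not part of the original answer) and relies only on base R's setdiff().

```r
# Hypothetical helper: returns the unmatched values instead of printing them
unmatched <- function(x, y) {
  list(
    only_x = setdiff(x, y),  # values that occur in x but not in y
    only_y = setdiff(y, x)   # values that occur in y but not in x
  )
}

res <- unmatched(c(1, 2), c(1, 2, 3))

# Both components are empty exactly when the two sets of values agree
all_matched <- length(res$only_x) == 0 && length(res$only_y) == 0
print(res$only_y)
print(all_matched)
```

With a data frame like the one in the question, it could be called as unmatched(df$Serial_number[df$Irradiated == 0], df$Serial_number[df$Irradiated == 1]).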