Extracting rows from a data frame in R based on a combination of string patterns

Time: 2017-05-10 02:56:55

Tags: r dataframe

I have a data frame like this:

> dat.ex
   readerID   caseID modalityID        score
1        -1 negCase1      truth  0.000000000
2        -1 negCase2      truth  0.000000000
3        -1 negCase3      truth  0.000000000
4        -1 negCase4      truth  0.000000000
5        -1 negCase5      truth  0.000000000
6        -1 posCase1      truth  1.000000000
7        -1 posCase2      truth  1.000000000
8        -1 posCase3      truth  1.000000000
9        -1 posCase4      truth  1.000000000
10       -1 posCase5      truth  1.000000000
11       -1 posCase6      truth  1.000000000
12  reader1 negCase1  modalityA  0.645770330
13  reader1 negCase2  modalityA -0.045435214
14  reader1 negCase3  modalityA -0.009321583
15  reader2 negCase1  modalityA  0.617116920
16  reader2 negCase2  modalityA -1.256761789
17  reader2 negCase3  modalityA  0.520738628
18  reader3 negCase4  modalityA  0.478473127
19  reader3 negCase5  modalityA  0.617643737
20  reader4 negCase4  modalityA  1.484608559
21  reader4 negCase5  modalityA  0.650401370
22  reader5 negCase4  modalityA  0.722780208
23  reader5 negCase5  modalityA -0.886408612
24  reader1 negCase1  modalityB  0.787467442
25  reader1 negCase2  modalityB -1.194042019
26  reader1 negCase3  modalityB  0.356205643
27  reader2 negCase1  modalityB  1.353382784
28  reader2 negCase2  modalityB -0.898399097
29  reader2 negCase3  modalityB  0.534502599
30  reader3 negCase4  modalityB  0.206777397
31  reader3 negCase5  modalityB  1.705076694
32  reader4 negCase4  modalityB -0.521876835
33  reader4 negCase5  modalityB  1.822607394
34  reader5 negCase4  modalityB -0.437264233
35  reader5 negCase5  modalityB  0.247428257
36  reader1 posCase1  modalityA  1.020725991
37  reader1 posCase2  modalityA  0.204586221
38  reader2 posCase1  modalityA  0.021037953
39  reader2 posCase2  modalityA  1.070082199
40  reader3 posCase3  modalityA  1.199915999
41  reader3 posCase4  modalityA  3.005755804
42  reader3 posCase5  modalityA  3.283965386
43  reader3 posCase6  modalityA  0.569245868
44  reader4 posCase3  modalityA  1.013806422
45  reader4 posCase4  modalityA  2.561740919
46  reader4 posCase5  modalityA  1.922134100
47  reader4 posCase6  modalityA -1.346340196
48  reader5 posCase3  modalityA -0.443638719
49  reader5 posCase4  modalityA  1.474838183
50  reader5 posCase5  modalityA  1.145236296
51  reader5 posCase6  modalityA -2.185551872
52  reader1 posCase1  modalityB  1.179792788
53  reader1 posCase2  modalityB  0.072058966
54  reader2 posCase1  modalityB  1.002802605
55  reader2 posCase2  modalityB -0.495913336
56  reader3 posCase3  modalityB -0.202530117
57  reader3 posCase4  modalityB  1.670346504
58  reader3 posCase5  modalityB  1.790677654
59  reader3 posCase6  modalityB  0.338419769
60  reader4 posCase3  modalityB  0.693741397
61  reader4 posCase4  modalityB  2.072539657
62  reader4 posCase5  modalityB -0.095006549
63  reader4 posCase6  modalityB -0.953573151
64  reader5 posCase3  modalityB  0.336378635
65  reader5 posCase4  modalityB  0.999420932
66  reader5 posCase5  modalityB  0.667314450
67  reader5 posCase6  modalityB -1.276895433

I want to extract some of the rows using the following:

dat.stage1 <- dat.ex[(dat.ex$caseID==paste("negCase", 1:3, sep = "") |
dat.ex$caseID==paste("posCase", 1:2, sep ="")) &
(dat.ex$readerID==paste("reader", 1:2, sep = "") |
dat.ex$readerID=="-1"),]

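But this does not work and produces the following messages:

Warning messages:
1: In is.na(e1) | is.na(e2) :
  longer object length is not a multiple of shorter object length
2: In `==.default`(dat.ex$caseID, paste("negCase", 1:3, sep = "")) :
  longer object length is not a multiple of shorter object length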

I hope the code makes clear what I want: rows with (negCase1-3 or posCase1-2) and (reader1-2 or readerID -1).

1 Answer:

Answer 0: (score: 1)

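I think you should use %in% instead of ==, and then use subset to filter the data frame by those conditions. == compares two vectors element by element and recycles the shorter one, which is what triggers the warning; %in% tests whether each value belongs to the given set.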

subset(dat.ex, caseID %in% c(paste0("negCase", 1:3), paste0("posCase", 1:2)) & 
               readerID %in% c(paste0("reader", 1:2), "-1"))


#   readerID   caseID modalityID        score
#1        -1 negCase1      truth  0.000000000
#2        -1 negCase2      truth  0.000000000
#3        -1 negCase3      truth  0.000000000
#6        -1 posCase1      truth  1.000000000
#7        -1 posCase2      truth  1.000000000
#12  reader1 negCase1  modalityA  0.645770330
#13  reader1 negCase2  modalityA -0.045435214
#14  reader1 negCase3  modalityA -0.009321583
#15  reader2 negCase1  modalityA  0.617116920
#16  reader2 negCase2  modalityA -1.256761789
#17  reader2 negCase3  modalityA  0.520738628
#24  reader1 negCase1  modalityB  0.787467442
#25  reader1 negCase2  modalityB -1.194042019
#26  reader1 negCase3  modalityB  0.356205643
#27  reader2 negCase1  modalityB  1.353382784
#28  reader2 negCase2  modalityB -0.898399097
#29  reader2 negCase3  modalityB  0.534502599
#36  reader1 posCase1  modalityA  1.020725991
#37  reader1 posCase2  modalityA  0.204586221
#38  reader2 posCase1  modalityA  0.021037953
#39  reader2 posCase2  modalityA  1.070082199
#52  reader1 posCase1  modalityB  1.179792788
#53  reader1 posCase2  modalityB  0.072058966
#54  reader2 posCase1  modalityB  1.002802605
#55  reader2 posCase2  modalityB -0.495913336
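
To see why == fails here, a small sketch may help. The vectors and the data frame `dat` below are made-up illustrations, not the asker's `dat.ex`; the last part shows what should be an equivalent filter written with plain bracket indexing instead of subset.

```r
# Toy stand-in for dat.ex$caseID (illustrative values, not the real data)
caseID <- c("negCase2", "negCase1", "negCase3", "negCase4", "negCase1")
wanted <- paste0("negCase", 1:3)

# `==` compares position by position, recycling `wanted` along `caseID`,
# so caseID[1] is compared only with "negCase1", caseID[2] only with
# "negCase2", and so on. Since 5 is not a multiple of 3, R also emits
# the "longer object length" warning (suppressed here for the demo).
eq_result <- suppressWarnings(caseID == wanted)

# `%in%` checks each element of caseID for membership anywhere in `wanted`.
in_result <- caseID %in% wanted

print(eq_result)  # FALSE FALSE  TRUE FALSE FALSE
print(in_result)  #  TRUE  TRUE  TRUE FALSE  TRUE

# The same idea with bracket indexing on a hypothetical small data frame
dat <- data.frame(
  readerID = c("-1", "reader1", "reader3", "reader2"),
  caseID   = c("negCase1", "posCase2", "negCase1", "negCase5"),
  stringsAsFactors = FALSE
)
keep <- dat$caseID %in% c(paste0("negCase", 1:3), paste0("posCase", 1:2)) &
  dat$readerID %in% c(paste0("reader", 1:2), "-1")
print(dat[keep, ])  # keeps rows 1 and 2 only
```

Note that subset evaluates column names inside the data frame, which is convenient interactively; the bracket form with an explicit logical vector is often preferred inside functions.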