I have a large dataset and need to extract the subset of rows that meet a condition.
My dataset looks like this:
      INDEX_KEY       EID    ID Sequence TYPE SEQ  C_CODE       DATE
1: AAAA11111111 103520312 33673        1    O   3 4950300 1991-08-26
2: AAAA11111111 103520312 33673        1    A   1    T923 1991-08-26
3: AAAA11111111 103520312 33673        1    E   2     Y86 1991-08-26
4: AAAA11111111 107146692 33673        2    O   1 4167100 1996-10-24
5: AAAA11111111 107146692 33673        2    O   2 4167400 1996-10-24
6: AAAA11111111 107146694 33673        3    B   1    J350 1996-10-24
7: BBBB22222222 215272083 44673        1    B  30   Z8643 2011-01-09
8: BBBB22222222 215272083 44673        1    B  20    B962 2011-01-09
9: BBBB22222222 346872083 44673        2    A  10     N12 2011-01-09
From this table, I want to subset the data table based on "EID", so that the result looks like this:
      INDEX_KEY       EID    ID Sequence TYPE SEQ  C_CODE       DATE
1: AAAA11111111 103520312 33673        1    O   3 4950300 1991-08-26
2: AAAA11111111 103520312 33673        1    A   1    T923 1991-08-26
3: AAAA11111111 103520312 33673        1    E   2     Y86 1991-08-26
4: AAAA11111111 107146694 33673        3    B   1    J350 1996-10-24
5: BBBB22222222 215272083 44673        1    B  30   Z8643 2011-01-09
6: BBBB22222222 215272083 44673        1    B  20    B962 2011-01-09
To do this, I used a "for" loop to look up the EIDs, as follows:
all.ids <- c(unique(CODES$EID))
b.ids <- unique(c("103520312", "107146694", "215272083"))
for (id in all.ids){
SUB_CODES <- if(id==b.ids){
CODES[id %in% CODES$EID]
} else {
}
}
This code produces several warning messages like the following:
Warning messages:
1: In if (id == b.ids) { ... :
the condition has length > 1 and only the first element will be used
...
I know there are similar questions, and I tried the solutions from them, but they gave me more warnings that I could not understand.
Thanks!
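
The warning most likely comes from `if (id == b.ids)`: `if` expects a single TRUE/FALSE, but comparing one `id` against the three-element `b.ids` yields a logical vector of length 3. The loop also overwrites `SUB_CODES` on every pass, so even without the warning it would not accumulate the matching rows. Below is a minimal sketch of a vectorized alternative using `%in%`. It rebuilds only three columns of the example table as a plain data.frame and assumes EID is stored as character; both are assumptions made for illustration, not details confirmed by the post.

```r
# Toy reconstruction of the example table (only the columns needed here).
# Assumption: EID is stored as character.
CODES <- data.frame(
  INDEX_KEY = rep(c("AAAA11111111", "BBBB22222222"), times = c(6, 3)),
  EID = c("103520312", "103520312", "103520312",
          "107146692", "107146692", "107146694",
          "215272083", "215272083", "346872083"),
  C_CODE = c("4950300", "T923", "Y86", "4167100", "4167400",
             "J350", "Z8643", "B962", "N12"),
  stringsAsFactors = FALSE
)

# The EIDs to keep, taken from the question.
b.ids <- c("103520312", "107146694", "215272083")

# %in% returns one TRUE/FALSE per row, so no loop or if() is needed.
keep <- CODES$EID %in% b.ids
SUB_CODES <- CODES[keep, ]

print(SUB_CODES)
```

If `CODES` is a data.table, as the printed output suggests, the same row filter can be written as `CODES[EID %in% b.ids]`.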