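I searched here for similar questions but could not find an answer. Could you help me with this task? I have one table containing a large medical-record dataset with more than 10,000 patients, and another table with only 689 patients. I want to filter the large table to keep only the records for the patients in the second table, and then create a new table that merges the two. I should end up with three tables (two filtered tables and one merged table).
============================ What I have now =================
Table 1 (relevant patients):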
ID | PatientID | Record1 | Record2 | Record3
--------------------------------------------------------
1 | 7366 | 3 | 1 | 1
2 | 7362 | 3 | 1 | 1
3 | 7361 | 3 | 1 | 1
4 | 7360 | 3 | 1 | 1
5 | 7363 | 3 | 1 | 1
Table 2 (all patients):
ID | PatientID | Blood | SomeRecord | Foo
--------------------------------------------------------
1 | 7316 | 06668 | 21/08/2015 | 1
2 | 7302 | 08677 | 21/08/2015 | 3
3 | 7341 | 07787 | 21/08/2015 | 2
4 | 7340 | 08977 | 21/08/2015 | 1
5 | 7313 | 07887 | 21/08/2015 | 1
6 | 7366 | 56668 | 21/08/2015 | 1
7 | 7362 | 88677 | 21/08/2015 | 3
8 | 7361 | 77787 | 21/08/2015 | 2
9 | 7360 | 98977 | 21/08/2015 | 1
10 | 7363 | 87887 | 21/08/2015 | 1
I want to filter Table 2 by the patient IDs in Table 1, and then merge Tables 1 and 2 into a new table.
============================ Desired Output =====================
Table 2 (all patients, now filtered):
ID | PatientID | Blood | SomeRecord | Foo
--------------------------------------------------------
6 | 7366 | 56668 | 21/08/2015 | 1
7 | 7362 | 88677 | 21/08/2015 | 3
8 | 7361 | 77787 | 21/08/2015 | 2
9 | 7360 | 98977 | 21/08/2015 | 1
10 | 7363 | 87887 | 21/08/2015 | 1
Table 3 (all patients, now filtered, with all records merged):
ID |PatientID|Blood|SomeRecord|Foo|Record1|Record2|Record3
--------------------------------------------------------
6 | 7366 |56668|21/08/2015 |1 | 3 | 1 | 1
7 | 7362 |88677|21/08/2015 |3 | 3 | 1 | 1
8 | 7361 |77787|21/08/2015 |2 | 3 | 1 | 1
9 | 7360 |98977|21/08/2015 |1 | 3 | 1 | 1
10 | 7363 |87887|21/08/2015 |1 | 3 | 1 | 1
Answer 0: (score: 1)
Just join the two tables with dplyr:
library(dplyr)
semi_join(table2,table1, by=("PatientID"))
inner_join(table2,table1, by=("PatientID"))
Result:
> semi_join(table2,table1, by=("PatientID"))
ID PatientID Blood SomeRecord Foo
1 6 7366 56668 21/08/2015 1
2 7 7362 88677 21/08/2015 3
3 8 7361 77787 21/08/2015 2
4 9 7360 98977 21/08/2015 1
5 10 7363 87887 21/08/2015 1
> inner_join(table2,table1, by=("PatientID"))
ID.x PatientID Blood SomeRecord Foo ID.y Record1 Record2 Record3
1 6 7366 56668 21/08/2015 1 1 3 1 1
2 7 7362 88677 21/08/2015 3 2 3 1 1
3 8 7361 77787 21/08/2015 2 3 3 1 1
4 9 7360 98977 21/08/2015 1 4 3 1 1
5 10 7363 87887 21/08/2015 1 5 3 1 1
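The inner_join output above carries ID.x and ID.y because both tables have an ID column, whereas the desired Table 3 has a single ID. A minimal sketch of one way to get that layout, assuming table1's own ID is not needed in the merged table; the small data frames here are stand-ins with the same columns as the question's tables, not the real data:

```r
library(dplyr)

# Stand-in data with the same columns as the question's tables (fewer rows).
table1 <- data.frame(ID = 1:3,
                     PatientID = c(7366L, 7362L, 7361L),
                     Record1 = 3L, Record2 = 1L, Record3 = 1L)
table2 <- data.frame(ID = 1:5,
                     PatientID = c(7316L, 7302L, 7366L, 7362L, 7361L),
                     Blood = c("06668", "08677", "56668", "88677", "77787"),
                     SomeRecord = "21/08/2015",
                     Foo = c(1L, 3L, 1L, 3L, 2L))

# Drop table1's ID before joining so the result has no ID.x / ID.y pair.
table3 <- inner_join(table2, select(table1, -ID), by = "PatientID")
names(table3)
```

If table1's ID does need to survive, renaming it before the join (for example with rename()) keeps both columns under distinct names.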
Data:
table1 <-read.table(text="ID PatientID Record1 Record2 Record3
1 7366 3 1 1
2 7362 3 1 1
3 7361 3 1 1
4 7360 3 1 1
5 7363 3 1 1",
header=T,stringsAsFactors =F)
table2 <-read.table(text=" ID PatientID Blood SomeRecord Foo
1 7316 06668 21/08/2015 1
2 7302 08677 21/08/2015 3
3 7341 07787 21/08/2015 2
4 7340 08977 21/08/2015 1
5 7313 07887 21/08/2015 1
6 7366 56668 21/08/2015 1
7 7362 88677 21/08/2015 3
8 7361 77787 21/08/2015 2
9 7360 98977 21/08/2015 1
10 7363 87887 21/08/2015 1",
header=T,stringsAsFactors =F)
Answer 1: (score: 0)
Try this:
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery.isotope/2.2.2/isotope.pkgd.min.js"></script>
<div class="filters">
<input type="checkbox" class="do_this_filter" value=".Hand-wash">Hand wash
<br>
<input type="checkbox" class="do_this_filter" value=".Machine-Wash">Machine Wash
<br>
</div>
<ul class='products'>
<li class="items Hand-wash">Demo product1</li>
<li class="items Machine-Wash">Demo product2</li>
</ul>
Answer 2: (score: 0)
Here is a data.table approach:
library(data.table)
setDT(table1) #convert each table _by reference_ to the data.table type
setDT(table2)
I actually think it is easier to do your second step first.
First, the merge (an inner join, not an anti-join):
table3 <- table2[table1, on = "PatientID", nomatch = 0L]
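We can think of this as a subset, because table1 is in i; it is simultaneously a merge (as evidenced by the use of on), i.e. we are merging table1 and table2 on PatientID and keeping only the rows that match table1 (unmatched rows are dropped by setting nomatch = 0L).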
Next, filter table2:
table2 <- table3[ ,names(table2), with = FALSE]
Basically, we are just dropping all of the columns that came from table1 out of table3, which leaves the filtered table2.
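If only the filtered table2 is needed, the subset can also be written directly, without going through the merged table first. A minimal sketch, assuming PatientID has the same type in both tables; the data here is a small stand-in for the question's tables, and filtered2 is a name chosen for illustration:

```r
library(data.table)

# Stand-in data with the same columns as the question's tables (fewer rows).
table1 <- data.table(ID = 1:3,
                     PatientID = c(7366L, 7362L, 7361L),
                     Record1 = 3L, Record2 = 1L, Record3 = 1L)
table2 <- data.table(ID = 1:5,
                     PatientID = c(7316L, 7302L, 7366L, 7362L, 7361L),
                     Blood = c("06668", "08677", "56668", "88677", "77787"),
                     SomeRecord = "21/08/2015",
                     Foo = c(1L, 3L, 1L, 3L, 2L))

# Keep the rows of table2 whose PatientID appears in table1.
# Columns and row order of table2 are left unchanged.
filtered2 <- table2[PatientID %in% table1$PatientID]
filtered2
```

One difference from the join: table2[table1, on = ...] returns rows in the order of table1, while the %in% subset keeps the order of table2.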
Answer 3: (score: 0)
1) No packages. If DF1 and DF2 are the two data.frames, then M and M[1:5] are the required outputs. If sorting is not needed, omit the lines marked ##:
M <- merge(DF2, DF1[-1], by = "PatientID")
o <- order(M$ID) ##
M <- M[o, ] ##
giving:
> M[1:5]
PatientID ID Blood SomeRecord Foo
5 7366 6 56668 21/08/2015 1
3 7362 7 88677 21/08/2015 3
2 7361 8 77787 21/08/2015 2
1 7360 9 98977 21/08/2015 1
4 7363 10 87887 21/08/2015 1
> M
PatientID ID Blood SomeRecord Foo Record1 Record2 Record3
5 7366 6 56668 21/08/2015 1 3 1 1
3 7362 7 88677 21/08/2015 3 3 1 1
2 7361 8 77787 21/08/2015 2 3 1 1
1 7360 9 98977 21/08/2015 1 3 1 1
4 7363 10 87887 21/08/2015 1 3 1 1
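Note that merge() moves PatientID to the first column, so M[1:5] has a different column order than the original table. When only the filtered table is wanted, a logical subset is an alternative that keeps the columns and rows as they were. A minimal base R sketch on stand-in data (same columns as the question's tables, fewer rows); DF2_filtered is a name chosen for illustration:

```r
# Stand-in data with the same columns as the question's tables (fewer rows).
DF1 <- data.frame(ID = 1:3,
                  PatientID = c(7366L, 7362L, 7361L),
                  Record1 = 3L, Record2 = 1L, Record3 = 1L)
DF2 <- data.frame(ID = 1:5,
                  PatientID = c(7316L, 7302L, 7366L, 7362L, 7361L),
                  Blood = c("06668", "08677", "56668", "88677", "77787"),
                  SomeRecord = "21/08/2015",
                  Foo = c(1L, 3L, 1L, 3L, 2L))

# TRUE for every DF2 row whose PatientID also occurs in DF1.
keep <- DF2$PatientID %in% DF1$PatientID

# Subset rows only; column order and row order of DF2 are preserved.
DF2_filtered <- DF2[keep, ]
DF2_filtered
```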
2) sqldf
> library(sqldf)
> sqldf("select b.* from DF1 a join DF2 b using (PatientID)")
ID PatientID Blood SomeRecord Foo
1 6 7366 56668 21/08/2015 1
2 7 7362 88677 21/08/2015 3
3 8 7361 77787 21/08/2015 2
4 9 7360 98977 21/08/2015 1
5 10 7363 87887 21/08/2015 1
> sqldf("select b.*, a.* from DF1 a join DF2 b using (PatientID)")
ID PatientID Blood SomeRecord Foo ID PatientID Record1 Record2 Record3
1 6 7366 56668 21/08/2015 1 1 7366 3 1 1
2 7 7362 88677 21/08/2015 3 2 7362 3 1 1
3 8 7361 77787 21/08/2015 2 3 7361 3 1 1
4 9 7360 98977 21/08/2015 1 4 7360 3 1 1
5 10 7363 87887 21/08/2015 1 5 7363 3 1 1
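The second query above returns ID and PatientID twice, because both a.* and b.* are selected. A sketch of a variant that names the columns taken from DF1 explicitly, so the result has the layout of the desired Table 3; it assumes DF1's own ID is not needed, and the data frames are small stand-ins rather than the question's data:

```r
library(sqldf)

# Stand-in data with the same columns as the question's tables (fewer rows).
DF1 <- data.frame(ID = 1:3,
                  PatientID = c(7366L, 7362L, 7361L),
                  Record1 = 3L, Record2 = 1L, Record3 = 1L)
DF2 <- data.frame(ID = 1:5,
                  PatientID = c(7316L, 7302L, 7366L, 7362L, 7361L),
                  Blood = c("06668", "08677", "56668", "88677", "77787"),
                  SomeRecord = "21/08/2015",
                  Foo = c(1L, 3L, 1L, 3L, 2L))

# All columns of DF2 plus only the record columns of DF1,
# which avoids duplicated ID / PatientID columns in the result.
M3 <- sqldf("select b.*, a.Record1, a.Record2, a.Record3
             from DF1 a join DF2 b using (PatientID)")
names(M3)
```

Since the query has no ORDER BY clause, the row order of the result should not be relied on; add one (for example on b.ID) if a specific order matters.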
Note: The input is:
Lines1 <- "ID | PatientID | Record1 | Record2 | Record3
1 | 7366 | 3 | 1 | 1
2 | 7362 | 3 | 1 | 1
3 | 7361 | 3 | 1 | 1
4 | 7360 | 3 | 1 | 1
5 | 7363 | 3 | 1 | 1"
Lines2 <- " ID | PatientID | Blood | SomeRecord | Foo
1 | 7316 | 06668 | 21/08/2015 | 1
2 | 7302 | 08677 | 21/08/2015 | 3
3 | 7341 | 07787 | 21/08/2015 | 2
4 | 7340 | 08977 | 21/08/2015 | 1
5 | 7313 | 07887 | 21/08/2015 | 1
6 | 7366 | 56668 | 21/08/2015 | 1
7 | 7362 | 88677 | 21/08/2015 | 3
8 | 7361 | 77787 | 21/08/2015 | 2
9 | 7360 | 98977 | 21/08/2015 | 1
10 | 7363 | 87887 | 21/08/2015 | 1"
DF1 <- read.table(text = Lines1, header = TRUE, sep = "|", strip.white = TRUE)
DF2 <- read.table(text = Lines2, header = TRUE, sep = "|", strip.white = TRUE)