Question

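My dataset has 19,000 rows, of which 15,000 have a unique patient ID. I would like a subset that contains each unique ID once, but keeps all the other variables from the original dataset.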

patnr      age    and 25 other variables
1          20
2          21
3          16
4           5
19000

How can I do this? At the moment, I can only see how many unique patient IDs there are in this dataset, using the following command:

length(unique(data$patnr))

Answer 1

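Let's say your data.frame is called df. You can use duplicated as follows to select the first instance of each patient ID that appears (indexing with unique(df$patnr) would instead treat the ID values as row numbers, which is only correct if IDs happen to match row positions):

dfUnique <- df[!duplicated(df$patnr), ]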

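Note that this drops about 4,000 rows, and if the other variables differ between a patient's first observation and a later one, the information in the later rows is lost.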
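
As a quick check of this approach, here is a minimal sketch on a toy data frame. The values are made up for illustration, and dfDropped is a hypothetical name used only to inspect what gets discarded; only patnr and age come from the question.

```r
# Toy data (hypothetical values): patient 2 appears twice with different ages.
df <- data.frame(
  patnr = c(1, 2, 2, 3, 4),
  age   = c(20, 21, 22, 16, 5)
)

# duplicated() flags every repeat of an ID after its first appearance,
# so negating it keeps the first row for each patient.
dfUnique <- df[!duplicated(df$patnr), ]

# The rows that were discarded, so you can see what information is lost.
dfDropped <- df[duplicated(df$patnr), ]

# Sanity check: one row per unique patient ID.
stopifnot(nrow(dfUnique) == length(unique(df$patnr)))
```

Looking at dfDropped before discarding it is a cheap way to judge whether keeping only the first observation is acceptable, or whether the repeated rows need to be summarised instead.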