I have a data frame that looks like this:
deltnr us stone_ny stone_mobility
1535   63 no_stone NA     1994-09-21 male   60
1536   61 no_stone NA     1983-09-06 male   60
1536   62 no_stone NA     1988-08-18 male   60
1536   63 stone    mobile 1994-03-04 male   70
154    61 no_stone NA     1983-06-22 male   40
154    62 no_stone NA     1988-06-08 male   45
1543   61 no_stone NA     1983-08-17 female 30
1543   62 no_stone NA     1988-08-17 female 35
1336   61 no_stone NA     1983-08-22 male   60
1336   62 stone    mobile 1988-11-04 male   65
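I want to extract one observation per unique deltnr, keeping only rows where the stone_ny variable is "stone". My problem is that each deltnr has several observations. I have tried unique() and subset() without success.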
Answer 0 (score: 1)
You can use dplyr:
library(dplyr)
dat %>%
group_by(deltnr) %>%
filter(stone_ny=="stone") %>% #assuming that there are no trailing or leading spaces
  do(head(., 1))  # keep the first matching row of each deltnr
which gives:
# deltnr us stone_ny stone_mobility date sex val
#1 1336 62 stone mobile 1988-11-04 male 65
#2 1536 63 stone mobile 1994-03-04 male 70
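The `group_by()` + `do(head(., 1))` step is a split-apply-combine: split the "stone" rows by deltnr, take the first row of each piece, and bind the pieces back together. A minimal base R sketch of the same idea, assuming a small hypothetical `dat` with only the columns needed (not the full data from the question):

```r
# Hypothetical toy data; only the columns needed for the illustration
dat <- data.frame(
  deltnr   = c(1536L, 1536L, 1336L, 1336L, 1336L),
  us       = c(62L, 63L, 61L, 62L, 63L),
  stone_ny = c("no_stone", "stone", "no_stone", "stone", "stone"),
  stringsAsFactors = FALSE
)

# filter(): keep only the "stone" rows
stones <- dat[dat$stone_ny == "stone", ]

# group_by() + do(head(., 1)): one data frame per deltnr, first row of each,
# then rbind the pieces back into a single data frame
pieces <- split(stones, stones$deltnr)
first_rows <- do.call(rbind, lapply(pieces, head, 1))
print(first_rows)
```

As in the dplyr output, the groups come back ordered by deltnr, because `split()` converts deltnr to a factor whose levels are sorted.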
Using data.table:
library(data.table)
unique(setDT(dat)[stone_ny == "stone"], by="deltnr") #updated after @Arun's comments
# deltnr us stone_ny stone_mobility date sex val
#1: 1536 63 stone mobile 1994-03-04 male 70
#2: 1336 62 stone mobile 1988-11-04 male 65
Or you can use base R:
# keep the "stone" rows, then the first of those rows within each deltnr
subset(subset(dat, stone_ny == "stone"),
       ave(seq_along(us), deltnr, FUN = seq_along) == 1)
# deltnr us stone_ny stone_mobility date sex val
#4 1536 63 stone mobile 1994-03-04 male 70
#10 1336 62 stone mobile 1988-11-04 male 65
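The `ave()` call does the per-deltnr deduplication: `ave(seq_along(us), deltnr, FUN = seq_along)` replaces each row with its running position inside its deltnr group, so `== 1` keeps the first row of each group. A small sketch with hypothetical toy data, splitting the nested `subset()` into separate steps so the intermediate index is visible:

```r
# Hypothetical toy data; only the columns the filter needs
dat <- data.frame(
  deltnr   = c(1536L, 1536L, 1336L, 1336L, 1336L),
  us       = c(62L, 63L, 61L, 62L, 63L),
  stone_ny = c("no_stone", "stone", "no_stone", "stone", "stone"),
  stringsAsFactors = FALSE
)

# Step 1: keep only the "stone" rows (deltnr 1536, 1336, 1336)
stones <- subset(dat, stone_ny == "stone")

# Step 2: number the rows 1, 2, ... within each deltnr group
idx <- with(stones, ave(seq_along(us), deltnr, FUN = seq_along))
print(idx)  # 1 1 2

# Step 3: the first row of each group is the one numbered 1
first_per_id <- stones[idx == 1, ]
print(first_per_id)
```

Unlike the dplyr and `split()` versions, this keeps the rows in their original order rather than sorting by deltnr.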
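I added column names (date, sex, val) to your data. This version also has an 11th row, a second "stone" observation for deltnr 1336: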
dat <- structure(list(deltnr = c(1535L, 1536L, 1536L, 1536L, 154L, 154L,
1543L, 1543L, 1336L, 1336L, 1336L), us = c(63L, 61L, 62L, 63L,
61L, 62L, 61L, 62L, 61L, 62L, 63L), stone_ny = c("no_stone",
"no_stone", "no_stone", "stone", "no_stone", "no_stone", "no_stone",
"no_stone", "no_stone", "stone", "stone"), stone_mobility = c(NA,
NA, NA, "mobile", NA, NA, NA, NA, NA, "mobile", "mobile"), date = c("1994-09-21",
"1983-09-06", "1988-08-18", "1994-03-04", "1983-06-22", "1988-06-08",
"1983-08-17", "1988-08-17", "1983-08-22", "1988-11-04", "1988-11-05"
), sex = c("male", "male", "male", "male", "male", "male", "female",
"female", "male", "male", "male"), val = c(60L, 60L, 60L, 70L,
40L, 45L, 30L, 35L, 60L, 65L, 66L)), .Names = c("deltnr", "us",
"stone_ny", "stone_mobility", "date", "sex", "val"), class = "data.frame", row.names = c(NA,
-11L))
Answer 1 (score: 0)
First, drop every row that has "no_stone" in the stone_ny column, for example by matching "stone" as a whole word:
good_rows <- grepl("\\bstone\\b", df$stone_ny)
Then use unique() to drop rows that are exact duplicates:
unique(df[good_rows, ])
# deltnr us stone_ny stone_mobility
# 4 1536 63 stone mobile
# 10 1336 62 stone mobile
Note that I dropped the last three columns because they had no names in the original post.
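Two details of this approach are worth checking. The pattern `\\bstone\\b` does not match "no_stone" because the underscore counts as a word character, so there is no word boundary before "stone". And `unique()` drops only rows that are identical in every column, so two different "stone" rows for the same deltnr both survive. A quick check with hypothetical values:

```r
# Whole-word match: "_" is a word character, so "no_stone" has no boundary before "stone"
hits <- grepl("\\bstone\\b", c("no_stone", "stone", "stones"))
print(hits)  # FALSE TRUE FALSE

# Hypothetical rows: three "stone" observations for the same deltnr
df <- data.frame(deltnr   = c(1336L, 1336L, 1336L),
                 us       = c(62L, 62L, 63L),
                 stone_ny = "stone")

# unique() removes only the exact duplicate (the repeated us = 62 row)
print(nrow(unique(df)))  # 2
```

Getting exactly one row per deltnr therefore needs a step keyed on deltnr alone, as in the other answers.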
Answer 2 (score: 0)
Try:
> ddf2 = ddf[ddf$stone_ny=='stone',]
> ddf2[!duplicated(ddf2$deltnr),]
deltnr us stone_ny stone_mobility
4 1536 63 stone mobile
10 1336 62 stone mobile
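`duplicated()` flags every occurrence of a deltnr after the first, so `!duplicated(ddf2$deltnr)` keeps the first "stone" row per deltnr in row order. If the last observation per deltnr is wanted instead, `duplicated()` also accepts `fromLast = TRUE`. A sketch with a small hypothetical `ddf`:

```r
# Hypothetical toy data: deltnr 1336 has two "stone" rows
ddf <- data.frame(deltnr   = c(1536L, 1336L, 1336L, 1543L),
                  us       = c(63L, 62L, 63L, 61L),
                  stone_ny = c("stone", "stone", "stone", "no_stone"))

ddf2 <- ddf[ddf$stone_ny == "stone", ]

# First "stone" row per deltnr (same as the answer above)
first_rows <- ddf2[!duplicated(ddf2$deltnr), ]

# Last "stone" row per deltnr: look for duplicates scanning from the end
last_rows <- ddf2[!duplicated(ddf2$deltnr, fromLast = TRUE), ]

print(first_rows)
print(last_rows)
```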