Extracting data based on observation values

Time: 2014-09-26 12:06:39

Tags: r unique subset

I have a data frame that looks like this:

deltnr  us  stone_ny  stone_mobility
1535    63  no_stone  NA              1994-09-21  male    60
1536    61  no_stone  NA              1983-09-06  male    60
1536    62  no_stone  NA              1988-08-18  male    60
1536    63  stone     mobile          1994-03-04  male    70
154     61  no_stone  NA              1983-06-22  male    40
154     62  no_stone  NA              1988-06-08  male    45
1543    61  no_stone  NA              1983-08-17  female  30
1543    62  no_stone  NA              1988-08-17  female  35
1336    61  no_stone  NA              1983-08-22  male    60
1336    62  stone     mobile          1988-11-04  male    65

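I want to extract one observation per deltnr, keeping only those with "stone" in the stone_ny variable. My problem is that each deltnr has multiple observations. I have tried unique() and subset() without any luck.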

3 answers:

Answer 0: (score: 1)

You can use dplyr:

    library(dplyr)
    dat %>% 
        group_by(deltnr) %>%
        filter(stone_ny=="stone") %>% #assuming that there are no trailing or leading spaces
        do(head(.,1))

This gives the output:

     # deltnr us stone_ny stone_mobility       date  sex val
    #1   1336 62    stone         mobile 1988-11-04 male  65
    #2   1536 63    stone         mobile 1994-03-04 male  70

Using data.table:

     library(data.table)
      unique(setDT(dat)[stone_ny == "stone"], by="deltnr") #updated after @Arun's comments
      #   deltnr us stone_ny stone_mobility       date  sex val
      #1:   1536 63    stone         mobile 1994-03-04 male  70
      #2:   1336 62    stone         mobile 1988-11-04 male  65

Or you can use base R:

    subset(subset(dat, stone_ny=="stone"), 
             ave(seq_along(us), deltnr, FUN=seq_along)==1)
     #    deltnr us stone_ny stone_mobility       date  sex val
    #4    1536 63    stone         mobile 1994-03-04 male  70
    #10   1336 62    stone         mobile 1988-11-04 male  65
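The `ave(seq_along(us), deltnr, FUN=seq_along)` call may look opaque. It numbers the rows within each `deltnr` group, so comparing with `== 1` keeps the first row of every group. Here is a minimal sketch of just that step, using a small hypothetical data frame `stone_rows` that stands in for the rows left after filtering on `stone_ny`:

```r
# Hypothetical stand-in for the rows that remain after filtering on "stone"
stone_rows <- data.frame(
  deltnr = c(1536L, 1336L, 1336L),
  us     = c(63L, 62L, 63L)
)

# Running index within each deltnr group
idx <- ave(seq_along(stone_rows$us), stone_rows$deltnr, FUN = seq_along)

# Keep only the first row of each group
first_rows <- stone_rows[idx == 1, ]
```

Here `idx` is 1, 1, 2, so the second 1336 row is dropped.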

Data

I added some column names to your data:

 dat <- structure(list(deltnr = c(1535L, 1536L, 1536L, 1536L, 154L, 154L, 
 1543L, 1543L, 1336L, 1336L, 1336L), us = c(63L, 61L, 62L, 63L, 
 61L, 62L, 61L, 62L, 61L, 62L, 63L), stone_ny = c("no_stone", 
 "no_stone", "no_stone", "stone", "no_stone", "no_stone", "no_stone", 
 "no_stone", "no_stone", "stone", "stone"), stone_mobility = c(NA, 
 NA, NA, "mobile", NA, NA, NA, NA, NA, "mobile", "mobile"), date = c("1994-09-21", 
 "1983-09-06", "1988-08-18", "1994-03-04", "1983-06-22", "1988-06-08", 
 "1983-08-17", "1988-08-17", "1983-08-22", "1988-11-04", "1988-11-05"
 ), sex = c("male", "male", "male", "male", "male", "male", "female", 
 "female", "male", "male", "male"), val = c(60L, 60L, 60L, 70L, 
 40L, 45L, 30L, 35L, 60L, 65L, 66L)), .Names = c("deltnr", "us", 
 "stone_ny", "stone_mobility", "date", "sex", "val"), class = "data.frame", row.names = c(NA, 
 -11L))

Answer 1: (score: 0)

You first want to drop any rows that contain "no_stone" in the stone_ny column, using something like:

    good_rows <- grepl("\\bstone\\b", df$stone_ny)
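The `\\b` word-boundary anchors are what stop the pattern from matching "no_stone". The underscore counts as a word character, so there is no boundary before "stone" in that string. A quick check on a hypothetical vector `x` (not part of the original data) also suggests that surrounding spaces do not prevent a match, unlike an exact comparison with `==`:

```r
# Hypothetical test values; the third has leading and trailing spaces
x <- c("stone", "no_stone", " stone ")

# Word-boundary match: "stone" as a whole word
matches <- grepl("\\bstone\\b", x)

# Exact comparison, for contrast
exact <- x == "stone"
```

With these values, `matches` is TRUE, FALSE, TRUE while `exact` is TRUE, FALSE, FALSE.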

Then use unique() to keep only the unique entries:

    unique(df[good_rows, ])
    #    deltnr us stone_ny stone_mobility
    # 4    1536 63    stone         mobile
    # 10   1336 62    stone         mobile

Note that I dropped the last three columns because they had no names in the original post.
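One caveat, sketched under the assumption that a deltnr can have more than one "stone" row (as in the 11-row `dat` from the first answer): `unique()` compares whole rows, so two "stone" rows that differ in `us` both survive. The hypothetical `df2` below illustrates this, along with deduplicating on `deltnr` alone:

```r
# Hypothetical data: deltnr 1336 has two "stone" rows that differ in us
df2 <- data.frame(
  deltnr   = c(1536L, 1336L, 1336L, 154L),
  us       = c(63L, 62L, 63L, 61L),
  stone_ny = c("stone", "stone", "stone", "no_stone"),
  stringsAsFactors = FALSE
)

stone_only <- df2[df2$stone_ny == "stone", ]

# unique() on whole rows: both 1336 rows are kept because us differs
by_row <- unique(stone_only)

# Keying on deltnr alone keeps one row per deltnr (the first in row order)
by_deltnr <- stone_only[!duplicated(stone_only$deltnr), ]
```

Here `by_row` still has three rows, while `by_deltnr` has two.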

Answer 2: (score: 0)

Try:

    ddf2 <- ddf[ddf$stone_ny == 'stone', ]   # keep only the "stone" rows
    ddf2[!duplicated(ddf2$deltnr), ]         # keep the first row per deltnr
    #    deltnr us stone_ny stone_mobility
    # 4    1536 63    stone         mobile
    # 10   1336 62    stone         mobile