How can I find the first and last observation of each group in R when groups repeat one after another?

Time: 2015-01-29 11:29:34

Tags: r

The data is below:

dialled     Ringing     state   duration
NA  NA  NA  0
NA  NA  NA  0
NA  NA  NA  0
NA  NA  NA  0
123 NA  NA  0
123 NA  NA  0
123 NA  NA  0
123 NA  NA  60
NA  NA  active  0
NA  NA  active  0
NA  NA  inactive    0
NA  NA  inactive    0
123 NA  inactive    0
123 NA  inactive    0
123 NA  inactive    0
NA  NA  inactive  0
NA  NA  inactive  0
NA  NA  inactive    0
222 NA  inactive    0
222 NA  inactive    0
222 NA  inactive    37
NA  NA  active  0
NA  NA  active  0
NA  NA  inactive    0
123 NA  inactive    0
123 NA  inactive    0
123 NA  active  60
NA  NA  active  0
NA  NA  active  0
NA  NA  active  0
NA  NA  active  0
123 NA  inactive    0
123 NA  inactive    0
123 NA  inactive    0


The answer I am looking for is:

dialled     Ringing     state   duration
123 NA  NA          0
123 NA  NA          60
123 NA  inactive    0
123 NA  inactive    0
222 NA  inactive    0
222 NA  inactive    37
123 NA  inactive    0
123 NA  inactive    60
123 NA  inactive    0
123 NA  inactive    0

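Also, it would help if you could show me how to get the row immediately following the last row of each group, and rbind those rows to the result.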

2 answers:

Answer 0 (score: 1)

In data.table v1.9.5 there is a new function, rleid(), that makes this task fairly straightforward. You can install it by following these instructions.

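require(data.table)
# rleid(dialled) gives each consecutive run of identical values its own id,
# so the same number dialled in separate runs forms separate groups.
# NA runs are skipped; other runs keep their first and last row (.N = group size).
setDT(df)[, if (!is.na(dialled[1L])) .SD[c(1L, .N)], 
                by=.(dialled, rleid(dialled))]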
#     dialled rleid Ringing    state duration
#  1:     123     2      NA       NA        0
#  2:     123     2      NA       NA       60
#  3:     123     4      NA inactive        0
#  4:     123     4      NA inactive        0
#  5:     222     6      NA inactive        0
#  6:     222     6      NA inactive       37
#  7:     123     8      NA inactive        0
#  8:     123     8      NA   active       60
#  9:     123    10      NA inactive        0
# 10:     123    10      NA inactive        0

.SD contains the subset of the data for each group specified in by=.
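To make the grouping step more concrete, here is a minimal base R sketch of what a run-length id does. It emulates the idea behind rleid() rather than reproducing data.table's implementation, and the vector `x` and the names `key`, `run_id`, `pick`, and `idx` are made up for illustration.

```r
# Toy vector shaped like `dialled`: runs of NA separated by runs of numbers.
x <- c(NA, NA, 123, 123, 123, NA, 222, 222, NA, 123)

# Map NA to a sentinel string so consecutive NAs compare as equal.
key <- ifelse(is.na(x), "NA", as.character(x))

# Start a new run id whenever the value differs from the previous one.
run_id <- cumsum(c(TRUE, key[-1L] != key[-length(key)]))
run_id
# 1 1 2 2 2 3 4 4 5 6

# For every non-NA run, keep the first and last position.
pick <- function(i) if (is.na(x[i[1L]])) integer(0) else c(i[1L], i[length(i)])
idx <- unlist(lapply(split(seq_along(x), run_id), pick), use.names = FALSE)
idx
# 3 5 7 8 10 10
```

Note that the final run has a single element, so its position appears twice. The same happens with .SD[c(1L, .N)] when a group has only one row; wrapping the index in unique() would avoid the duplicate if that is not wanted.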

Answer 1 (score: 0)

You could create a grouping variable "grp" (similar to here). Subset the rows of "df" where "grp" is not 0, use slice to get the first and last row of each "grp", ungroup, and remove the grp variable.

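# Runs of non-NA (TRUE) and NA (FALSE) values in dialled
rl <- rle(!is.na(df$dialled))
# Number the non-NA runs 1, 2, 3, ...; NA runs become 0
grp <- inverse.rle(within.list(rl, 
      values[values] <- cumsum(values)[values]))
df$grp <- grp
library(dplyr)
df %>%
    filter(grp!=0) %>%        # drop the NA runs
    group_by(grp) %>% 
    slice(c(1, n()))%>%       # first and last row of each run
    ungroup() %>%
    select(-grp)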
#       dialled Ringing    state duration
#1      123      NA       NA        0
#2      123      NA       NA       60
#3      123      NA inactive        0
#4      123      NA inactive        0
#5      222      NA inactive        0
#6      222      NA inactive       37
#7      123      NA inactive        0
#8      123      NA   active       60
#9      123      NA inactive        0
#10     123      NA inactive        0
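The construction of "grp" is compact, so here is a step-by-step sketch of the same idea on a short toy vector. The vector `x` and the name `run_no` are made up for illustration, and the arithmetic form below is an equivalent rewrite, not the exact within.list() call used above.

```r
# Toy vector shaped like df$dialled.
x <- c(NA, NA, 123, 123, NA, 222, 222, 222, NA, 123)

# Run-length encode the "is not NA" indicator.
rl <- rle(!is.na(x))
rl$lengths
# 2 2 1 3 1 1
rl$values
# FALSE TRUE FALSE TRUE FALSE TRUE

# Count the TRUE runs cumulatively, then zero out the FALSE runs.
run_no <- cumsum(rl$values)        # 0 1 1 2 2 3
rl$values <- run_no * rl$values    # 0 1 0 2 0 3

# Expand back to one value per element.
grp <- inverse.rle(rl)
grp
# 0 0 1 1 0 2 2 2 0 3
```

Every NA stretch ends up as 0 and each stretch of dialled numbers gets its own counter, which is why filtering on grp != 0 and grouping by grp separates repeated numbers that occur in different runs.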

Alternatively, a base R option is to get the row indices of the first and last rows of the subsetted dataset "df1" based on "grp", and then use them to extract the rows.

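# Keep only the rows that belong to a non-NA run
df1 <- df[grp!=0,]
# First and last row index within each run, then extract those rows
df2 <- df1[unlist(tapply(seq_len(nrow(df1)), grp[grp!=0],
           FUN=function(x) c(head(x,1), tail(x,1)))),]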

Update

The request in the comments was not clear to me. Maybe this helps:

 df2 %>%
    group_by(grp) %>% 
    filter(any(duration>0)) %>% 
    slice(1)
 #    dialled Ringing    state duration grp
 #1     123      NA       NA        0   1
 #2     222      NA inactive        0   3
 #3     123      NA inactive        0   4
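The question also asks for the row that immediately follows the last row of each group. The intent is not fully clear, so the following is only one possible reading, sketched in base R on a small toy data frame. The data frame `toy`, its values, and the names `first_idx`, `last_idx`, `next_idx`, and `keep` are all made up for illustration.

```r
# Toy data with the same layout as `df` (shortened, hypothetical values).
toy <- data.frame(
  dialled  = c(NA, 123, 123, NA, NA, 222, 222, 222, NA, 123),
  duration = c(0, 0, 60, 0, 0, 0, 0, 37, 0, 0)
)

# Grouping variable: 0 for NA runs, 1, 2, 3, ... for runs of dialled numbers.
rl <- rle(!is.na(toy$dialled))
rl$values <- cumsum(rl$values) * rl$values
grp <- inverse.rle(rl)

# Row positions of the first and last row of each non-NA run.
rows <- seq_len(nrow(toy))
first_idx <- tapply(rows[grp != 0], grp[grp != 0], min)
last_idx  <- tapply(rows[grp != 0], grp[grp != 0], max)

# Row right after each run, dropped when the run ends on the last row.
next_idx <- last_idx + 1L
next_idx <- next_idx[next_idx <= nrow(toy)]

# Combine everything, keeping the original row order and no duplicates.
keep <- sort(unique(c(first_idx, last_idx, next_idx)))
result <- toy[keep, ]
result
```

Here the selected rows are 2, 3, 4, 6, 8, 9 and 10. If the extra rows should instead be appended at the end, rbind(toy[c(first_idx, last_idx), ], toy[next_idx, ]) would be an alternative, at the cost of losing the original ordering.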

Data

df <- structure(list(dialled = c(NA, NA, NA, NA, 123L, 123L, 123L, 
123L, NA, NA, NA, NA, 123L, 123L, 123L, NA, NA, NA, 222L, 222L, 
222L, NA, NA, NA, 123L, 123L, 123L, NA, NA, NA, NA, 123L, 123L, 
123L), Ringing = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
 NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
 NA, NA, NA, NA, NA, NA, NA), state = c(NA, NA, NA, NA, NA, NA, 
 NA, NA, "active", "active", "inactive", "inactive", "inactive", 
 "inactive", "inactive", "inactive", "inactive", "inactive", "inactive", 
 "inactive", "inactive", "active", "active", "inactive", "inactive", 
 "inactive", "active", "active", "active", "active", "active", 
 "inactive", "inactive", "inactive"), duration = c(0L, 0L, 0L, 
 0L, 0L, 0L, 0L, 60L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
 0L, 0L, 37L, 0L, 0L, 0L, 0L, 0L, 60L, 0L, 0L, 0L, 0L, 0L, 0L, 
 0L)), .Names = c("dialled", "Ringing", "state", "duration"),
 class = "data.frame", row.names = c(NA, -34L))