My dataset looks like this:
dialled Ringing state duration
NA NA NA 0
NA NA NA 0
NA NA NA 0
NA NA NA 0
123 NA NA 0
123 NA NA 0
123 NA NA 0
123 NA NA 60
NA NA active 0
NA NA active 0
NA NA inactive 0
NA NA inactive 0
NA 145 inactive 0
NA 145 inactive 0
NA 145 inactive 56
NA NA active 0
NA NA active 0
NA NA inactive 0
222 NA inactive 0
222 NA inactive 0
222 NA inactive 37
NA NA active 0
NA NA active 0
NA NA inactive 0
123 NA inactive 0
123 NA inactive 0
123 NA active 60
NA NA active 0
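I want to get the first and last row for each dialled number (treating a repeated number as a separate group, because each call is distinct). The answer I am looking for is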
dialled Ringing state duration
123 NA NA 0
123 NA NA 60
222 NA inactive 0
222 NA inactive 37
123 NA NA 0
123 NA NA 60
I used the following
library(plyr)
ddply(DF, .(Dialled_nbr), function(x) x[c(1,nrow(x)), ])
which gave me
dialled Ringing state duration
123 NA NA 0
123 NA NA 60
222 NA inactive 0
222 NA inactive 37
But that answer is not correct. Please help.
New data
dialled Ringing state duration
123 NA NA 0
123 NA NA 0
123 NA NA 60
123 NA NA 0
123 NA NA 0
123 NA NA 70
222 NA inactive 0
222 NA inactive 0
222 NA inactive 37
123 NA inactive 0
123 NA inactive 0
123 NA active 60
Answer to be
dialled Ringing state duration
123 NA NA 0
123 NA NA 60
123 NA NA 0
123 NA NA 70
222 NA inactive 0
222 NA inactive 37
123 NA inactive 0
123 NA active 60
Answer 0 (score: 3)
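Here is an option using data.table_1.9.5. Create a 'data.table' from the 'data.frame' with setDT, remove the rows where the 'dialled' column is NA (!is.na(dialled)), generate a grouping variable with rleid on 'dialled', get the row index of the first and last row of each group (.I[c(1L, .N)]), and finally subset 'dt1' by those row indices.
library(data.table)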
dt1 <- setDT(df)[!is.na(dialled)]
dt1[dt1[,.I[c(1L, .N)],rleid(dialled)]$V1]
# dialled Ringing state duration
#1: 123 NA NA 0
#2: 123 NA NA 60
#3: 222 NA inactive 0
#4: 222 NA inactive 37
#5: 123 NA inactive 0
#6: 123 NA active 60
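To see why grouping on runs rather than on values matters here, the following base R sketch mimics what a run-length id does. The helper name run_id and the toy vector are illustrative assumptions, not part of the answer above.

```r
# Hypothetical helper: number consecutive runs of equal values,
# similar in spirit to data.table::rleid for a single vector without NAs.
run_id <- function(x) {
  r <- rle(x)
  rep(seq_along(r$lengths), r$lengths)
}

dialled <- c(123, 123, 222, 222, 222, 123, 123)
ids <- run_id(dialled)

# Grouping by value would merge both calls to 123 into one group;
# grouping by run id keeps them apart.
n_groups_by_value <- length(unique(dialled))
n_groups_by_run <- length(unique(ids))

# Positions of the first and last element of each run
first_last <- which(!duplicated(ids) | !duplicated(ids, fromLast = TRUE))
```

Grouping by value is what the ddply call in the question does, which is why the second call to 123 disappeared from its result.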
Or using base R:
df1 <- df[!is.na(df$dialled),]
grp <- inverse.rle(within.list(rle(df1$dialled),
values <- seq_along(values)))
df1[!duplicated(grp)|!duplicated(grp,fromLast=TRUE),]
# dialled Ringing state duration
#5 123 NA <NA> 0
#8 123 NA <NA> 60
#19 222 NA inactive 0
#21 222 NA inactive 37
#25 123 NA inactive 0
#27 123 NA active 60
Based on the new dataset, where 'df' now holds the new data:
grp <- cumsum(c(TRUE,df$duration[-nrow(df)]!=0))
df[!duplicated(grp)|!duplicated(grp,fromLast=TRUE),]
# dialled Ringing state duration
#1 123 NA <NA> 0
#3 123 NA <NA> 60
#4 123 NA <NA> 0
#6 123 NA <NA> 70
#7 222 NA inactive 0
#9 222 NA inactive 37
#10 123 NA inactive 0
#12 123 NA active 60
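The grouping used for the new dataset appears to rely on each call ending with a nonzero duration, so that a new group starts on the row after every nonzero value. Here is a small sketch of that idea, using a made-up duration vector:

```r
# Toy durations (assumed): each call ends on its first nonzero duration
duration <- c(0, 0, 60, 0, 0, 70, 0, 37)

# A new group starts at row 1 and on every row that follows a nonzero duration
starts <- c(TRUE, duration[-length(duration)] != 0)
grp <- cumsum(starts)

# Keep the first and last row of each group
keep <- !duplicated(grp) | !duplicated(grp, fromLast = TRUE)
rows <- which(keep)
```

This grouping ignores 'dialled' entirely, so it only works if the last row of every call has a nonzero duration and no earlier row of that call does.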
The original data:
df <- structure(list(dialled = c(NA, NA, NA, NA, 123L, 123L, 123L,
123L, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 222L, 222L, 222L,
NA, NA, NA, 123L, 123L, 123L, NA), Ringing = c(NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, 145L, 145L, 145L, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), state = c(NA, NA, NA,
NA, NA, NA, NA, NA, "active", "active", "inactive", "inactive",
"inactive", "inactive", "inactive", "active", "active", "inactive",
"inactive", "inactive", "inactive", "active", "active", "inactive",
"inactive", "inactive", "active", "active"), duration = c(0L,
0L, 0L, 0L, 0L, 0L, 0L, 60L, 0L, 0L, 0L, 0L, 0L, 0L, 56L, 0L,
0L, 0L, 0L, 0L, 37L, 0L, 0L, 0L, 0L, 0L, 60L, 0L)), .Names =
c("dialled", "Ringing", "state", "duration"), class = "data.frame",
row.names = c(NA, -28L))
Answer 1 (score: 2)
Here are two options. First, we need to set up a few things that both options will use.
## remove rows where 'dialled' is NA
ndf <- DF[!is.na(DF$dialled),]
## run-length encoding on the 'dialled' column in 'ndf'
le <- rle(ndf$dialled)$lengths
Option 1: Create an integer vector of row numbers to subset with.
ndf[cumsum(mapply(c, 1L, le-1L)), ]
# dialled Ringing state duration
# 5 123 NA <NA> 0
# 8 123 NA <NA> 60
# 19 222 NA inactive 0
# 21 222 NA inactive 37
# 25 123 NA inactive 0
# 27 123 NA active 60
If you would rather not loop, you can replace the mapply call with vec, defined as
vec <- replace(integer(2*length(le))+1L, c(FALSE, TRUE), le-1L)
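The index arithmetic in Option 1 can be checked on a small run-length vector. The sketch below assumes run lengths of 4, 3 and 3 (matching the non-NA rows above) and builds the step vector both ways. The names steps_loop, steps_vec and idx are only for illustration.

```r
le <- c(4L, 3L, 3L)  # run lengths, as returned by rle(...)$lengths

# Steps between selected rows: 1 to reach the start of a run,
# then (length - 1) to reach its end
steps_loop <- as.vector(mapply(c, 1L, le - 1L))
steps_vec <- replace(integer(2 * length(le)) + 1L, c(FALSE, TRUE), le - 1L)

# Cumulative sums turn the steps into row positions within 'ndf'
idx <- cumsum(steps_vec)
```

Note that a run of length 1 contributes a step of 0, so its single row would be selected twice. That matches the behaviour of x[c(1, nrow(x)), ] on a one-row group.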
Option 2: Add a helper id column. Then use dplyr functions to get the first and last rows within each value of the new id column.
library(dplyr)
## updated data with new column
DF2 <- cbind(id = rep.int(seq_along(le), le), ndf)
## group by id and filter on the first and last rows
slice(group_by(DF2, id), c(1, n()))
# id dialled Ringing state duration
# 1 1 123 NA NA 0
# 2 1 123 NA NA 60
# 3 2 222 NA inactive 0
# 4 2 222 NA inactive 37
# 5 3 123 NA inactive 0
# 6 3 123 NA active 60
You can drop the helper column if you like, but it may also come in handy later.