I have a data.frame of surface contacts over time. I just want to append a copy of the last row for each ActivityID:
head(movsdf.rbind)
ActivityID CareType HCWType Orientation Surface Date Time Dev.Date.Time SurfaceCategories
1 01 IV RN01 leftFacing AlcOutside 2019-08-03 11:08:01 2019-08-03 11:08:01 HygieneArea
2 01 IV RN01 leftFacing In 2019-08-03 11:08:12 2019-08-03 11:08:12 In
3 01 IV RN01 leftFacing Door 2019-08-03 11:08:12 2019-08-03 11:08:12 FarPatient
4 02 IV RN01 leftFacing Door 2019-08-03 11:08:18 2019-08-03 11:08:18 FarPatient
5 02 IV RN01 leftFacing Other 2019-08-03 11:08:22 2019-08-03 11:08:22 FarPatient
6 03 IV RN01 leftFacing Table 2019-08-03 11:10:26 2019-08-03 11:10:26 NearPatient
Example data:
movsdf.rbind <- data.frame(ActivityID = rep(1:3, each = 10), Surface = rep(c("In", "Table", "Out"), each = 10))
So I can do it like this, adapted from here:
repeatss <- aggregate(movsdf.rbind, by=list(movsdf.rbind$ActivityID), FUN = function(x) { last = tail(x,1) })
movsdf.rbind <-rbind(movsdf.rbind, repeatss)
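This does the trick, but it looks clunky and the data end up out of order (not really important, but I suspect there is something more elegant in dplyr or data.table). Any ideas?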
Answer 0 (score: 7)
Another option, using slice to select every row of each group followed by the group's last row once more:
library(dplyr)
DF %>%
group_by(ActivityID) %>%
slice(c(1:n(),n()))
which gives:
# A tibble: 9 x 9
# Groups:   ActivityID [3]
  ActivityID CareType HCWType Orientation Surface    Date      Time     Dev.Date.Time     SurfaceCategori~
       <int> <chr>    <chr>   <chr>       <chr>      <chr>     <chr>    <chr>             <chr>
1          1 IV       RN01    leftFacing  AlcOutside 2019-08-~ 11:08:01 2019-08-03 11:08~ HygieneArea
2          1 IV       RN01    leftFacing  In         2019-08-~ 11:08:12 2019-08-03 11:08~ In
3          1 IV       RN01    leftFacing  Door       2019-08-~ 11:08:12 2019-08-03 11:08~ FarPatient
4          1 IV       RN01    leftFacing  Door       2019-08-~ 11:08:12 2019-08-03 11:08~ FarPatient
5          2 IV       RN01    leftFacing  Door       2019-08-~ 11:08:18 2019-08-03 11:08~ FarPatient
6          2 IV       RN01    leftFacing  Other      2019-08-~ 11:08:22 2019-08-03 11:08~ FarPatient
7          2 IV       RN01    leftFacing  Other      2019-08-~ 11:08:22 2019-08-03 11:08~ FarPatient
8          3 IV       RN01    leftFacing  Table      2019-08-~ 11:10:26 2019-08-03 11:10~ NearPatient
9          3 IV       RN01    leftFacing  Table      2019-08-~ 11:10:26 2019-08-03 11:10~ NearPatient
Two base R alternatives:
# one
lastrows <- cumsum(aggregate(CareType ~ ActivityID, DF, length)[[2]])
DF[sort(c(seq(nrow(DF)), lastrows)),]
# two
idx <- unlist(tapply(1:nrow(DF), DF$ActivityID, FUN = function(x) c(x, tail(x, 1))))
DF[idx,]
Both give the same result.
Two data.table alternatives:
library(data.table)
setDT(DF) # convert 'DF' to a data.table
# one
DF[DF[, .I[c(1:.N,.N)], by = ActivityID]$V1]
# two
DF[, .SD[c(1:.N,.N)], by = ActivityID]
Data used:
DF <- structure(list(ActivityID = c(1L, 1L, 1L, 2L, 2L, 3L),
CareType = c("IV", "IV", "IV", "IV", "IV", "IV"),
HCWType = c("RN01", "RN01", "RN01", "RN01", "RN01", "RN01"),
Orientation = c("leftFacing", "leftFacing", "leftFacing", "leftFacing", "leftFacing", "leftFacing"),
Surface = c("AlcOutside", "In", "Door", "Door", "Other", "Table"),
Date = c("2019-08-03", "2019-08-03", "2019-08-03", "2019-08-03", "2019-08-03", "2019-08-03"),
Time = c("11:08:01", "11:08:12", "11:08:12", "11:08:18", "11:08:22", "11:10:26"),
Dev.Date.Time = c("2019-08-03 11:08:01", "2019-08-03 11:08:12", "2019-08-03 11:08:12", "2019-08-03 11:08:18", "2019-08-03 11:08:22", "2019-08-03 11:10:26"),
SurfaceCategories = c("HygieneArea", "In", "FarPatient", "FarPatient", "FarPatient", "NearPatient")),
class = "data.frame", row.names = c(NA, -6L))
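Answer 1 (score: 3)
One possibility with dplyr and tidyr (using the sample data from @Jaap) is to give the last row of each group a weight of 2 and every other row a weight of 1:
library(dplyr)
library(tidyr)
DF %>%
  group_by(ActivityID) %>%
  uncount((row_number() == max(row_number())) + 1)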
ActivityID CareType HCWType Orientation Surface Date Time Dev.Date.Time SurfaceCategori…
<int> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 1 IV RN01 leftFacing AlcOutsi… 2019-08… 11:08… 2019-08-03 11:… HygieneArea
2 1 IV RN01 leftFacing In 2019-08… 11:08… 2019-08-03 11:… In
3 1 IV RN01 leftFacing Door 2019-08… 11:08… 2019-08-03 11:… FarPatient
4 1 IV RN01 leftFacing Door 2019-08… 11:08… 2019-08-03 11:… FarPatient
5 2 IV RN01 leftFacing Door 2019-08… 11:08… 2019-08-03 11:… FarPatient
6 2 IV RN01 leftFacing Other 2019-08… 11:08… 2019-08-03 11:… FarPatient
7 2 IV RN01 leftFacing Other 2019-08… 11:08… 2019-08-03 11:… FarPatient
8 3 IV RN01 leftFacing Table 2019-08… 11:10… 2019-08-03 11:… NearPatient
9 3 IV RN01 leftFacing Table 2019-08… 11:10… 2019-08-03 11:… NearPatient
Or:
DF %>%
group_by(ActivityID) %>%
uncount((row_number() == n()) + 1)
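Answer 2 (score: 3)
If we only want to repeat the last row of each group, it is enough to know the row number of the last row in each group. We can call duplicated with its fromLast argument set to TRUE to get these row numbers, then combine them with the existing row numbers. Using @Jaap's data.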
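A minimal sketch of the duplicated(..., fromLast = TRUE) idea described above, not necessarily the answerer's exact code. It assumes a small stand-in data frame (a trimmed version of @Jaap's data) and that the rows of each ActivityID are already in the desired order; the names last_rows and idx are illustrative only.

```r
# Stand-in data: a trimmed version of the sample data (assumption)
DF <- data.frame(
  ActivityID = c(1L, 1L, 1L, 2L, 2L, 3L),
  Surface    = c("AlcOutside", "In", "Door", "Door", "Other", "Table")
)

# With fromLast = TRUE, duplicated() scans from the bottom, so the only
# rows NOT flagged as duplicates are the last occurrence of each ActivityID
last_rows <- which(!duplicated(DF$ActivityID, fromLast = TRUE))

# Add those row numbers to the full set of row numbers and sort, so each
# repeated row lands directly after its original
idx <- sort(c(seq_len(nrow(DF)), last_rows))

result <- DF[idx, ]
row.names(result) <- NULL  # optional: make the row names consecutive
result
```

Because the approach only works with row numbers, it does not depend on any package and leaves the other columns untouched.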
Answer 3 (score: 2)
Here is a base R solution.
result <- lapply(split(movsdf.rbind, movsdf.rbind$ActivityID), function(DF){
rbind(DF, DF[nrow(DF), ])
})
result <- do.call(rbind, result)
result
# ActivityID value
#1.1 1 1
#1.2 1 2
#1.3 1 3
#1.31 1 3
#2.4 2 4
#2.5 2 5
#2.6 2 6
#2.61 2 6
#3.7 3 7
#3.8 3 8
#3.9 3 9
#3.91 3 9
If the new row names are ugly, you can make them consecutive with row.names(result) <- NULL.
Data creation code:
movsdf.rbind <- data.frame(ActivityID = rep(1:3, each = 3),
value = 1:9)
Answer 4 (score: 1)
We can split the data by ActivityID, then use map with bind_rows to append the last row to each of the resulting data frames.
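A minimal sketch of the split / map / bind_rows idea, assuming dplyr and purrr are installed and using a small stand-in data frame; the names pieces and result are illustrative, and this is not necessarily the answerer's exact code.

```r
library(dplyr)
library(purrr)

# Stand-in data: a trimmed version of the sample data (assumption)
DF <- data.frame(
  ActivityID = c(1L, 1L, 1L, 2L, 2L, 3L),
  Surface    = c("AlcOutside", "In", "Door", "Door", "Other", "Table")
)

# One data frame per ActivityID
pieces <- split(DF, DF$ActivityID)

# Append each piece's last row to itself, then stack the pieces again
result <- pieces %>%
  map(function(d) bind_rows(d, tail(d, 1))) %>%
  bind_rows()

result
```

Note that split orders the pieces by the grouping value, so the result comes back sorted by ActivityID even if the input was not.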