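What R code will combine each person's narrative entries in the mock data frame below into a single variable? The data come from an Excel spreadsheet in which one narrative entry can span 1 to 8 rows. Each timekeeper's record ends with a blank row.
Given this data frame (its dput() follows the printout):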
> df
timekeeper narrative
1 Person A Review and revise insert for audit response
2 Invoice=2858502 letter regarding separate investigation; review
3 <NA> and exchange messages regarding same
4 <NA> <NA>
5 Person B Telephone conference with team; review e-mail
6 Invoice=2835951 correspondence from X regarding
7 <NA> credentialing issues; e-mail correspondence
8 <NA> with Y regarding same; review and
9 <NA> approve transmittal letter for incident reports
10 <NA> <NA>
11 Person C Telephone conference with X, Y
12 Invoice=2835951 et al., regarding notice of
13 <NA> <NA>
14 Person D Telephone conference with
15 Invoice=2835951 Brady, Gibson, et al., regarding DAB status;
16 <NA> telephone conference with X, et al.,
17 <NA> regarding physician investigation at 123 and
18 <NA> medical liability insurance; telephone
19 <NA> <NA>
20 Person B Conference with B regarding D
21 Invoice=2835951 <NA>
structure(list(timekeeper = c("Person A", "Invoice=2858502",
NA, NA, "Person B", "Invoice=2835951", NA, NA, NA, NA, "Person C",
"Invoice=2835951", NA, "Person D", "Invoice=2835951", NA, NA,
NA, NA, "Person B", "Invoice=2835951"), narrative = c("Review and revise insert for audit response",
"letter regarding separate investigation; review", "and exchange messages regarding same",
NA, "Telephone conference with team; review e-mail", "correspondence from X regarding",
"credentialing issues; e-mail correspondence", "with Y regarding same; review and",
"approve transmittal letter for incident reports", NA, "Telephone conference with X, Y",
"et al., regarding notice of", NA, "Telephone conference with",
"Brady, Gibson, et al., regarding DAB status;", "telephone conference with X, et al.,",
"regarding physician investigation at 123 and", "medical liability insurance; telephone",
NA, "Conference with B regarding D", NA)), .Names = c("timekeeper",
"narrative"), row.names = c(NA, -21L), class = "data.frame")
This is the format I want:
timekeeper combined narrative
Person A Review and revise insert for audit response letter regarding separate investigation; review and exchange messages regarding same
A possible solution may be in the SO question "multiple rows combined", but the blank rows and variable-length narratives in my data have me confused.
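Because every record ends with a blank row, that terminator can itself serve as the grouping key. The sketch below is one base R way to do it. It assumes a "blank row" means both `timekeeper` and `narrative` are NA, and the helper name `combine_entries` is hypothetical. It re-creates the mock data so it runs on its own. Unlike grouping by name, it keeps the two separate Person B records as separate rows.

```r
# Copy of the question's mock data
df <- data.frame(
  timekeeper = c("Person A", "Invoice=2858502", NA, NA,
                 "Person B", "Invoice=2835951", NA, NA, NA, NA,
                 "Person C", "Invoice=2835951", NA,
                 "Person D", "Invoice=2835951", NA, NA, NA, NA,
                 "Person B", "Invoice=2835951"),
  narrative = c("Review and revise insert for audit response",
                "letter regarding separate investigation; review",
                "and exchange messages regarding same", NA,
                "Telephone conference with team; review e-mail",
                "correspondence from X regarding",
                "credentialing issues; e-mail correspondence",
                "with Y regarding same; review and",
                "approve transmittal letter for incident reports", NA,
                "Telephone conference with X, Y",
                "et al., regarding notice of", NA,
                "Telephone conference with",
                "Brady, Gibson, et al., regarding DAB status;",
                "telephone conference with X, et al.,",
                "regarding physician investigation at 123 and",
                "medical liability insurance; telephone", NA,
                "Conference with B regarding D", NA),
  stringsAsFactors = FALSE)

# Hypothetical helper: one output row per record, where records are
# separated by blank rows (timekeeper and narrative both NA)
combine_entries <- function(df) {
  blank <- is.na(df$timekeeper) & is.na(df$narrative)
  # a new record starts on row 1 and on every row that follows a blank row
  entry <- cumsum(c(TRUE, head(blank, -1)))
  # drop the blank rows, then split the remaining rows by record id
  groups <- split(df[!blank, ], entry[!blank])
  data.frame(
    # the first row of each record holds the person's name
    timekeeper = unname(vapply(groups, function(g) g$timekeeper[1], character(1))),
    # paste the record's non-NA narrative lines in their original order
    narrative = unname(vapply(groups, function(g)
      paste(g$narrative[!is.na(g$narrative)], collapse = " "), character(1))),
    stringsAsFactors = FALSE
  )
}

combine_entries(df)
```

The last record in the sample has no terminating blank row; the grouping still works because a record only needs a blank row *before* the next one starts.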
Answer 0 (score: 4)
library(data.table)
library(zoo)
# step 1: convert every timekeeper value matching the invoice pattern to NA
# step 2: fill the NAs in timekeeper with the most recent non-NA value, using `na.locf` from the zoo package
# step 3: collapse the non-NA narrative lines by timekeeper
#         (both Person B records end up in the same row)
setDT(df)[, timekeeper := na.locf(sub("^Invoice=\\d+$", NA, timekeeper))][
  , .(narrative = paste(narrative[!is.na(narrative)], collapse = " ")), by = 'timekeeper']
timekeeper
1: Person A
2: Person B
3: Person C
4: Person D
narrative
1: Review and revise insert for audit response letter regarding separate investigation; review and exchange messages regarding same
2: Telephone conference with team; review e-mail correspondence from X regarding credentialing issues; e-mail correspondence with Y regarding same; review and approve transmittal letter for incident reports Conference with B regarding D
3: Telephone conference with X, Y et al., regarding notice of
4: Telephone conference with Brady, Gibson, et al., regarding DAB status; telephone conference with X, et al., regarding physician investigation at 123 and medical liability insurance; telephone
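For readers without data.table or zoo, the three steps of this answer can be sketched in base R as below. `fill_forward` is a hypothetical stand-in for `zoo::na.locf`, and `tapply` replaces the data.table grouping; this is an illustration, not the answer's own code. It also shows why the Person B row ends with "Conference with B regarding D": grouping by the filled-in name merges both Person B records.

```r
# Copy of the question's mock data
df <- data.frame(
  timekeeper = c("Person A", "Invoice=2858502", NA, NA,
                 "Person B", "Invoice=2835951", NA, NA, NA, NA,
                 "Person C", "Invoice=2835951", NA,
                 "Person D", "Invoice=2835951", NA, NA, NA, NA,
                 "Person B", "Invoice=2835951"),
  narrative = c("Review and revise insert for audit response",
                "letter regarding separate investigation; review",
                "and exchange messages regarding same", NA,
                "Telephone conference with team; review e-mail",
                "correspondence from X regarding",
                "credentialing issues; e-mail correspondence",
                "with Y regarding same; review and",
                "approve transmittal letter for incident reports", NA,
                "Telephone conference with X, Y",
                "et al., regarding notice of", NA,
                "Telephone conference with",
                "Brady, Gibson, et al., regarding DAB status;",
                "telephone conference with X, et al.,",
                "regarding physician investigation at 123 and",
                "medical liability insurance; telephone", NA,
                "Conference with B regarding D", NA),
  stringsAsFactors = FALSE)

# Hypothetical helper standing in for zoo::na.locf: carry the last
# non-NA value forward (leading NAs stay NA)
fill_forward <- function(x) {
  # position of the most recent non-NA element at or before each index
  idx <- cummax(ifelse(is.na(x), 0L, seq_along(x)))
  idx[idx == 0L] <- NA_integer_
  x[idx]
}

# step 1: invoice lines become NA (sub() with replacement NA sets matches to NA)
tk <- sub("^Invoice=\\d+$", NA, df$timekeeper)
# step 2: every row inherits the most recent person name
tk <- fill_forward(tk)
# step 3: paste the non-NA narrative lines per person
keep <- !is.na(df$narrative)
combined <- tapply(df$narrative[keep], tk[keep], paste, collapse = " ")
combined
```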
Answer 1 (score: 1)
A base R approach:
# rows whose timekeeper contains "Person" start a new entry
indx <- grep('Person', df$timekeeper)
vec <- logical(nrow(df))
vec[indx] <- TRUE
# cumsum(vec) labels each row with its entry number;
# paste the non-NA narrative lines of each entry (so no trailing "NA")
lst <- lapply(split(df$narrative, cumsum(vec)),
              function(x) paste(x[!is.na(x)], collapse = ' '))
names(lst) <- df$timekeeper[indx]
newdf <- as.data.frame(lst)
t(newdf)
#          [,1]
#Person.A "Review and revise insert for audit response letter regarding separate investigation; review and exchange messages regarding same"
#Person.B "Telephone conference with team; review e-mail correspondence from X regarding cred
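If the goal is the two-column layout from the question rather than a transposed matrix, the same `cumsum` index can feed `data.frame()` directly. This is a sketch; the column name `combined_narrative` is an assumption standing in for "combined narrative", and the mock data are re-created so the block runs on its own.

```r
# Copy of the question's mock data
df <- data.frame(
  timekeeper = c("Person A", "Invoice=2858502", NA, NA,
                 "Person B", "Invoice=2835951", NA, NA, NA, NA,
                 "Person C", "Invoice=2835951", NA,
                 "Person D", "Invoice=2835951", NA, NA, NA, NA,
                 "Person B", "Invoice=2835951"),
  narrative = c("Review and revise insert for audit response",
                "letter regarding separate investigation; review",
                "and exchange messages regarding same", NA,
                "Telephone conference with team; review e-mail",
                "correspondence from X regarding",
                "credentialing issues; e-mail correspondence",
                "with Y regarding same; review and",
                "approve transmittal letter for incident reports", NA,
                "Telephone conference with X, Y",
                "et al., regarding notice of", NA,
                "Telephone conference with",
                "Brady, Gibson, et al., regarding DAB status;",
                "telephone conference with X, et al.,",
                "regarding physician investigation at 123 and",
                "medical liability insurance; telephone", NA,
                "Conference with B regarding D", NA),
  stringsAsFactors = FALSE)

# rows whose timekeeper contains "Person" start a new entry
indx <- grep("Person", df$timekeeper)
entry <- cumsum(seq_len(nrow(df)) %in% indx)
# one string per entry, skipping NA narrative lines
lst <- lapply(split(df$narrative, entry),
              function(x) paste(x[!is.na(x)], collapse = " "))
# hypothetical column name standing in for "combined narrative"
newdf <- data.frame(timekeeper = df$timekeeper[indx],
                    combined_narrative = unlist(lst, use.names = FALSE),
                    stringsAsFactors = FALSE)
newdf
```

Because each entry is keyed by its position rather than its name, the repeated Person B entry stays a separate row here.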