I have a data frame that looks like this:
ID Date Period Account Amount1 Amount2
<chr> <chr> <chr> <chr> <chr> <chr>
1 76311099 43494 /1 P / ABC / 123456 NA 3116362
2 NA NA NA C100ST NA NA
3 66112599 37135 /26 S / ADR NA 1246880.3900000001
4 NA NA NA 65101599 / S0 NA NA
5 45461599 37155 /O6 B / INR / REVERSE NA 623440.19000000006
6 NA NA NA UNDO / S0 NA NA
7 69876599 37134 /O3 N / ABC 401.63 NA
8 19991099 37122 /O5 P / PDA / ASK 4265 401.65 NA
9 NA NA NA AT045BT NA NA
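I have been trying to get this to work, but nothing I have tried has succeeded. Basically, if a row has an NA value in ID, I want to append the text in its Account column to the Account of the row above, and then delete the NA row.
I would like the final result to look like this: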
ID Date Period Account Amount1 Amount2
<chr> <chr> <chr> <chr> <chr> <chr>
1 76311099 43494 /1 P / ABC / 123456 / C100ST NA 3116362
2 66112599 37135 /26 S / ADR / 65101599 / S0 NA 1246880.3900000001
3 45461599 37155 /O6 B / INR / REVERSE / UNDO / S0 NA 623440.19000000006
4 69876599 37134 /O3 N / ABC 401.63 NA
5 19991099 37122 /O5 P / PDA / ASK 4265 / AT045BT 401.65 NA
As you can see, the row with ID 69876599 stays the same, because the row below it does not have an NA value in ID.
Does anyone know a way to solve this?
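Answer 0 (score: 6)
One option is to use fill on the selected columns to replace each NA with the previous non-NA element, group by those columns, collapse "Account" by concatenating its elements into a single string, and then summarise to take the first non-NA element of each remaining "Amount" column.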
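library(tidyverse)
df1 %>%
  # carry the last non-NA ID, Date and Period down into the continuation rows
  fill(ID, Date, Period) %>%
  group_by(ID, Date, Period) %>%
  # collapse Account within each group; `.add = TRUE` replaces the deprecated `add = TRUE` (dplyr >= 1.0.0)
  group_by(Account = str_c(Account, collapse = ' / '), .add = TRUE) %>%
  # take the first non-NA value of each remaining (Amount) column
  summarise_all(list(~ .[which(!is.na(.))[1]]))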
# A tibble: 5 x 6
# Groups: ID, Date, Period [5]
# ID Date Period Account Amount1 Amount2
# <int> <int> <chr> <chr> <dbl> <dbl>
#1 19991099 37122 /O5 P / PDA / ASK 4265 / AT045BT 402. NA
#2 45461599 37155 /O6 B / INR / REVERSE / UNDO / S0 NA 623440.
#3 66112599 37135 /26 S / ADR / 65101599 / S0 NA 1246880.
#4 69876599 37134 /O3 N / ABC 402. NA
#5 76311099 43494 /1 P / ABC / 123456 / C100ST NA 3116362
Data
df1 <- structure(list(ID = c(76311099L, NA, 66112599L, NA, 45461599L,
NA, 69876599L, 19991099L, NA), Date = c(43494L, NA, 37135L, NA,
37155L, NA, 37134L, 37122L, NA), Period = c("/1", NA, "/26",
NA, "/O6", NA, "/O3", "/O5", NA), Account = c("P / ABC / 123456",
"C100ST", "S / ADR", "65101599 / S0", "B / INR / REVERSE", "UNDO / S0",
"N / ABC", "P / PDA / ASK 4265", "AT045BT"), Amount1 = c(NA,
NA, NA, NA, NA, NA, 401.63, 401.65, NA), Amount2 = c(3116362,
NA, 1246880.39, NA, 623440.19, NA, NA, NA, NA)),
class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6", "7", "8", "9"))
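Note that the output above comes back sorted by ID, whereas the desired result in the question keeps the original row order. A minimal sketch of an order-preserving variant follows, assuming dplyr >= 1.0.0 (for across) and stringr are installed. The toy data frame and the helper column grp are made up for illustration and are not part of the original answer; the idea is to number each block of rows with cumsum(!is.na(ID)) and group by that number instead of by ID.

```r
library(dplyr)
library(stringr)

# Hypothetical toy data: rows with NA in ID continue the row above them
toy <- data.frame(
  ID      = c(2L, NA, 1L, NA, NA),
  Account = c("A", "B", "C", "D", "E"),
  Amount  = c(NA, NA, 5, NA, NA),
  stringsAsFactors = FALSE
)

result <- toy %>%
  # grp increases by one at every non-NA ID, so it follows the original row order
  mutate(grp = cumsum(!is.na(ID))) %>%
  group_by(grp) %>%
  summarise(
    # join all Account fragments of the block into one string
    Account = str_c(Account, collapse = " / "),
    # first non-NA value per block (NA if the block has none)
    across(c(ID, Amount), ~ .x[!is.na(.x)][1]),
    .groups = "drop"
  ) %>%
  select(ID, Account, Amount)

print(result)
```

Because summarise orders its output by the grouping variable, grouping on grp rather than ID is what keeps the blocks in their original sequence here.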
Answer 1 (score: 0)
For a base R solution, let d be your data frame:
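i <- which(is.na(d[, 1]))  # rows whose ID is NA (continuation rows)
# append each continuation row's Account to the row directly above it, using the " / " separator from the question
d[i - 1, "Account"] <- paste(d[i - 1, "Account"], d[i, "Account"], sep = " / ")
d <- d[-i, ]  # drop the continuation rows (assumes at least one NA row and no two NA rows in a row)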
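The index-based approach above relies on every NA row sitting directly below a non-NA row, and on there being at least one NA row (with an empty index, d[-i, ] drops every row). A hedged base R sketch that relaxes both assumptions is shown below. The function name merge_continuation_rows and its arguments are hypothetical, introduced only for this illustration; it groups rows with cumsum and joins the text with tapply.

```r
# Hypothetical helper: merge rows whose id column is NA into the nearest
# non-NA row above them, joining the text column with `sep`.
merge_continuation_rows <- function(d, id_col = "ID", text_col = "Account",
                                    sep = " / ") {
  is_start <- !is.na(d[[id_col]])
  # group number per row; 0 marks continuation rows before the first real ID
  grp <- cumsum(is_start)
  keep <- grp > 0
  # one joined string per group, returned in increasing group order
  merged <- tapply(d[[text_col]][keep], grp[keep], paste, collapse = sep)
  out <- d[is_start, , drop = FALSE]
  out[[text_col]] <- unname(as.vector(merged))
  rownames(out) <- NULL
  out
}

# Toy example (made-up values): two consecutive continuation rows
d <- data.frame(
  ID      = c(1L, NA, NA, 2L),
  Account = c("a", "b", "c", "d"),
  stringsAsFactors = FALSE
)
out <- merge_continuation_rows(d)
print(out)
```

Other columns are taken from the first row of each block unchanged, which matches the question's data, where the Amount values sit on the row that carries the ID.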