I have this dataset:
# Data
movmnt_id <- c("101", "601", "105", "321")
plant <- c("FF", "FF", "DO", "BO")
loc <- c("MM", "MM", "KB", "RD")
vendor <- c(123, NA, NA, NA)
customer <- c(456, NA, NA, NA)
check <- c(NA, NA, "defined", "defined")
df <- data.frame(movmnt_id, plant, loc, vendor, customer, check)
  movmnt_id plant loc vendor customer   check
1       101    FF  MM    123      456    <NA>
2       601    FF  MM     NA       NA    <NA>
3       105    DO  KB     NA       NA defined
4       321    BO  RD     NA       NA defined
I need to get this output (in the second row, `vendor` and `customer` are copied from the first row):
  movmnt_id plant loc vendor customer   check
1       101    FF  MM    123      456    <NA>
2       601    FF  MM    123      456    <NA>
3       105    DO  KB     NA       NA defined
4       321    BO  RD     NA       NA defined
The conditions are as follows:
If in current row `movmnt_id == 601`
-> take row *WHERE* `plant` & `loc` are the same as in the current row
*AND* `movmnt_id == 101`
*AND* `is.na(check)`
-> copy from found row `vendor` & `customer` to the current row
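I could write some for loops, but that would be too heavy for my dataset.
I wonder whether there is a more elegant and computationally cheaper solution.
I tried to adapt the solutions from related questions, but without success.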
Answer 0 (score: 2)
To implement your conditions, you can try the following:
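library(dplyr)

df %>%
  group_by(plant, loc) %>%
  # For 601 rows with a missing value, take the value from the 101 row
  # of the same plant/loc group whose check is NA
  mutate(across(c(vendor, customer),
                ~ ifelse(movmnt_id == '601' & is.na(.),
                         .[is.na(check) & movmnt_id == '101'], .))) %>%
  ungroup()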
#  movmnt_id plant loc   vendor customer check
#  <chr>     <chr> <chr>  <dbl>    <dbl> <chr>
#1 101       FF    MM       123      456 NA
#2 601       FF    MM       123      456 NA
#3 105       DO    KB        NA       NA defined
#4 321       BO    RD        NA       NA defined
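If dplyr is not available, the same rule can be sketched in base R with `match()`, which avoids an explicit loop. This is only a minimal sketch under assumptions: the names `key`, `src_rows`, `tgt_rows`, `hit`, `found` and `result` are made up for illustration, the `"|"` separator is assumed not to occur in `plant` or `loc`, and when several 101 rows match, only the first is used. Unlike the dplyr code above, it follows the stated rule literally and overwrites `vendor` and `customer` in the 601 row even when they are not missing.

```r
# Sample data from the question
df <- data.frame(
  movmnt_id = c("101", "601", "105", "321"),
  plant     = c("FF", "FF", "DO", "BO"),
  loc       = c("MM", "MM", "KB", "RD"),
  vendor    = c(123, NA, NA, NA),
  customer  = c(456, NA, NA, NA),
  check     = c(NA, NA, "defined", "defined"),
  stringsAsFactors = FALSE
)

cols <- c("vendor", "customer")

# Composite key per row (assumption: "|" never appears in plant or loc)
key <- paste(df$plant, df$loc, sep = "|")

# Source rows: movmnt_id 101 with a missing check; target rows: movmnt_id 601
src_rows <- which(df$movmnt_id == "101" & is.na(df$check))
tgt_rows <- which(df$movmnt_id == "601")

# For each target row, position of the first source row with the same key
# (NA when no source row exists)
hit   <- match(key[tgt_rows], key[src_rows])
found <- !is.na(hit)

# Copy the values only for targets that actually have a source row
result <- df
if (any(found)) {
  result[tgt_rows[found], cols] <- df[src_rows[hit[found]], cols]
}

result
```

Target rows without a matching source row are left unchanged, because only the rows flagged in `found` are assigned.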
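Answer 1 (score: 1)
This solution may help, but I am assuming that the two values are copied to the second row because the rows share the same values in the `loc` and `plant` columns: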
library(dplyr)

df %>%
  group_by(plant, loc) %>%
  mutate(across(vendor:customer, ~ first(na.omit(.x))))
  movmnt_id plant loc   vendor customer check
  <chr>     <chr> <chr>  <dbl>    <dbl> <chr>
1 101       FF    MM       123      456 NA
2 601       FF    MM       123      456 NA
3 105       DO    KB        NA       NA defined
4 321       BO    RD        NA       NA defined
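Note that this grouping approach works from the first non-missing value in each `plant`/`loc` group and does not look at `movmnt_id` or `check`. If a group can contain other rows that must stay untouched, a stricter variant may be worth trying. The following is a hedged sketch, not a tested production solution: it builds a lookup table from the 101 rows and joins it back. The names `src`, `vendor_src`, `customer_src` and `result` are hypothetical, and it assumes that only missing values in 601 rows should be filled and that one source row per `plant`/`loc` pair is enough.

```r
library(dplyr)

# Sample data from the question
df <- data.frame(
  movmnt_id = c("101", "601", "105", "321"),
  plant     = c("FF", "FF", "DO", "BO"),
  loc       = c("MM", "MM", "KB", "RD"),
  vendor    = c(123, NA, NA, NA),
  customer  = c(456, NA, NA, NA),
  check     = c(NA, NA, "defined", "defined")
)

# Lookup table: one source row per plant/loc pair, taken from
# movmnt_id 101 rows whose check is missing
src <- df %>%
  filter(movmnt_id == "101", is.na(check)) %>%
  distinct(plant, loc, .keep_all = TRUE) %>%
  select(plant, loc, vendor_src = vendor, customer_src = customer)

# Join the lookup values and fill only the 601 rows that are missing a value
result <- df %>%
  left_join(src, by = c("plant", "loc")) %>%
  mutate(
    vendor   = if_else(movmnt_id == "601" & is.na(vendor), vendor_src, vendor),
    customer = if_else(movmnt_id == "601" & is.na(customer), customer_src, customer)
  ) %>%
  select(-vendor_src, -customer_src)

result
```

The join is vectorized, so it should scale better than a row-by-row loop, although that has not been benchmarked here.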