I have a df like this:
Department ID Category.Department Category.ID
NA NA NA NA
Sales 101 2 4
Sales 101 2 4
NA NA NA NA
Sales 101 2 4
Sales 101 2 4
NA NA NA NA
Sales 101 2 4
Sales 101 2 4
df = data.frame(Department = rep(c(NA, 'Sales', 'Sales'), times = 3),
                ID = rep(c(NA, 101, 101), times = 3),
                Category.Department = rep(c(NA, 2, 2), times = 3),
                Category.ID = rep(c(NA, 4, 4), times = 3),
                stringsAsFactors = FALSE)
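I want an output like this, where a single column holds both Department and ID, and the other column holds the Category. The NA rows in each column are very important, because they separate the groups.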
New.Col Category
NA NA
Sales 2
101 4
NA NA
Sales 2
101 4
NA NA
Sales 2
101 4
So far I have tried transpose, sapply and function, but it did not work as expected. Any suggestions in base R?
Answer 0 (score: 1)
This can't be accepted without the real expected output.
Answer 1 (score: 0)
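A different take from casting to long format (using coalesce). Also, I created a group variable and removed the NA rows, since they are not useful in your analysis, i.e.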
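library(tidyverse)

df %>%
  # start a new group at every all-NA separator row
  group_by(grp = cumsum(rowSums(is.na(.)) == ncol(.))) %>%
  # within each group, shift the ID columns down and the Department columns up
  # (functions are passed directly; funs() is deprecated in current dplyr)
  mutate_at(vars(contains('ID')), lag) %>%
  mutate_at(vars(contains('Department')), lead) %>%
  # merge each pair of columns into a single column
  mutate(new.col = coalesce(Department, as.character(ID)),
         category = coalesce(Category.Department, Category.ID)) %>%
  select(grp, new.col, category) %>%
  distinct()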
which gives,
# A tibble: 6 x 3
# Groups:   grp [3]
    grp new.col category
  <int> <chr>      <dbl>
1     1 Sales          2
2     1 101            4
3     2 Sales          2
4     2 101            4
5     3 Sales          2
6     3 101            4
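Since the question asks for base R and wants the NA separator rows kept, here is a minimal base-only sketch of the same grouping idea. It is not taken from either answer. It assumes every group starts with an all-NA row and that the data rows within a group are identical duplicates, as in the example data. The names is_sep, grp, blocks and result are only illustrative.

```r
# Example data from the question
df <- data.frame(Department = rep(c(NA, 'Sales', 'Sales'), times = 3),
                 ID = rep(c(NA, 101, 101), times = 3),
                 Category.Department = rep(c(NA, 2, 2), times = 3),
                 Category.ID = rep(c(NA, 4, 4), times = 3),
                 stringsAsFactors = FALSE)

# TRUE for separator rows, i.e. rows where every column is NA
is_sep <- rowSums(is.na(df)) == ncol(df)

# Group id: increases by one at each separator row
grp <- cumsum(is_sep)

# Build one block per group: a separator row, a Department row, an ID row
blocks <- lapply(split(df, grp), function(g) {
  # Drop the separator row and keep the first data row
  # (assumption: the remaining rows of the group are duplicates of it)
  g <- g[rowSums(is.na(g)) < ncol(g), , drop = FALSE]
  first <- g[1, ]
  data.frame(New.Col = c(NA, first$Department, as.character(first$ID)),
             Category = c(NA, first$Category.Department, first$Category.ID),
             stringsAsFactors = FALSE)
})

# Stack the blocks and reset the row names
result <- do.call(rbind, blocks)
rownames(result) <- NULL
result
```

Because ID is converted with as.character, New.Col ends up as a character column, the same as new.col in the dplyr version. If the data rows within a group can differ, the g[1, ] step would need to be replaced with logic that handles each row.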