Reshape columns of the same data frame into one

Time: 2019-06-07 07:45:53

Tags: r dataframe reshape

I have a df like this:

Department   ID    Category.Department   Category.ID
NA           NA    NA                    NA
Sales        101   2                     4
Sales        101   2                     4
NA           NA    NA                    NA
Sales        101   2                     4
Sales        101   2                     4
NA           NA    NA                    NA
Sales        101   2                     4
Sales        101   2                     4

df = data.frame(Department = rep(c(NA, 'Sales', 'Sales'), times = 3),
                ID = rep(c(NA, 101, 101), times = 3),
                Category.Department = rep(c(NA, 2, 2), times = 3),
                Category.ID = rep(c(NA, 4, 4), times = 3), stringsAsFactors = FALSE)

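I want an output like the one below, where a single column holds both Department and ID, and a second column holds the category values. The NA rows in each column are very important, because they separate the groups.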

New.Col   Category
  NA         NA
 Sales       2
  101        4
  NA         NA
 Sales       2
  101        4
  NA         NA
 Sales       2
  101        4

So far I have tried the transpose and sapply functions, but they did not work as expected. Any suggestions in base R?

2 answers:

Answer 0: (score: 1)

It is hard to be certain without the real expected output.
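One possible base R sketch, offered as an illustration rather than as this answer's own code. It assumes that every group starts with an all-NA separator row and that the data rows inside a group are duplicates of each other, as in the sample df. The names is_sep, grp, pieces and out are made up for this example:

```r
df <- data.frame(Department = rep(c(NA, 'Sales', 'Sales'), times = 3),
                 ID = rep(c(NA, 101, 101), times = 3),
                 Category.Department = rep(c(NA, 2, 2), times = 3),
                 Category.ID = rep(c(NA, 4, 4), times = 3),
                 stringsAsFactors = FALSE)

# TRUE for separator rows, i.e. rows where every column is NA
is_sep <- rowSums(is.na(df)) == ncol(df)
# running group id: increases by one at each separator row
grp <- cumsum(is_sep)

pieces <- lapply(split(df, grp), function(g) {
  # keep the distinct data rows of this group (assumption: duplicates are noise)
  x <- unique(g[rowSums(is.na(g)) < ncol(g), , drop = FALSE])
  data.frame(
    # leading NA keeps the separator; rbind() interleaves Department and ID
    New.Col  = c(NA, rbind(x$Department, as.character(x$ID))),
    Category = c(NA, rbind(x$Category.Department, x$Category.ID)),
    stringsAsFactors = FALSE
  )
})

out <- do.call(rbind, pieces)
rownames(out) <- NULL
out
```

Because the separator is re-added as a leading NA per group, this keeps the NA rows that the question asks for. If a group could start without a separator row, the leading NA would have to be made conditional.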


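Answer 1: (score: 0)

An alternative to converting to long format is to use coalesce. I also created a group variable and dropped the NA rows, since they serve no purpose in your analysis, i.e.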

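library(tidyverse)

df %>% 
 # start a new group at every all-NA separator row
 group_by(grp = cumsum(rowSums(is.na(.)) == ncol(.))) %>% 
 # within each group, shift the ID columns down and the Department columns up
 mutate_at(vars(contains('ID')), lag) %>% 
 mutate_at(vars(contains('Department')), lead) %>% 
 # take the Department value where present, otherwise the ID value
 mutate(new.col = coalesce(Department, as.character(ID)), 
        category = coalesce(Category.Department, Category.ID)) %>% 
 select(grp, new.col, category) %>% 
 # collapse the duplicated rows
 distinct()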

which gives,

# A tibble: 6 x 3
# Groups:   grp [3]
    grp new.col category
  <int> <chr>      <dbl>
1     1 Sales          2
2     1 101            4
3     2 Sales          2
4     2 101            4
5     3 Sales          2
6     3 101            4
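To see why the lead/lag shift followed by coalesce yields one Department row and one ID row per group, here is a small sketch on a single group from the question. The vectors department and id and the name new_col are illustrative only, and the example assumes dplyr is installed:

```r
library(dplyr)

# One group from the question: a separator row, then two identical data rows
department <- c(NA, "Sales", "Sales")
id         <- c(NA, 101, 101)

# lead() shifts values up, so the NA moves to the last position
lead(department)

# lag() shifts values down, so only the last position keeps an ID
lag(id)

# coalesce() takes the first non-missing value at each position
new_col <- coalesce(lead(department), as.character(lag(id)))
new_col

# the first two positions are duplicates, which is why distinct() follows
unique(new_col)
```

After the shift, each position has exactly one non-missing source, so coalesce never has to choose between a Department and an ID value.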