Collapsing duplicate columns with dplyr

Asked: 2015-04-21 18:49:26

Tags: r dplyr

I read the following data in from a spreadsheet.

structure(list(x = c("a", NA, NA, "b", NA, NA, "c", NA), y = c(1, 
   NA, NA, 7, NA, NA, 13, NA), z = c(2, NA, NA, 8, NA, NA, 14, NA
), x.1 = c(NA, "a", "a", NA, "b", "b", NA, "c"), y.1 = c(NA, 
3, 5, NA, 9, 11, NA, 15), z.1 = c(NA, 4, 6, NA, 10, 12, NA, 16
)), .Names = c("x", "y", "z", "x.1", "y.1", "z.1"), row.names = c(NA, 
-8L), class = "data.frame")

When printed, it looks like this:

     x  y  z  x.1 y.1 z.1
1    a  1  2 <NA>  NA  NA
2 <NA> NA NA    a   3   4
3 <NA> NA NA    a   5   6
4    b  7  8 <NA>  NA  NA
5 <NA> NA NA    b   9  10
6 <NA> NA NA    b  11  12
7    c 13 14 <NA>  NA  NA
8 <NA> NA NA    c  15  16

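Sometimes there is more than one of these groups of 3 repeated columns. How can I collapse all of the data into the first 3 columns, given that I do not know how many of these blocks I will have? I do know that the columns will always be named the same way, just with a different (but sequentially incrementing) numeric suffix. Is this possible with dplyr?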

1 Answer:

Answer 0 (score: 2)

Using dplyr/tidyr

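library(dplyr)
library(tidyr)
# Keep the row number as a key so that values from the same row stay together.
# (add_rownames() is deprecated in newer dplyr; tibble::rownames_to_column() replaces it.)
add_rownames(dfN) %>%
         gather(Var, Val, -1) %>%                 # long format: one row per cell
         mutate(Var=sub('\\..*$', '', Var)) %>%   # strip the suffix: x.1 -> x
         na.omit() %>%                            # drop the empty cells
         spread(Var, Val, convert=TRUE) %>%       # back to wide; convert restores numeric y and z
         select(-rowname) 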
#  x  y  z
#1 a  1  2
#2 a  3  4
#3 a  5  6
#4 b  7  8
#5 b  9 10
#6 b 11 12
#7 c 13 14
#8 c 15 16
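
With more recent versions of tidyr (1.0.0 or later) and dplyr (1.0.0 or later), the same idea can probably be expressed with pivot_longer() and its special ".value" name. The following is only a sketch, not part of the original answer. It assumes that column names contain at most one dot, that x is never missing in a real record, and the helper column name block and the ".0" suffix are made up for illustration.

```r
library(dplyr)
library(tidyr)

dfN <- data.frame(
  x   = c("a", NA, NA, "b", NA, NA, "c", NA),
  y   = c(1, NA, NA, 7, NA, NA, 13, NA),
  z   = c(2, NA, NA, 8, NA, NA, 14, NA),
  x.1 = c(NA, "a", "a", NA, "b", "b", NA, "c"),
  y.1 = c(NA, 3, 5, NA, 9, 11, NA, 15),
  z.1 = c(NA, 4, 6, NA, 10, 12, NA, 16),
  stringsAsFactors = FALSE
)

res <- dfN %>%
  # Give the unsuffixed block an explicit ".0" suffix (hypothetical choice)
  # so that every name splits into exactly two pieces.
  rename_with(~ ifelse(grepl("\\.", .x), .x, paste0(.x, ".0"))) %>%
  # ".value" keeps x, y, z as separate columns; "block" holds the suffix.
  pivot_longer(everything(),
               names_to = c(".value", "block"),
               names_sep = "\\.") %>%
  # Assumption: a real record always has a non-missing x.
  filter(!is.na(x)) %>%
  select(-block)

res
```

Because each of x, y and z is only ever stacked with columns of its own type, y and z stay numeric here and no type conversion step is needed.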

Or using base R

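# Group the column names by their stem (x, y, z), then take the row-wise
# non-missing value within each group; pmax() with na.rm=TRUE ignores the NAs.
# Note: split() returns the groups in sorted order, which matches c('x', 'y', 'z') here.
dfN[c('x', 'y', 'z')] <- lapply(split(colnames(dfN), sub('\\..*$', '', 
            colnames(dfN))), function(nm) 
                  do.call(pmax, c(dfN[nm], na.rm=TRUE)) )
dfN[1:3]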
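
Since the number of blocks is not known in advance, it may help to wrap the base R idea in a function that does not hard-code the column names. The following is a sketch under assumptions, not part of the original answer: coalesce_cols is a hypothetical helper name, the suffix is assumed to be a dot followed by digits, and each row is assumed to have at most one non-missing value per stem (the first one found wins).

```r
# Hypothetical helper: collapse columns that share a stem (x, x.1, x.2, ...)
# into one column per stem, taking the first non-missing value in each row.
coalesce_cols <- function(df) {
  stem <- sub("\\.[0-9]+$", "", names(df))
  # Keep the stems in order of first appearance rather than sorted order.
  groups <- split(names(df), factor(stem, levels = unique(stem)))
  out <- lapply(groups, function(nm) {
    Reduce(function(a, b) ifelse(is.na(a), b, a), df[nm])
  })
  as.data.frame(out, stringsAsFactors = FALSE)
}

# Made-up example with three blocks of two columns each.
df3 <- data.frame(
  x   = c("a", NA, NA),  y   = c(1, NA, NA),
  x.1 = c(NA, "b", NA),  y.1 = c(NA, 2, NA),
  x.2 = c(NA, NA, "c"),  y.2 = c(NA, NA, 3),
  stringsAsFactors = FALSE
)

collapsed <- coalesce_cols(df3)
collapsed
```

Unlike the pmax() version, this does not depend on the values being comparable, only on them being missing or not.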

Data

dfN <- structure(list(x = c("a", NA, NA, "b", NA, NA, "c", NA),
y = c(1, 
 NA, NA, 7, NA, NA, 13, NA), z = c(2, NA, NA, 8, NA, NA, 14, NA
), x.1 = c(NA, "a", "a", NA, "b", "b", NA, "c"), y.1 = c(NA, 
 3, 5, NA, 9, 11, NA, 15), z.1 = c(NA, 4, 6, NA, 10, 12, NA, 16
)), .Names = c("x", "y", "z", "x.1", "y.1", "z.1"), row.names = c(NA, 
-8L), class = "data.frame")