I have the following data, read in from a spreadsheet:
structure(list(x = c("a", NA, NA, "b", NA, NA, "c", NA), y = c(1,
NA, NA, 7, NA, NA, 13, NA), z = c(2, NA, NA, 8, NA, NA, 14, NA
), x.1 = c(NA, "a", "a", NA, "b", "b", NA, "c"), y.1 = c(NA,
3, 5, NA, 9, 11, NA, 15), z.1 = c(NA, 4, 6, NA, 10, 12, NA, 16
)), .Names = c("x", "y", "z", "x.1", "y.1", "z.1"), row.names = c(NA,
-8L), class = "data.frame")
When printed, it looks like this:
     x  y  z  x.1 y.1 z.1
1    a  1  2 <NA>  NA  NA
2 <NA> NA NA    a   3   4
3 <NA> NA NA    a   5   6
4    b  7  8 <NA>  NA  NA
5 <NA> NA NA    b   9  10
6 <NA> NA NA    b  11  12
7    c 13 14 <NA>  NA  NA
8 <NA> NA NA    c  15  16
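Sometimes there is more than one of these groups of 3 repeated columns. How can I merge all the data into the first 3 columns? I don't know in advance how many blocks there will be, but I do know the columns will always be named the same way, just with a different (sequentially increasing) numeric suffix. Is this possible with dplyr?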
Answer:
Using dplyr/tidyr:
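library(dplyr)
library(tidyr)

# gather()/spread() still work, but tidyr >= 1.0 marks them as superseded
# by pivot_longer()/pivot_wider().
dfN %>%
  mutate(rowid = row_number()) %>%          # remember which row each value came from
  gather(Var, Val, -rowid) %>%              # long format: one row per (row, column)
  mutate(Var = sub('\\..*$', '', Var)) %>%  # "x.1", "x.2", ... -> "x"
  na.omit() %>%                             # drop the empty cells
  spread(Var, Val, convert = TRUE) %>%      # back to wide: one column per base name
  select(-rowid)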
# x y z
#1 a 1 2
#2 a 3 4
#3 a 5 6
#4 b 7 8
#5 b 9 10
#6 b 11 12
#7 c 13 14
#8 c 15 16
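The step that copes with an unknown number of blocks is `sub('\\..*$', '', ...)` on the column names (used in the dplyr code above and in the base R code below): it drops the first dot and everything after it, so `x`, `x.1`, `x.2`, ... all map to the same key. A minimal base R sketch of just that step, using a hypothetical set of names with three blocks:

```r
# Hypothetical column names with three blocks of x/y/z
nms <- c("x", "y", "z", "x.1", "y.1", "z.1", "x.2", "y.2", "z.2")

# Drop the first "." and everything after it: "x.2" -> "x"
base <- sub("\\..*$", "", nms)
print(base)

# Group the original names by base name; the number of blocks never
# has to be known in advance
groups <- split(nms, base)
print(groups)
```

This assumes the base names themselves contain no dot. If they might, a pattern such as `"\\.[0-9]+$"` strips only a trailing numeric suffix.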
Or, using base R:
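# Group column names by base name: x = c("x", "x.1"), y = ..., z = ...
grp <- split(names(dfN), sub('\\..*$', '', names(dfN)))
# pmax(..., na.rm = TRUE) returns the single non-missing value in each row
dfN[names(grp)] <- lapply(grp, function(nm) do.call(pmax, c(dfN[nm], na.rm = TRUE)))
dfN[names(grp)]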
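The base R version works because `pmax(..., na.rm = TRUE)` ignores missing values, so when exactly one column in a group is filled in each row it simply returns that value (for character columns as well as numeric ones). A small illustration with made-up vectors; note the last call: if two blocks were both filled in the same row, `pmax()` would keep the larger value, not the first one.

```r
# Made-up vectors: each position has at most one non-missing value
a <- c("a", NA, NA)
b <- c(NA, "a", "b")
print(pmax(a, b, na.rm = TRUE))              # "a" "a" "b"

n1 <- c(1, NA, NA)
n2 <- c(NA, 3, NA)
print(pmax(n1, n2, na.rm = TRUE))            # 1 3 NA (both missing stays NA)

# With two values present, pmax() returns the larger one
print(pmax(c(2, 9), c(5, 1), na.rm = TRUE))  # 5 9
```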
# data (the question's data frame, assigned to dfN)
dfN <- structure(list(x = c("a", NA, NA, "b", NA, NA, "c", NA),
    y = c(1, NA, NA, 7, NA, NA, 13, NA), z = c(2, NA, NA, 8, NA, NA, 14, NA),
    x.1 = c(NA, "a", "a", NA, "b", "b", NA, "c"), y.1 = c(NA, 3, 5, NA, 9, 11, NA, 15),
    z.1 = c(NA, 4, 6, NA, 10, 12, NA, 16)),
    .Names = c("x", "y", "z", "x.1", "y.1", "z.1"),
    row.names = c(NA, -8L), class = "data.frame")
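If a row could ever have more than one block filled in, a left-to-right "coalesce" (first non-missing value wins) is a safer rule than `pmax()`. A minimal base R sketch, wrapped in a hypothetical helper `collapse_blocks()` (not part of any package). It assumes the suffixes are purely numeric (`.1`, `.2`, ...) and keeps the columns in the order their base names first appear:

```r
# Hypothetical helper: collapse repeated column blocks (x, x.1, x.2, ...)
# into one column per base name, keeping the first non-missing value per row.
collapse_blocks <- function(df) {
  # Base name of each column: drop only a trailing ".<digits>" suffix
  base <- sub("\\.[0-9]+$", "", names(df))
  # Group names by base name, preserving first-appearance order
  groups <- split(names(df), factor(base, levels = unique(base)))
  out <- lapply(groups, function(cols) {
    # Fill the NAs of the running result from the next block, left to right
    Reduce(function(acc, nxt) ifelse(is.na(acc), nxt, acc), df[cols])
  })
  as.data.frame(out, stringsAsFactors = FALSE)
}

# The question's data
dat <- data.frame(
  x   = c("a", NA, NA, "b", NA, NA, "c", NA),
  y   = c(1, NA, NA, 7, NA, NA, 13, NA),
  z   = c(2, NA, NA, 8, NA, NA, 14, NA),
  x.1 = c(NA, "a", "a", NA, "b", "b", NA, "c"),
  y.1 = c(NA, 3, 5, NA, 9, 11, NA, 15),
  z.1 = c(NA, 4, 6, NA, 10, 12, NA, 16),
  stringsAsFactors = FALSE
)
res <- collapse_blocks(dat)
print(res)
```

`Reduce()` over `ifelse()` keeps this in base R and works for any number of blocks. With dplyr loaded, `dplyr::coalesce()` performs the same first-non-missing step.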