I have a data frame with 6 columns and 10650 rows. Each cell holds either a value or NA, as in the following example:
Date X10 X20 X30 X40 X50 X60
2012-01-01 0.5 0.6 NA NA NA NA
2012-01-02 0.3 0.2 NA NA NA NA
2012-01-03 0.5 0.6 NA NA NA NA
2012-01-04 0.3 0.2 NA NA NA NA
2012-01-05 NA 0.6 0.4 NA NA NA
2012-01-06 NA 0.2 1.2 NA NA NA
2012-01-07 NA 0.6 1.6 NA NA NA
2012-01-08 NA NA 1.8 2.4 NA NA
2012-01-09 NA NA 2.1 3.2 NA NA
2012-01-10 NA NA 2.6 3.3 NA NA
2012-01-11 NA NA NA 3.7 5.1 NA
2012-01-12 NA NA NA 3.9 5.7 NA
2012-01-13 NA NA NA 4.2 5.6 NA
2012-01-14 NA NA NA NA 6.5 2.2
2012-01-15 NA NA NA NA 6.9 2.9
2012-01-16 NA NA NA NA 7.2 4.2
Now I just want to drop the NAs and end up with 4 columns, like this:
Date X1 X2 Xmin
2012-01-01 0.5 0.6 10
2012-01-02 0.3 0.2 10
2012-01-03 0.5 0.6 10
2012-01-04 0.3 0.2 10
2012-01-05 0.6 0.4 20
2012-01-06 0.2 1.2 20
2012-01-07 0.6 1.6 20
2012-01-08 1.8 2.4 30
2012-01-09 2.1 3.2 30
2012-01-10 2.6 3.3 30
2012-01-11 3.7 5.1 40
2012-01-12 3.9 5.7 40
2012-01-13 4.2 5.6 40
2012-01-14 6.5 2.2 50
2012-01-15 6.9 2.9 50
2012-01-16 7.2 4.2 50
I tried the suggestions I found on Stack Overflow:
> final[complete.cases(final), ]
> final <- na.omit(final)
Neither of them worked.
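A likely reason both attempts fail is that `complete.cases()` and `na.omit()` work on whole rows, and every row in this data contains at least one NA. The sketch below shows this on a small stand-in for the data (the name `demo` and its three rows are hypothetical, chosen only for illustration):

```r
# Hypothetical three-row stand-in for the question's data
demo <- data.frame(
  Date = c("2012-01-01", "2012-01-05", "2012-01-08"),
  X10  = c(0.5, NA, NA),
  X20  = c(0.6, 0.6, NA),
  X30  = c(NA, 0.4, 1.8),
  X40  = c(NA, NA, 2.4)
)

# complete.cases() is TRUE only for rows without any NA;
# here every row has at least one NA
complete.cases(demo)

# so both attempts drop every row
nrow(demo[complete.cases(demo), ])  # 0
nrow(na.omit(demo))                 # 0
```

The NAs therefore have to be removed within each row rather than by filtering rows.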
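Answer 0 (score: 3)
We can use apply. Loop over the rows of the dataset without the 'Date' column using apply (MARGIN = 1), remove the NA elements from each row (na.omit), and build a one-row data.frame from the remaining elements, with 'Xmin' taken from the column name of the first non-NA element (minus the leading "X"). Then rbind the rows together and cbind the result with the first column.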
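# Assumes every row has exactly two non-NA values
cbind(df1[1], do.call(rbind, apply(df1[-1], 1, function(x) {
  v <- na.omit(x)  # keep only the non-NA values of this row
  data.frame(setNames(as.list(v), c("X1", "X2")),
             Xmin = as.numeric(sub("^X", "", names(v)[1])))  # "X20" -> 20
})))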
Data:
df1 <- structure(list(Date = c("2012-01-01", "2012-01-02", "2012-01-03",
"2012-01-04", "2012-01-05", "2012-01-06", "2012-01-07", "2012-01-08",
"2012-01-09", "2012-01-10", "2012-01-11", "2012-01-12", "2012-01-13",
"2012-01-14", "2012-01-15", "2012-01-16"), X10 = c(0.5, 0.3,
0.5, 0.3, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), X20 = c(0.6,
0.2, 0.6, 0.2, 0.6, 0.2, 0.6, NA, NA, NA, NA, NA, NA, NA, NA,
NA), X30 = c(NA, NA, NA, NA, 0.4, 1.2, 1.6, 1.8, 2.1, 2.6, NA,
NA, NA, NA, NA, NA), X40 = c(NA, NA, NA, NA, NA, NA, NA, 2.4,
3.2, 3.3, 3.7, 3.9, 4.2, NA, NA, NA), X50 = c(NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, 5.1, 5.7, 5.6, 6.5, 6.9, 7.2), X60 = c(NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 2.2, 2.9, 4.2
)), class = "data.frame", row.names = c(NA, -16L))
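To see what the anonymous function in the apply call does, here is a minimal sketch that walks a single row through the same steps. The vector `x` is a hypothetical stand-in for the 2012-01-05 row as apply would pass it, and the `as.numeric()` conversion is an assumption added so that Xmin comes out numeric, as in the requested output:

```r
# Hypothetical stand-in for one row (2012-01-05) as apply() passes it:
# a named numeric vector without the Date column
x <- c(X10 = NA, X20 = 0.6, X30 = 0.4, X40 = NA, X50 = NA, X60 = NA)

v <- na.omit(x)                              # keeps X20 = 0.6 and X30 = 0.4
vals <- setNames(as.list(v), c("X1", "X2"))  # rename by position to X1, X2
xmin <- sub("^X", "", names(v)[1])           # first non-NA column: "X20" -> "20"

# One-row data frame with columns X1, X2, Xmin
row <- data.frame(vals, Xmin = as.numeric(xmin))
row
```

Because the values are renamed by position, this only works when each row has exactly two non-NA values. A row with a different count would make `setNames()` or the later `rbind` misbehave.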
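Answer 1 (score: 1)
Here is a tidyverse-based solution. It does much the same thing as akrun's answer and performs comparably. The only difference may be readability, but that is probably a matter of preference: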
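library(dplyr)
library(purrr)
# df is the question's data frame; assumes unique dates in sorted order
df[2:ncol(df)] %>%
  split(df$Date) %>%             # one single-row data frame per date
  map_dfr(function(x) {
    cl <- na.omit(t(x))          # transpose, then drop the NA rows
    Xmin <- rownames(cl)[1] %>% substr(., 2, nchar(.)) %>% as.numeric()
    tibble(X1 = cl[1, ], X2 = cl[2, ], Xmin = Xmin)
  }) %>%
  bind_cols(df["Date"], .)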
########### OUTPUT ############
# A tibble: 16 x 4
Date X1 X2 Xmin
<date> <dbl> <dbl> <dbl>
1 2012-01-01 0.5 0.6 10
2 2012-01-02 0.3 0.2 10
3 2012-01-03 0.5 0.6 10
4 2012-01-04 0.3 0.2 10
5 2012-01-05 0.6 0.4 20
6 2012-01-06 0.2 1.2 20
7 2012-01-07 0.6 1.6 20
8 2012-01-08 1.8 2.4 30
9 2012-01-09 2.1 3.2 30
10 2012-01-10 2.6 3.3 30
11 2012-01-11 3.7 5.1 40
12 2012-01-12 3.9 5.7 40
13 2012-01-13 4.2 5.6 40
14 2012-01-14 6.5 2.2 50
15 2012-01-15 6.9 2.9 50
16 2012-01-16 7.2 4.2 50