I have a data frame like the one below. How can I drop the NAs and shift the remaining values in each column up?
Thanks.
id name.america name.europe name.asia
1 a <NA> <NA>
2 <NA> b <NA>
3 <NA> <NA> c
4 d <NA> <NA>
I want to change it to:
id name.america name.europe name.asia
1 a b c
2 d
Answer 0 (score: 3)
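We can loop over the columns and remove the NAs with na.omit. We then make all list elements the same length by appending NAs at the end, up to the max of the element lengths (lengths). Finally, we take the matching number of values from the 'id' column of the dataset and bind them to the output.
# drop the NAs in every column except 'id'
lst <- lapply(df1[-1], na.omit)
# pad each element with NA up to the length of the longest one
lst1 <- lapply(lst, `length<-`, max(lengths(lst)))
# keep character columns as character (R < 4.0 would otherwise create factors)
out <- data.frame(lst1, stringsAsFactors = FALSE)
# prepend as many 'id' values as there are output rows
out1 <- cbind(id = df1$id[seq_len(nrow(out))], out)
out1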
# id name.america name.europe name.asia
#1 1 a b c
#2 2 d <NA> <NA>
If you need to change the NAs to blanks (""), which is not recommended:
out1[is.na(out1)] <- ""
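The steps of Answer 0 can be wrapped in a reusable function. The sketch below is one way to do that, not part of the answer itself; `shift_up` and its `id_col` argument are hypothetical names, and it assumes the id column is called `id` unless told otherwise. It uses `as.vector()` to strip the `na.action` attribute that `na.omit()` attaches.

```r
# Hypothetical helper (not part of the answer): shifts the non-NA values of
# every column except `id_col` to the top and pads shorter columns with NA.
shift_up <- function(df, id_col = "id") {
  value_cols <- setdiff(names(df), id_col)
  # na.omit() drops the NAs; as.vector() strips the "na.action" attribute it adds
  kept <- lapply(df[value_cols], function(x) as.vector(na.omit(x)))
  n <- max(lengths(kept))
  # `length<-` pads each vector with NA up to the longest column
  padded <- lapply(kept, `length<-`, n)
  out <- data.frame(ids = df[[id_col]][seq_len(n)], padded,
                    stringsAsFactors = FALSE)
  names(out)[1] <- id_col
  out
}

df1 <- data.frame(
  id = 1:4,
  name.america = c("a", NA, NA, "d"),
  name.europe  = c(NA, "b", NA, NA),
  name.asia    = c(NA, NA, "c", NA),
  stringsAsFactors = FALSE
)
res <- shift_up(df1)
print(res)
```

The original column order is preserved because the value columns are processed in the order they appear in `df`.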
Answer 1 (score: 2)
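A tidyverse-based solution:
library(tidyverse)
df1 %>%
  # long format: one row per (column name, value)
  gather(key = "name", value = "val", -id) %>%
  # drop the NA values
  na.omit() %>%
  select(-id) %>%
  # number the remaining values within each original column
  group_by(name) %>%
  mutate(id = 1:n()) %>%
  # back to wide format, one column per original name
  spread(key = name, value = val)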
# A tibble: 2 x 4
id name.america name.asia name.europe
<int> <chr> <chr> <chr>
1 1 a c b
2 2 d NA NA
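spread() sorts the new columns alphabetically, so use select() or one of its variants to put them back in the original order. The NAs are left as they are. If needed, you can use tidyr::replace_na to insert some string or a blank, but I would discourage that.
Data:
df1 <- structure(
  list(
    id = 1:4,
    name.america = c("a", NA, NA, "d"),
    name.europe = c(NA, "b", NA, NA),
    name.asia = c(NA, NA, "c", NA)
  ),
  class = "data.frame",
  row.names = c(NA, -4L)
)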
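If you would rather avoid the tidyverse dependency (tidyr now marks gather() and spread() as superseded by pivot_longer() and pivot_wider()), the same "long format, number within each column, back to wide" idea can be sketched in base R. This is an illustrative alternative, not part of the answer; the names `cols`, `long`, `wide` and `out` are hypothetical. Filling a matrix by (row number, column position) keeps the original column order, so no reordering step is needed.

```r
df1 <- data.frame(
  id = 1:4,
  name.america = c("a", NA, NA, "d"),
  name.europe  = c(NA, "b", NA, NA),
  name.asia    = c(NA, NA, "c", NA),
  stringsAsFactors = FALSE
)

cols <- names(df1)[-1]
# Long format: one row per (column name, value), like gather()
long <- data.frame(
  name = rep(cols, each = nrow(df1)),
  val  = unlist(df1[cols], use.names = FALSE),
  stringsAsFactors = FALSE
)
# Drop the NA values, like na.omit()
long <- long[!is.na(long$val), ]
# Number the values within each column, like group_by(name) + mutate(id = 1:n())
long$id <- ave(seq_along(long$val), long$name, FUN = seq_along)
# Wide format, like spread(): place each value at (its number, its column)
wide <- matrix(NA_character_, nrow = max(long$id), ncol = length(cols),
               dimnames = list(NULL, cols))
wide[cbind(long$id, match(long$name, cols))] <- long$val
out <- data.frame(id = seq_len(nrow(wide)), wide, stringsAsFactors = FALSE)
print(out)
```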
Answer 2 (score: 0)
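# move the non-NA values of each column to the top and pad with "" (overwrites df1)
df1[, -1] <- lapply(df1[, -1], function(x) c(na.omit(x), rep("", length(x) - length(na.omit(x)))))
# keep only as many rows as the fullest column has non-empty values
df1[1:max(colSums(!(df1[, -1] == ""))), ]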
# id name.america name.europe name.asia
#1 1 a b c
#2 2 d
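A variant of Answer 2 that keeps NA instead of "" and leaves df1 unchanged: reorder each column by `is.na()`. Because `FALSE` sorts before `TRUE` and R's `order()` is stable, the non-NA values move to the top in their original relative order. This is a sketch of an alternative technique, not the answer's own code; `df2` and `n` are illustrative names.

```r
df1 <- data.frame(
  id = 1:4,
  name.america = c("a", NA, NA, "d"),
  name.europe  = c(NA, "b", NA, NA),
  name.asia    = c(NA, NA, "c", NA),
  stringsAsFactors = FALSE
)

df2 <- df1
# Push the NAs of each value column to the bottom; order() is stable,
# so the non-NA values keep their original relative order
df2[-1] <- lapply(df1[-1], function(x) x[order(is.na(x))])
# Keep only as many rows as the fullest column has non-NA values
n <- max(colSums(!is.na(df1[-1])))
df2 <- df2[seq_len(n), ]
print(df2)
```

If blanks are really wanted, `df2[is.na(df2)] <- ""` can be applied afterwards, as in Answer 0.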