I'm confused by some tidyr behavior. I can unnest a single response column like this:
library(tidyr)
resp1 <- c("A", "B; A", "B", NA, "B")
resp2 <- c("C; D; F", NA, "C; F", "D", "E")
resp3 <- c(NA, NA, "G; H; I", "H; I", "I")
data <- data.frame(resp1, resp2, resp3, stringsAsFactors = FALSE)
tidy <- data %>%
  transform(resp1 = strsplit(resp1, "; ")) %>%
  unnest()
# Source: local data frame [6 x 3]
#
# resp2 resp3 resp1
# (chr) (chr) (chr)
# 1 C; D; F NA A
# 2 NA NA B
# 3 NA NA A
# 4 C; F G; H; I B
# 5 D H; I NA
# 6 E I B
But I need to unnest multiple columns in my data set, and those columns have differing numbers of NAs. I tried this and it threw an error:
data %>%
  transform(resp1 = strsplit(resp1, "; "),
            resp2 = strsplit(resp2, "; "),
            resp3 = strsplit(resp3, "; ")) %>%
  unnest()
# Error: All nested columns must have the same number of elements.
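One way to see where the error comes from is to count how many pieces each cell splits into. The sketch below uses only base R, and `piece_counts` is a name introduced here for illustration. It assumes the error means what its message says: when several list columns are unnested in one call, every row must supply the same number of elements in each of them.

```r
resp1 <- c("A", "B; A", "B", NA, "B")
resp2 <- c("C; D; F", NA, "C; F", "D", "E")
resp3 <- c(NA, NA, "G; H; I", "H; I", "I")

# For each response column, count the pieces each cell splits into.
# strsplit() leaves an NA cell as a single NA, so it counts as 1.
piece_counts <- sapply(
  list(resp1 = resp1, resp2 = resp2, resp3 = resp3),
  function(x) lengths(strsplit(x, "; "))
)

# One row per original observation, one column per response variable.
piece_counts

# Rows whose counts are not all equal cannot be unnested in parallel.
mismatched <- apply(piece_counts, 1, function(r) length(unique(r)) > 1)
which(mismatched)
```

Only the last row has matching counts across all three columns, so a single parallel `unnest()` has nothing consistent to line up for the others.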
I expected the code above to give the same output as the following:
# unnesting multiple response (desired output / is there a better way?)
data %>%
  transform(resp1 = strsplit(resp1, "; ")) %>%
  unnest() %>%
  transform(resp2 = strsplit(resp2, "; ")) %>%
  unnest() %>%
  transform(resp3 = strsplit(resp3, "; ")) %>%
  unnest()
# resp1 resp2 resp3
# (chr) (chr) (chr)
# 1 A C NA
# 2 A D NA
# 3 A F NA
# 4 B NA NA
# 5 A NA NA
# 6 B C G
# 7 B C H
# 8 B C I
# 9 B F G
# 10 B F H
# 11 B F I
# 12 NA D H
# 13 NA D I
# 14 B E I
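I'm new to R, but this feels clunky and makes me wonder whether I'm abusing something I shouldn't be. What is going wrong when the attempt to unnest multiple columns at once fails?
Answer 0 (score: 1)
Check this link, which shows different cases of unnesting multiple columns from your data. Based on the documentation and the linked examples, unless there is some clever workaround, the function is probably only well defined for a single column at a time, to avoid ambiguity.
So you may have to unnest your columns one by one. The code below is perhaps still cumbersome, but it is a little simpler.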
> resp1 <- c("A", "B; A", "B", NA, "B")
> resp2 <- c("C; D; F", NA, "C; F", "D", "E")
> resp3 <- c(NA, NA, "G; H; I", "H; I", "I")
> data <- data.frame(resp1, resp2, resp3, stringsAsFactors = F)
> data
resp1 resp2 resp3
1 A C; D; F <NA>
2 B; A <NA> <NA>
3 B C; F G; H; I
4 <NA> D H; I
5 B E I
library(tidyr)
library(dplyr)
data %>%
  transform(resp1 = strsplit(resp1, "; "),
            resp2 = strsplit(resp2, "; "),
            resp3 = strsplit(resp3, "; ")) %>%
  unnest(resp1) %>%
  unnest(resp2) %>%
  unnest(resp3)
resp1 resp2 resp3
1 A C <NA>
2 A D <NA>
3 A F <NA>
4 B <NA> <NA>
5 A <NA> <NA>
6 B C G
7 B C H
8 B C I
9 B F G
10 B F H
11 B F I
12 <NA> D H
13 <NA> D I
14 B E I
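Answer 1 (score: 0)
In addition to Psidom's answer: by default, unnest drops the other list columns (if row duplication is required). Use the .drop = FALSE argument to keep the other columns.
The line
unnest(resp1) %>% unnest(resp2) %>% unnest(resp3)
becomes:
unnest(resp1, .drop = FALSE) %>% unnest(resp2, .drop = FALSE) %>% unnest(resp3)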
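For newer tidyr releases, a hedged variant of the same chain: as far as I know, tidyr 1.0.0 deprecated `.drop` and now keeps other list columns by default, and it expects the column to be named through `cols`. The sketch below assumes tidyr 1.0.0 or later; `unnested` is a name introduced here for illustration.

```r
library(tidyr)

resp1 <- c("A", "B; A", "B", NA, "B")
resp2 <- c("C; D; F", NA, "C; F", "D", "E")
resp3 <- c(NA, NA, "G; H; I", "H; I", "I")
data <- data.frame(resp1, resp2, resp3, stringsAsFactors = FALSE)

# Turn all three columns into list columns, then unnest them one at a
# time. Assumption: tidyr >= 1.0.0, where the remaining list columns
# are kept without passing .drop = FALSE.
unnested <- data %>%
  transform(resp1 = strsplit(resp1, "; "),
            resp2 = strsplit(resp2, "; "),
            resp3 = strsplit(resp3, "; ")) %>%
  unnest(cols = resp1) %>%
  unnest(cols = resp2) %>%
  unnest(cols = resp3)

nrow(unnested)
```

On older tidyr versions the `.drop = FALSE` form shown above remains the way to keep the other list columns between steps.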