tidyr: multiple unnesting with varying NA counts

Time: 2016-04-23 21:05:53

Tags: r tidyr

I'm confused about some tidyr behavior. I can unnest a single response column like this:

library(tidyr)

resp1 <- c("A", "B; A", "B", NA, "B")
resp2 <- c("C; D; F", NA, "C; F", "D", "E")
resp3 <- c(NA, NA, "G; H; I", "H; I", "I")
data <- data.frame(resp1, resp2, resp3, stringsAsFactors = FALSE)

tidy <- data %>%
  transform(resp1 = strsplit(resp1, "; ")) %>%
  unnest()

# Source: local data frame [6 x 3]
#
#      resp2   resp3 resp1
#      (chr)   (chr) (chr)
# 1 C; D; F      NA     A
# 2      NA      NA     B
# 3      NA      NA     A
# 4    C; F G; H; I     B
# 5       D    H; I    NA
# 6       E       I     B

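But I need to unnest multiple columns in my dataset, and these columns contain different numbers of NAs. I tried the following, and it throws an error: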

data %>%
  transform(resp1 = strsplit(resp1, "; "),
            resp2 = strsplit(resp2, "; "),
            resp3 = strsplit(resp3, "; ")) %>%
  unnest()
# Error: All nested columns must have the same number of elements.

I expected the code above to give the same output as the following:

# unnesting multiple response (desired output / is there a better way?)
data %>%
  transform(resp1 = strsplit(resp1, "; ")) %>%
  unnest() %>%
  transform(resp2 = strsplit(resp2, "; ")) %>%
  unnest() %>%
  transform(resp3 = strsplit(resp3, "; ")) %>%
  unnest()

#     resp1 resp2 resp3
#     (chr) (chr) (chr)
# 1      A     C    NA
# 2      A     D    NA
# 3      A     F    NA
# 4      B    NA    NA
# 5      A    NA    NA
# 6      B     C     G
# 7      B     C     H
# 8      B     C     I
# 9      B     F     G
# 10     B     F     H
# 11     B     F     I
# 12    NA     D     H
# 13    NA     D     I
# 14     B     E     I

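I'm new to R, but this feels clunky and makes me wonder whether I'm misusing something. What is going on under the hood when the multiple-column unnest attempt fails?

2 Answers:

Answer 0: (score: 1)

Check this link, which shows different cases of unnesting multiple columns from a data frame. Judging from the documentation and the linked examples, the function may only be well defined for a single column, to avoid ambiguity, unless there is some clever way around it.

So you may have to unnest your columns one at a time. The code below may still be cumbersome, but it simplifies things a little.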

resp1 <- c("A", "B; A", "B", NA, "B")
resp2 <- c("C; D; F", NA, "C; F", "D", "E")
resp3 <- c(NA, NA, "G; H; I", "H; I", "I")
data <- data.frame(resp1, resp2, resp3, stringsAsFactors = FALSE)
data
  resp1   resp2   resp3
1     A C; D; F    <NA>
2  B; A    <NA>    <NA>
3     B    C; F G; H; I
4  <NA>       D    H; I
5     B       E       I
library(tidyr)
library(dplyr)
data %>%
  transform(resp1 = strsplit(resp1, "; "),
            resp2 = strsplit(resp2, "; "),
            resp3 = strsplit(resp3, "; ")) %>%
  unnest(resp1) %>% unnest(resp2) %>% unnest(resp3)
   resp1 resp2 resp3
1      A     C  <NA>
2      A     D  <NA>
3      A     F  <NA>
4      B  <NA>  <NA>
5      A  <NA>  <NA>
6      B     C     G
7      B     C     H
8      B     C     I
9      B     F     G
10     B     F     H
11     B     F     I
12  <NA>     D     H
13  <NA>     D     I
14     B     E     I
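
To make the ambiguity concrete, here is a minimal base-R sketch (not part of the original answer) that counts how many elements each cell holds after splitting. The names `split_cols` and `counts` are introduced here for illustration only. Because the counts differ within a row, there is no unambiguous way to line the elements up side by side, which is presumably what the "same number of elements" error is reporting.

```r
# Same example data as in the question
resp1 <- c("A", "B; A", "B", NA, "B")
resp2 <- c("C; D; F", NA, "C; F", "D", "E")
resp3 <- c(NA, NA, "G; H; I", "H; I", "I")
data <- data.frame(resp1, resp2, resp3, stringsAsFactors = FALSE)

# Split every column on "; " (an NA cell stays a single NA element)
split_cols <- lapply(data, strsplit, split = "; ")

# Matrix of element counts: one row per original row, one column per variable
counts <- sapply(split_cols, lengths)
counts

# Rows in which the three columns disagree on the element count
rows_differ <- apply(counts, 1, function(n) length(unique(n)) > 1)
rows_differ

# Unnesting one column at a time takes the cross product within each row,
# so the expected row count is the sum of the per-row products
sum(apply(counts, 1, prod))
```

The final sum is 14, which matches the number of rows in the output above.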

答案 1 :(得分:0)

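In addition to Psidom's answer: by default, unnest drops the other list columns if unnesting requires rows to be duplicated.

Use the .drop = FALSE argument to keep the other columns.

unnest(resp1) %>% unnest(resp2) %>% unnest(resp3) becomes: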

unnest(resp1, .drop = FALSE) %>% unnest(resp2, .drop = FALSE) %>% unnest(resp3)
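
As a possible alternative to splitting and unnesting by hand, the sketch below assumes a tidyr version that provides separate_rows() (it is not mentioned in the answers above, so treat it as a suggestion rather than the answerers' method). Each call splits one column on the delimiter and duplicates the remaining columns, so chaining one call per column should yield the same per-row cross product as the chained unnest() calls. The name `result` is illustrative.

```r
library(tidyr)

resp1 <- c("A", "B; A", "B", NA, "B")
resp2 <- c("C; D; F", NA, "C; F", "D", "E")
resp3 <- c(NA, NA, "G; H; I", "H; I", "I")
data <- data.frame(resp1, resp2, resp3, stringsAsFactors = FALSE)

# One separate_rows() call per column: passing all three columns to a single
# call would again require matching element counts within each row
result <- data %>%
  separate_rows(resp1, sep = "; ") %>%
  separate_rows(resp2, sep = "; ") %>%
  separate_rows(resp3, sep = "; ")

nrow(result)
```

This avoids creating list columns altogether, so the question of whether other list columns get dropped does not arise.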