Split columns on the delimiter ":" and record the column number

Time: 2019-09-30 18:31:18

Tags: r

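I have a huge data frame in the input format below. I am trying to split every column except the first on the delimiter ":", and output the value from column 1 together with the column number and each split value.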

input <- structure(list(V1 = structure(1:2, .Label = c("a1", "a2"), class = "factor"), 
    V2 = structure(1:2, .Label = c("aaa-1-c:bbb-1-d:ccc:a", "www-1-c"
    ), class = "factor"), V3 = structure(1:2, .Label = c("cc:nnn:ttt-cc", 
    "cdd:aaa:pp"), class = "factor"), V4 = structure(c(1L, NA
    ), .Label = "aaa-1-d", class = "factor")), class = "data.frame", row.names = c(NA, 
-2L))

I tried, but the column numbers and the order of the values come out wrong.

output <- structure(list(V1 = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
2L, 2L, 2L, 2L), .Label = c("a1", "a2 "), class = "factor"), 
    V2 = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 1L, 1L, 1L, 1L), V3 = structure(c(3L, 
    5L, 7L, 1L, 6L, 9L, 11L, 4L, 12L, 8L, 2L, 10L), .Label = c("a", 
    "aaa", "aaa-1-c", "aaa-1-d", "bbb-1-d", "cc", "ccc", "cdd", 
    "nnn", "pp", "ttt-cc", "www-1-c"), class = "factor")), class = "data.frame", row.names = c(NA, 
-12L))

Can anyone help? Thanks!

1 answer:

Answer 0 (score: 2)

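Here is one option: reshape the dataset from "wide" to "long" (pivot_longer, from tidyr 1.0.0), split the "V3" column of the long format on ":" with separate_rows, and then use match to convert the column names in "V2" to integers.
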
library(dplyr)
library(tidyr)
input %>%
   pivot_longer(cols = -V1, names_to = "V2", values_to = "V3", 
          values_drop_na = TRUE) %>% 
   # older versions use gather
   # gather(V2, V3, -V1, na.rm = TRUE) %>%
   separate_rows(V3, sep=":") %>%
   group_by(V1) %>%
   mutate(V2 = match(V2, unique(V2))) %>%
   ungroup()
# A tibble: 12 x 3
#   V1       V2 V3     
#   <fct> <int> <chr>  
# 1 a1        1 aaa-1-c
# 2 a1        1 bbb-1-d
# 3 a1        1 ccc    
# 4 a1        1 a      
# 5 a1        2 cc     
# 6 a1        2 nnn    
# 7 a1        2 ttt-cc 
# 8 a1        3 aaa-1-d
# 9 a2        1 www-1-c
#10 a2        2 cdd    
#11 a2        2 aaa    
#12 a2        2 pp
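
The `match(V2, unique(V2))` step may be the least obvious part of the pipeline, so here is a minimal base R sketch of it in isolation. The vector `cols` is a made-up stand-in for the "V2" values of a single V1 group after reshaping. It is not taken from the actual data frame.

```r
# Hypothetical column names for one V1 group in the long format
cols <- c("V2", "V2", "V3", "V3", "V3", "V4")

# unique() keeps names in order of first appearance;
# match() returns the position of each name within that unique set
idx <- match(cols, unique(cols))
idx
# [1] 1 1 2 2 2 3
```

Because the numbering is computed per group after the NA rows are dropped, it counts only the columns that actually hold values for that V1. If the original column position is wanted regardless of missing values, matching against a fixed vector such as `names(input)[-1]` instead of `unique(V2)` would be one way to get it. That variant is an assumption about the requirement, not part of the answer above.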