How to convert a comma-separated column into tidy form

Time: 2017-05-04 00:33:03

Tags: r dplyr tidyverse

I have the following:


df <- tibble::tribble(
  ~Sample_name, ~CRT,      ~SR,      ~`Bcells,DendriticCells,Macrophage`,
  "S1",          0.079,  0.592,      "0.077,0.483,0.555",
  "S2",          0.082,  0.549,      "0.075,0.268,0.120"
)

df
#> # A tibble: 2 × 4
#>   Sample_name   CRT    SR `Bcells,DendriticCells,Macrophage`
#>         <chr> <dbl> <dbl>                              <chr>
#> 1          S1 0.079 0.592                  0.077,0.483,0.555
#> 2          S2 0.082 0.549                  0.075,0.268,0.120

Note the fourth column: both its name and its values are comma-separated. How can I convert df into this tidy form:

Sample_name CRT   SR       Score     Celltype
S1          0.079 0.592    0.077     Bcells 
S1          0.079 0.592    0.483     DendriticCells
S1          0.079 0.592    0.555     Macrophage
S2          0.082 0.549    0.075     Bcells
S2          0.082 0.549    0.268     DendriticCells
S2          0.082 0.549    0.120     Macrophage

1 answer:

Answer 0 (score: 2)

We can do this with separate, followed by gather:
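library(dplyr)  # for %>%
library(tidyr)  # for separate() and gather()

df %>%
    # split the combined column into one column per cell type
    # (add convert = TRUE here if you want score as numbers instead of text)
    separate(col = `Bcells,DendriticCells,Macrophage`,
             into = strsplit('Bcells,DendriticCells,Macrophage', ',')[[1]],
             sep = ',') %>%
    gather(Celltype, score, Bcells:Macrophage)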
# # A tibble: 6 × 5
#   Sample_name   CRT    SR       Celltype score
#         <chr> <dbl> <dbl>          <chr> <chr>
# 1          S1 0.079 0.592         Bcells 0.077
# 2          S2 0.082 0.549         Bcells 0.075
# 3          S1 0.079 0.592 DendriticCells 0.483
# 4          S2 0.082 0.549 DendriticCells 0.268
# 5          S1 0.079 0.592     Macrophage 0.555
# 6          S2 0.082 0.549     Macrophage 0.120
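In current tidyr, gather() is superseded by pivot_longer(). The following is a minimal sketch of the same pipeline, not part of the original answer, assuming tidyr >= 1.0.0 and dplyr >= 1.0.0 (for relocate()). The names cell_types and tidy are introduced here for illustration. It also passes convert = TRUE so that Score comes back numeric, and it orders the columns as in the desired output from the question.

```r
library(dplyr)  # %>%, relocate()
library(tidyr)  # separate(), pivot_longer()

df <- tibble::tribble(
  ~Sample_name, ~CRT,  ~SR,   ~`Bcells,DendriticCells,Macrophage`,
  "S1",         0.079, 0.592, "0.077,0.483,0.555",
  "S2",         0.082, 0.549, "0.075,0.268,0.120"
)

# hypothetical helper: the cell-type names, written out by hand
cell_types <- c("Bcells", "DendriticCells", "Macrophage")

tidy <- df %>%
  # split the combined column; convert = TRUE turns "0.077" etc. into doubles
  separate(col = `Bcells,DendriticCells,Macrophage`,
           into = cell_types, sep = ",", convert = TRUE) %>%
  # one row per sample and cell type
  pivot_longer(cols = all_of(cell_types),
               names_to = "Celltype", values_to = "Score") %>%
  # put Score before Celltype, as in the question's target table
  relocate(Score, .before = Celltype)

tidy
```

pivot_longer() emits all cell types for S1 before moving on to S2, so the row order matches the table in the question rather than the gather() output above.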

Without hard-coding the cell-type names:

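# name of the last column, e.g. "Bcells,DendriticCells,Macrophage"
cn <- colnames(df)[ncol(df)]
cell_types <- strsplit(cn, ',')[[1]]
df %>%
    # string-based (underscore) variants; deprecated in newer tidyr releases
    separate_(col = cn, into = cell_types, sep = ',') %>%
    gather_('Celltype', 'score', cell_types)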
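Another way to avoid hard-coding the names, offered as a sketch rather than as part of the original answer: copy the header string into a new Celltype column, then let separate_rows() split Score and Celltype in parallel, which works because both hold the same number of comma-separated pieces. This assumes dplyr >= 1.0.0 (for all_of() inside rename()) and a tidyr version that provides separate_rows(); the names Score and tidy are chosen here to match the desired output.

```r
library(dplyr)  # %>%, rename(), mutate(), all_of()
library(tidyr)  # separate_rows()

df <- tibble::tribble(
  ~Sample_name, ~CRT,  ~SR,   ~`Bcells,DendriticCells,Macrophage`,
  "S1",         0.079, 0.592, "0.077,0.483,0.555",
  "S2",         0.082, 0.549, "0.075,0.268,0.120"
)

# the comma-separated header, read from the data instead of typed by hand
cn <- names(df)[ncol(df)]

tidy <- df %>%
  # the values column gets a plain name
  rename(Score = all_of(cn)) %>%
  # every row gets the full header string, e.g. "Bcells,DendriticCells,Macrophage"
  mutate(Celltype = cn) %>%
  # split both columns on "," side by side; convert = TRUE makes Score numeric
  separate_rows(Score, Celltype, sep = ",", convert = TRUE)

tidy
```

Because rename() keeps the column in place and mutate() appends Celltype at the end, the columns come out as Sample_name, CRT, SR, Score, Celltype, the same layout as the target table.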