Gather function with categorical and corresponding string variables

Time: 2019-09-05 13:51:31

Tags: r tidyr

Edit: I have updated this question with more detail on the data frame I am starting with and the one I hope to produce.

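I am using R (mostly tidyverse functions) to reshape this dataset. I now need to convert it from wide to long in a way that preserves the relationship between each categorical column and its text-entry string.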

It looks like this:

Name    Skill1 Skill1text   Skill2  Skill2text  Skill3  Skill3text Comm1 Comm2
Will    Yes    "SQL"        No      n/a         Yes     "Dishes"   xyz   zyx
Phil    Yes    "C++"        Yes     "Soup"      No      n/a        123   321
Jill    No     n/a          Yes     "Rice"      Yes     "Painting" abc   cba

I would like it to look like this:

Name    SkillName   YesOrNo   Text       Comm1    Comm2
Will    Skill1      Yes       "SQL"      xyz      zyx
Phil    Skill1      Yes       "C++"      123      321
Jill    Skill1      No        n/a        abc      cba
Will    Skill2      No        n/a        xyz      zyx
Phil    Skill2      Yes       "Soup"     123      321
Jill    Skill2      Yes       "Rice"     abc      cba
Will    Skill3      Yes       "Dishes"   xyz      zyx
Phil    Skill3      No        n/a        123      321
Jill    Skill3      Yes       "Painting" abc      cba

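I have done simpler wide-to-long conversions before, but this one has me stumped. I imagine there is already a simple solution on these forums, but I have hit a wall and need to ask for help!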

2 answers:

Answer 0: (score: 0)

There should be a more direct way, but here is one approach:

library(dplyr)
library(tidyr)

bind_cols(df %>% 
            select(-contains('text')) %>% 
            gather(skillname, YesOrNo, -1), 
         df %>% select(contains('text')) %>% 
            gather(var, text) %>% 
            select(-var)
          )

which gives,

  Name skillname YesOrNo     text
1 Will    Skill1     Yes      SQL
2 Phil    Skill1     Yes      C++
3 Jill    Skill1      No      n/a
4 Will    Skill2      No      n/a
5 Phil    Skill2     Yes     Soup
6 Jill    Skill2     Yes     Rice
7 Will    Skill3     Yes   Dishes
8 Phil    Skill3      No      n/a
9 Jill    Skill3     Yes Painting

To meet your new requirement,

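bind_cols(
  # Name plus the Yes/No columns, stacked into skillname / YesOrNo
  df %>%
    select(-matches('text|comm')) %>%
    gather(skillname, YesOrNo, -1),
  # the text columns, stacked in the same column order so the rows line up
  df %>%
    select(contains('text')) %>%
    gather(var, text) %>%
    select(-var),
  # the Comm columns, repeated once per skill via list columns
  df %>%
    select(contains('comm')) %>%
    mutate_all(list(function(i) strsplit(toString(i), ', '))) %>%
    unnest()
)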

which gives,

  Name skillname YesOrNo     text Comm1 Comm2
1 Will    Skill1     Yes      SQL   xyz   zyx
2 Phil    Skill1     Yes      C++   123   321
3 Jill    Skill1      No      n/a   abc   cba
4 Will    Skill2      No      n/a   xyz   zyx
5 Phil    Skill2     Yes     Soup   123   321
6 Jill    Skill2     Yes     Rice   abc   cba
7 Will    Skill3     Yes   Dishes   xyz   zyx
8 Phil    Skill3      No      n/a   123   321
9 Jill    Skill3     Yes Painting   abc   cba
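
The strsplit(toString(...)) step above only serves to repeat the Comm columns once per skill. A possibly simpler variation, sketched here as an assumption rather than as part of the original answer, keeps Name, Comm1 and Comm2 as id columns in the first gather(), so they are repeated automatically. The tribble() call just rebuilds the question's data; the object names yes_no, text_col and result are made up for illustration.

```r
library(dplyr)
library(tidyr)

# Rebuild the example data from the question (all columns as character)
df <- tibble::tribble(
  ~Name,  ~Skill1, ~Skill1text, ~Skill2, ~Skill2text, ~Skill3, ~Skill3text, ~Comm1, ~Comm2,
  "Will", "Yes",   "SQL",       "No",    "n/a",       "Yes",   "Dishes",    "xyz",  "zyx",
  "Phil", "Yes",   "C++",       "Yes",   "Soup",      "No",    "n/a",       "123",  "321",
  "Jill", "No",    "n/a",       "Yes",   "Rice",      "Yes",   "Painting",  "abc",  "cba"
)

# Stack the Yes/No columns; Name, Comm1 and Comm2 are id columns,
# so gather() repeats them for every skill
yes_no <- df %>%
  select(-contains("text")) %>%
  gather(skillname, YesOrNo, -Name, -Comm1, -Comm2)

# Stack the text columns in the same column order so rows line up
text_col <- df %>%
  select(contains("text")) %>%
  gather(var, text) %>%
  select(-var)

# Glue the two pieces together side by side and reorder the columns
result <- bind_cols(yes_no, text_col) %>%
  select(Name, skillname, YesOrNo, text, Comm1, Comm2)

result
```

Like the original, this relies on both gather() calls stacking the skills in the same order, so it is only safe when every SkillN column has a matching SkillNtext column in the same position.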

Answer 1: (score: 0)

With tidyr 1.0.0 you can use tidyr::pivot_longer(); here is how to do it:

library(tidyverse)
#> Registered S3 method overwritten by 'rvest':
#>   method            from
#>   read_xml.response xml2
#> Warning: package 'dplyr' was built under R version 3.6.1

df <- read.table(h=T, strin=F, text='Name    Skill1 Skill1text   Skill2  Skill2text  Skill3  Skill3text Comm1 Comm2
Will    Yes    "SQL"        No      n/a         Yes     "Dishes"   xyz   zyx
Phil    Yes    "C++"        Yes     "Soup"      No      n/a        123   321
Jill    No     n/a          Yes     "Rice"      Yes     "Painting" abc   cba')

df %>%
  rename_at(vars(matches("^Skill\\d$")), paste0,"YesOrNo") %>%
  pivot_longer(
    -c(1,starts_with("Comm")), 
    names_to = c("Skill",".value"), 
    names_pattern = "(Skill\\d+)(.*)")
#> # A tibble: 9 x 6
#>   Name  Comm1 Comm2 Skill  YesOrNo text    
#>   <chr> <chr> <chr> <chr>  <chr>   <chr>   
#> 1 Will  xyz   zyx   Skill1 Yes     SQL     
#> 2 Will  xyz   zyx   Skill2 No      n/a     
#> 3 Will  xyz   zyx   Skill3 Yes     Dishes  
#> 4 Phil  123   321   Skill1 Yes     C++     
#> 5 Phil  123   321   Skill2 Yes     Soup    
#> 6 Phil  123   321   Skill3 No      n/a     
#> 7 Jill  abc   cba   Skill1 No      n/a     
#> 8 Jill  abc   cba   Skill2 Yes     Rice    
#> 9 Jill  abc   cba   Skill3 Yes     Painting

Created on 2019-09-14 by the reprex package (v0.3.0)

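We first rename the Yes/No columns so that every skill column follows the same pattern (the skill identifier followed by a suffix). We then pass the special name ".value" in the names_to argument of pivot_longer(): the first capture group of names_pattern moves the skill identifier into its own column, while the second group (YesOrNo or text) names the output columns that receive the values.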
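
To get exactly the column names and row order asked for in the question, the same idea can be extended slightly. This is a minimal sketch, assuming dplyr 1.0.0 or later (for rename_with(), which supersedes rename_at()) and tidyr 1.0.0 or later; the object name long is made up for illustration, and the tribble() call only rebuilds the question's data.

```r
library(dplyr)
library(tidyr)

# Rebuild the example data from the question (all columns as character)
df <- tibble::tribble(
  ~Name,  ~Skill1, ~Skill1text, ~Skill2, ~Skill2text, ~Skill3, ~Skill3text, ~Comm1, ~Comm2,
  "Will", "Yes",   "SQL",       "No",    "n/a",       "Yes",   "Dishes",    "xyz",  "zyx",
  "Phil", "Yes",   "C++",       "Yes",   "Soup",      "No",    "n/a",       "123",  "321",
  "Jill", "No",    "n/a",       "Yes",   "Rice",      "Yes",   "Painting",  "abc",  "cba"
)

long <- df %>%
  # Skill1 -> Skill1YesOrNo, etc., so every skill column is "SkillN" + suffix
  rename_with(~ paste0(.x, "YesOrNo"), matches("^Skill\\d+$")) %>%
  pivot_longer(
    cols = starts_with("Skill"),
    # first capture group -> SkillName column; second -> output column names
    names_to = c("SkillName", ".value"),
    names_pattern = "(Skill\\d+)(.*)"
  ) %>%
  rename(Text = text) %>%
  # arrange() keeps ties in their existing order, so names stay Will, Phil, Jill
  arrange(SkillName) %>%
  select(Name, SkillName, YesOrNo, Text, Comm1, Comm2)

long
```

The final arrange() and select() are purely cosmetic; the reshaping itself is done by pivot_longer().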