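Edit: I've updated this question with more detail on the data frame I start with and the one I want to produce.
I'm using R (mostly tidyverse functions) to reshape this dataset. I need to convert it from wide to long while preserving the relationship between each category name and its free-text entry.
The data looks like this: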
Name Skill1 Skill1text Skill2 Skill2text Skill3 Skill3text Comm1 Comm2
Will Yes "SQL" No n/a Yes "Dishes" xyz zyx
Phil Yes "C++" Yes "Soup" No n/a 123 321
Jill No n/a Yes "Rice" Yes "Painting" abc cba
I'd like it to look like this:
Name SkillName YesOrNo Text Comm1 Comm2
Will Skill1 Yes "SQL" xyz zyx
Phil Skill1 Yes "C++" 123 321
Jill Skill1 No n/a abc cba
Will Skill2 No n/a xyz zyx
Phil Skill2 Yes "Soup" 123 321
Jill Skill2 Yes "Rice" abc cba
Will Skill3 Yes "Dishes" xyz zyx
Phil Skill3 No n/a 123 321
Jill Skill3 Yes "Painting" abc cba
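I've done simpler wide-to-long conversions before, but this one has me stumped. I assume there's already a simple solution somewhere on these forums, but I've hit a wall and just need to ask for help!
Answer 0: (score: 0)
There should be a more direct way, but here is one approach: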
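library(dplyr)
library(tidyr)

# Gather the Yes/No columns and the text columns separately, then bind them
# side by side. This relies on both pieces coming out in the same row order.
bind_cols(
  df %>%
    select(-contains('text')) %>%
    gather(skillname, YesOrNo, -1),
  df %>%
    select(contains('text')) %>%
    gather(var, text) %>%
    select(-var)
)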
which gives:
  Name skillname YesOrNo     text
1 Will    Skill1     Yes      SQL
2 Phil    Skill1     Yes      C++
3 Jill    Skill1      No      n/a
4 Will    Skill2      No      n/a
5 Phil    Skill2     Yes     Soup
6 Jill    Skill2     Yes     Rice
7 Will    Skill3     Yes   Dishes
8 Phil    Skill3      No      n/a
9 Jill    Skill3     Yes Painting
To meet your new requirement:
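bind_cols(
  df %>%
    select(-matches('text|comm')) %>%
    gather(skillname, YesOrNo, -1),
  df %>%
    select(contains('text')) %>%
    gather(var, text) %>%
    select(-var),
  # Store the whole Comm column in every cell, then unnest it. The values
  # repeat nrow(df) times, which lines up here only because there are
  # 3 people and 3 skills.
  df %>%
    select(contains('comm')) %>%
    mutate_all(list(function(i) strsplit(toString(i), ', '))) %>%
    unnest()
)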
which gives:
  Name skillname YesOrNo     text Comm1 Comm2
1 Will    Skill1     Yes      SQL   xyz   zyx
2 Phil    Skill1     Yes      C++   123   321
3 Jill    Skill1      No      n/a   abc   cba
4 Will    Skill2      No      n/a   xyz   zyx
5 Phil    Skill2     Yes     Soup   123   321
6 Jill    Skill2     Yes     Rice   abc   cba
7 Will    Skill3     Yes   Dishes   xyz   zyx
8 Phil    Skill3      No      n/a   123   321
9 Jill    Skill3     Yes Painting   abc   cba
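The unnest step above depends on the number of rows matching the number of skills. A possibly simpler variant, sketched here under the assumption that the comment columns are named exactly Comm1 and Comm2 as in the question, is to exclude them from gather() so they are carried along as identifier columns. The names yes_no, texts and result are only illustrative.

```r
library(dplyr)
library(tidyr)

# Example data copied from the question
df <- data.frame(
  Name       = c("Will", "Phil", "Jill"),
  Skill1     = c("Yes", "Yes", "No"),
  Skill1text = c("SQL", "C++", "n/a"),
  Skill2     = c("No", "Yes", "Yes"),
  Skill2text = c("n/a", "Soup", "Rice"),
  Skill3     = c("Yes", "No", "Yes"),
  Skill3text = c("Dishes", "n/a", "Painting"),
  Comm1      = c("xyz", "123", "abc"),
  Comm2      = c("zyx", "321", "cba"),
  stringsAsFactors = FALSE
)

# Gather only the Yes/No columns; Name, Comm1 and Comm2 are kept as ids
# and are repeated automatically for every skill.
yes_no <- df %>%
  select(-contains("text")) %>%
  gather(skillname, YesOrNo, -Name, -Comm1, -Comm2)

# Gather the text columns in the same column order, dropping the key.
texts <- df %>%
  select(contains("text")) %>%
  gather(var, text) %>%
  select(-var)

# Both pieces have one row per (skill, person) in the same order.
result <- bind_cols(yes_no, texts)
print(result)
```

This still relies on the Skill and Skilltext columns appearing in the same order in df, so treat it as a sketch rather than a general solution.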
Answer 1: (score: 0)
tidyr 1.0.0 added tidyr::pivot_longer(), which can handle this.
Here is one way to do it:
library(tidyverse)
df <- read.table(header = TRUE, stringsAsFactors = FALSE, text='Name Skill1 Skill1text Skill2 Skill2text Skill3 Skill3text Comm1 Comm2
Will Yes "SQL" No n/a Yes "Dishes" xyz zyx
Phil Yes "C++" Yes "Soup" No n/a 123 321
Jill No n/a Yes "Rice" Yes "Painting" abc cba')
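df %>%
  # Give the Yes/No columns a suffix so every skill column is "Skill<n><suffix>"
  rename_at(vars(matches("^Skill\\d$")), paste0, "YesOrNo") %>%
  pivot_longer(
    # Pivot everything except Name (column 1) and the Comm columns
    -c(1, starts_with("Comm")),
    names_to = c("Skill", ".value"),
    names_pattern = "(Skill\\d+)(.*)")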
#> # A tibble: 9 x 6
#> Name Comm1 Comm2 Skill YesOrNo text
#> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 Will xyz zyx Skill1 Yes SQL
#> 2 Will xyz zyx Skill2 No n/a
#> 3 Will xyz zyx Skill3 Yes Dishes
#> 4 Phil 123 321 Skill1 Yes C++
#> 5 Phil 123 321 Skill2 Yes Soup
#> 6 Phil 123 321 Skill3 No n/a
#> 7 Jill abc cba Skill1 No n/a
#> 8 Jill abc cba Skill2 Yes Rice
#> 9 Jill abc cba Skill3 Yes Painting
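Created on 2019-09-14 by the reprex package (v0.3.0)
We first rename the Yes/No columns so that all skill columns follow a consistent format (Skill1YesOrNo, Skill1text, and so on). Then, in pivot_longer(), names_pattern splits each column name into two parts: the first part (Skill1, Skill2, ...) goes into the Skill column, and the special ".value" entry in names_to tells pivot_longer() to use the second part (YesOrNo or text) as the name of an output column.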
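For readers on newer package versions, here is a minimal sketch of the same idea using rename_with(), which I'm assuming is available (it was added in dplyr 1.0.0). The smaller data frame and the names long and SkillName are just for illustration.

```r
library(dplyr)
library(tidyr)

# A cut-down version of the question's data (two skills, one Comm column)
df <- data.frame(
  Name       = c("Will", "Phil", "Jill"),
  Skill1     = c("Yes", "Yes", "No"),
  Skill1text = c("SQL", "C++", "n/a"),
  Skill2     = c("No", "Yes", "Yes"),
  Skill2text = c("n/a", "Soup", "Rice"),
  Comm1      = c("xyz", "123", "abc"),
  stringsAsFactors = FALSE
)

long <- df %>%
  # Skill1 -> Skill1YesOrNo, Skill2 -> Skill2YesOrNo; Skill1text is untouched
  rename_with(~ paste0(.x, "YesOrNo"), matches("^Skill\\d+$")) %>%
  pivot_longer(
    cols = starts_with("Skill"),
    # First capture group -> SkillName column,
    # second capture group -> names of the value columns (YesOrNo, text)
    names_to = c("SkillName", ".value"),
    names_pattern = "(Skill\\d+)(.*)"
  )

print(long)
```

Selecting the columns to pivot with starts_with("Skill") instead of excluding Name and the Comm columns is a design choice: any other identifier columns are then kept without having to be listed.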