I have the following code:
library(tibble)  # data_frame() is deprecated; tibble() is its replacement
data <- tibble(job_id = c("114124", "114188", "114206"), project_skills = c("WordPress,XTCommerce,Magento,Prestashop,VirtueMart,osCommerce", "HTML,SEO,WordPress,SEO Texte", "Illustrator,Graphic Design,Photoshop"))
This creates the following data frame:
job_id project_skills
114124 WordPress,XTCommerce,Magento,Prestashop,VirtueMart,osCommerce
114188 HTML,SEO,WordPress,SEO Texte
114206 Illustrator,Graphic Design,Photoshop
I need to split the strings in the project_skills column on the commas, as shown below:
job_id project_skills
114124 [WordPress] [XTCommerce] [Magento] [Prestashop] [VirtueMart] [osCommerce]
114188 [HTML] [SEO] [WordPress] [SEO Texte]
114206 [Illustrator] [Graphic Design] [Photoshop]
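So I would like a data frame whose rows contain the split phrases. They should be vectors so that I can iterate over them. Does anyone know how I can build this? Thanks in advance!!
Answer 0 (score: 1)
Something like this?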
l <- strsplit( data$project_skills, ",")
names(l) <- data$job_id
l
# $`114124`
# [1] "WordPress" "XTCommerce" "Magento" "Prestashop" "VirtueMart" "osCommerce"
#
# $`114188`
# [1] "HTML" "SEO" "WordPress" "SEO Texte"
#
# $`114206`
# [1] "Illustrator" "Graphic Design" "Photoshop"
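Since the question asks for the vectors to live inside the data frame, one possible variation (a minimal sketch, not part of the original answer) is to store the result of strsplit() as a list column and loop over it row by row. The column name skills is made up for illustration, and the sketch uses a base data.frame so that it runs without extra packages:

```r
# Rebuild the example data with base R only
data <- data.frame(
  job_id = c("114124", "114188", "114206"),
  project_skills = c(
    "WordPress,XTCommerce,Magento,Prestashop,VirtueMart,osCommerce",
    "HTML,SEO,WordPress,SEO Texte",
    "Illustrator,Graphic Design,Photoshop"
  ),
  stringsAsFactors = FALSE
)

# 'skills' is a hypothetical column name: each row holds a character vector
data$skills <- strsplit(data$project_skills, ",", fixed = TRUE)

# Iterate over the rows and use each vector
for (i in seq_len(nrow(data))) {
  cat(data$job_id[i], "has", length(data$skills[[i]]), "skills\n")
}
```

Each element is then reachable with data$skills[[i]], so no NA padding is needed when the rows have different numbers of skills.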
Using data.table:
library( data.table )
dt <- as.data.table( data )
#determine maximum number of skills
skillmax <- max( lengths( strsplit( dt$project_skills,",")))
#create data.table
dt[, paste0( "skill", 1:skillmax ) := tstrsplit( project_skills, ",", fill = NA)][]
# job_id project_skills skill1 skill2 skill3
# 1: 114124 WordPress,XTCommerce,Magento,Prestashop,VirtueMart,osCommerce WordPress XTCommerce Magento
# 2: 114188 HTML,SEO,WordPress,SEO Texte HTML SEO WordPress
# 3: 114206 Illustrator,Graphic Design,Photoshop Illustrator Graphic Design Photoshop
# skill4 skill5 skill6
# 1: Prestashop VirtueMart osCommerce
# 2: SEO Texte <NA> <NA>
# 3: <NA> <NA> <NA>
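If the NA padding of the wide layout gets in the way when looping, a long layout with one row per job and skill may be easier to work with. The following is only a sketch of that alternative, written in base R rather than data.table; the names long and skill are chosen here for illustration:

```r
# Rebuild the example data with base R only
data <- data.frame(
  job_id = c("114124", "114188", "114206"),
  project_skills = c(
    "WordPress,XTCommerce,Magento,Prestashop,VirtueMart,osCommerce",
    "HTML,SEO,WordPress,SEO Texte",
    "Illustrator,Graphic Design,Photoshop"
  ),
  stringsAsFactors = FALSE
)

# Split each string on the literal comma
l <- strsplit(data$project_skills, ",", fixed = TRUE)

# Repeat each job_id once per skill and flatten the skills into one column
long <- data.frame(
  job_id = rep(data$job_id, times = lengths(l)),
  skill  = unlist(l),
  stringsAsFactors = FALSE
)

# Recover the per-job vectors when needed
by_job <- split(long$skill, long$job_id)
```

Here long has one row per skill, and split() turns it back into a named list of vectors keyed by job_id.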