Split a character vector into individual words in R

Time: 2014-04-03 17:55:06

Tags: r vector character

I have a character vector (vec) like this:

[1] "super good dental associates"   "cheap dentist in bel air md"    
    "dentures   "                    "dentures   "                    
    "in office teeth whitening"      "in office teeth whitening"      
    "dental gum surgery bel air, md"
[8] "dental implants"                "dental implants"                
    "veneer teeth pictures"

I need to split it into individual words. I tried this:

singleWords <- strsplit(vec, ' ')[[1]]

However, this only gives me the split of the vector's first element:

[1] "super"      "good"       "dental"     "associates"

How can I get a single vector in which every word is its own element?

2 Answers:

Answer 0: (score: 2)

You can try:

strsplit(paste(vec, collapse = " "), ' ')[[1]]
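
A minimal sketch of how this behaves, using a shortened copy of the vector (the name smallVec is made up for illustration and is not from the question). Because some elements end in several spaces, splitting on a single space leaves empty strings in the result, so this sketch drops them afterwards:

```r
# smallVec is a hypothetical, shortened stand-in for vec
smallVec <- c("super good dental associates", "dentures   ", "dental implants")

# Join all elements into one string, then split on a single space
words <- strsplit(paste(smallVec, collapse = " "), " ")[[1]]

# Runs of consecutive spaces produce empty strings; keep non-empty tokens only
words <- words[nzchar(words)]
words
```

Splitting on a whitespace regex instead of a single space would avoid the empty strings in the first place.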

Answer 1: (score: 2)

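Just to follow up on my comment, and since you mentioned it didn't work, take a look. Because a few elements have extra whitespace, I'd suggest splitting on the regular expression \\s+ rather than on the single space from my comment. Cheers.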

> ( newVec <- unlist(sapply(vec, strsplit, "\\s+", USE.NAMES = FALSE)) )
# [1] "super"      "good"       "dental"     "associates" "cheap"      "dentist"   
# [7] "in"         "bel"        "air"        "md"         "dentures"   "dentures"  
#[13] "in"         "office"     "teeth"      "whitening"  "in"         "office"    
#[19] "teeth"      "whitening"  "dental"     "gum"        "surgery"    "bel"       
#[25] "air,"       "md"         "dental"     "implants"   "dental"     "implants"  
#[31] "veneer"     "teeth"      "pictures" 
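
As a sketch of why this works: strsplit() is already vectorized over its input and returns a list with one character vector per element, which unlist() then flattens. Calling it directly, without sapply, should give the same words. The names smallVec, pieces and allWords below are made up for illustration:

```r
# smallVec is a hypothetical, shortened stand-in for vec
smallVec <- c("cheap dentist in bel air md",
              "dentures   ",
              "dental gum surgery bel air, md")

# strsplit() returns a list with one character vector per input element
pieces <- strsplit(smallVec, "\\s+")
length(pieces)

# unlist() flattens that list into a single character vector
allWords <- unlist(pieces)
allWords
```

Trailing whitespace, as in "dentures   ", leaves no empty string behind. Leading whitespace would leave one at the start of that element's pieces, so trimming the input first (for example with trimws()) may be worth considering if that can occur.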

Since I see a stray comma in there, it's probably a good idea to strip out any punctuation with a call to gsub:

> gsub("[[:punct:]]", "", newVec)
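
Note that gsub() returns a new vector rather than modifying its input, so the result has to be assigned to keep it. A small sketch on a made-up sample (wordsWithPunct and cleaned are hypothetical names, not part of the original answer):

```r
# wordsWithPunct is a hypothetical sample, not the full newVec
wordsWithPunct <- c("bel", "air,", "md")

# Remove every punctuation character and store the result
cleaned <- gsub("[[:punct:]]", "", wordsWithPunct)
cleaned
```

The [[:punct:]] class also matches apostrophes and hyphens, so words such as "don't" or "x-ray" would be altered too.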