I have a character vector (vec) like this:
[1] "super good dental associates" "cheap dentist in bel air md"
"dentures " "dentures "
"in office teeth whitening" "in office teeth whitening"
"dental gum surgery bel air, md"
[8] "dental implants" "dental implants"
"veneer teeth pictures"
I need to split it into individual words. I tried this:
singleWords <- strsplit(vec, ' ')[[1]]
However, this only gives me the split of the vector's first element:
[1] "super" "good" "dental" "associates"
How can I get a single vector in which every word is its own element?
Answer 0 (score: 2)
You could try:
strsplit(paste(vec, collapse = " "), ' ')[[1]]
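For context, here is a minimal sketch of why the original attempt returned only the first element, and of one caveat with pasting before splitting. The three-element `vec` is a shortened stand-in for the vector in the question, and the names `pieces`, `first_only`, `all_words`, and `pasted` are only illustrative.

```r
# Shortened stand-in for the question's vector (note the trailing space)
vec <- c("super good dental associates", "dentures ", "dental implants")

# strsplit() returns a list with one character vector per input element
pieces <- strsplit(vec, " ")

# [[1]] keeps only the first list entry; unlist() flattens all of them
first_only <- pieces[[1]]
all_words  <- unlist(pieces)

# Pasting first and then splitting on a single space yields an empty
# string wherever two spaces end up next to each other
pasted <- strsplit(paste(vec, collapse = " "), " ")[[1]]
any(pasted == "")
```

If empty strings like that are unwanted, splitting on a run of whitespace rather than a single space is one way to avoid them.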
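Answer 1 (score: 2)
Just to follow up on my comment, and since you mentioned it didn't work, take a look. Because a few elements have extra whitespace, I'd suggest splitting on the regular expression \\s+ rather than the single space from my comment. Cheers.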
> ( newVec <- unlist(sapply(vec, strsplit, "\\s+", USE.NAMES = FALSE)) )
# [1] "super" "good" "dental" "associates" "cheap" "dentist"
# [7] "in" "bel" "air" "md" "dentures" "dentures"
#[13] "in" "office" "teeth" "whitening" "in" "office"
#[19] "teeth" "whitening" "dental" "gum" "surgery" "bel"
#[25] "air," "md" "dental" "implants" "dental" "implants"
#[31] "veneer" "teeth" "pictures"
Since I see a stray comma in there, it might be a good idea to strip out any punctuation (if there is any) with a call to gsub:
> gsub("[[:punct:]]", "", newVec)
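Putting the two steps together, a hedged end-to-end sketch might look like the following. The sample `vec` is again a shortened stand-in, `words` and `clean` are illustrative names, and the final `nzchar()` filter is an extra step not in the answer above, added on the assumption that tokens left empty after cleaning should be dropped.

```r
# Shortened stand-in for the question's vector
vec <- c("dental gum surgery bel air, md", "dentures ", "veneer teeth pictures")

# Split every element on runs of whitespace and flatten into one vector
words <- unlist(strsplit(vec, "\\s+"))

# Remove punctuation characters such as the stray comma in "air,"
clean <- gsub("[[:punct:]]", "", words)

# Assumption: drop any tokens that are empty after cleaning
clean <- clean[nzchar(clean)]
clean
```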