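How to split an element of a vector into individual letters?

Posted: 2014-04-27 16:09:49

Tags: r vector split

For example, I have an element "computer" in a vector. I need to get a vector made of "c", "o", "m", "p", "u", "t", "e", "r".

The second part of my question is optional. How can I create a vector of letter combinations built from the vector above, where the letters in each combination keep the order they have in the original word? For example, I want this vector to contain "puter" or "mpu", but not "tumpo".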

3 answers:

Answer 0 (score: 3)

You can use

strsplit("computer", "\\b")

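library("RWeka")   # NGramTokenizer() and Weka_control() come from RWeka
# Put a space between the letters so each letter becomes a token,
# build n-grams of 2 to 5 tokens, then delete the spaces again.
gsub(" ", "", 
     NGramTokenizer(paste(strsplit("computer", "\\b")[[1]], collapse=" "), 
                    Weka_control(min=2, 
                                 max=5)),
     fixed=TRUE)  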
# [1] "compu" "omput" "mpute" "puter" "comp" 
# [6] "ompu"  "mput"  "pute"  "uter"  "com"  
# [11] "omp"   "mpu"   "put"   "ute"   "ter"  
# [16] "co"    "om"    "mp"    "pu"    "ut"   
# [21] "te"    "er"   

This creates n-grams with 2 <= n <= 5.
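If installing RWeka (which needs a Java installation) is inconvenient, the same 2- to 5-letter n-grams can probably be produced in base R with `substring()`, which is vectorized over its start and stop positions. This is a minimal sketch, not part of the original answer; the names `word`, `n` and `ngrams` are illustrative.

```r
# Base-R sketch: all contiguous letter n-grams of "computer" for n = 5 down to 2,
# listed longest first, like the NGramTokenizer output above.
word <- "computer"
n <- nchar(word)

ngrams <- unlist(lapply(5:2, function(k) {
  starts <- seq_len(n - k + 1)            # every valid start position for length k
  substring(word, starts, starts + k - 1) # vectorized extraction of all k-grams
}))

print(ngrams)
```

Because nothing is tokenized, no spaces have to be inserted and removed again.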

Answer 1 (score: 1)

The first part of the question is easy:

splits <- unlist(strsplit("computer",split=""))

> splits
[1] "c" "o" "m" "p" "u" "t" "e" "r"
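`strsplit()` is vectorized over its first argument and always returns a list with one component per input string, which is why the answer wraps it in `unlist()`. When the vector has several elements, `unlist()` would merge all of their letters together, so indexing the list keeps the words apart. A small sketch; the vector `words` is made up for illustration.

```r
words <- c("computer", "split")          # illustrative input vector

# One list component per element of `words`
letter_list <- strsplit(words, split = "")

letter_list[[1]]   # letters of "computer"
letter_list[[2]]   # letters of "split"

# unlist() on the whole list concatenates the letters of both words
length(unlist(letter_list))   # 13
```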

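For the second part, you can use the following code:

# x = length of the substring, y = its start position in `splits`
subseqs <- 
  unlist(
    lapply(1:length(splits),FUN=function(x){
      lapply(1:(length(splits)+1-x),FUN=function(y){ 
        paste(splits[y:(y+x-1)],collapse="") })
    })
  )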
> subseqs
 [1] "c"        "o"        "m"        "p"        "u"        "t"        "e"       
 [8] "r"        "co"       "om"       "mp"       "pu"       "ut"       "te"      
[15] "er"       "com"      "omp"      "mpu"      "put"      "ute"      "ter"     
[22] "comp"     "ompu"     "mput"     "pute"     "uter"     "compu"    "omput"   
[29] "mpute"    "puter"    "comput"   "ompute"   "mputer"   "compute"  "omputer" 
[36] "computer"
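The nested `lapply()` calls can likely be replaced by one vectorized `substring()` call: build every (start, length) pair first, then extract all substrings at once. This sketch assumes one word at a time; `lens`, `starts` and `subseqs2` are illustrative names. It should return the same 36 strings in the same order as `subseqs`.

```r
word <- "computer"
n <- nchar(word)

# Length 1 appears 8 times, length 2 appears 7 times, ..., length 8 once
lens   <- rep(seq_len(n), times = n:1)
# For each length k, the valid start positions are 1 .. n - k + 1
starts <- unlist(lapply(seq_len(n), function(k) seq_len(n - k + 1)))

subseqs2 <- substring(word, starts, starts + lens - 1)
subseqs2
```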

Answer 2 (score: 0)

Combinations of three consecutive letters:

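x <- strsplit("computer", "\\b")[[1]]   # strsplit() returns a list; take the character vector
y <- combn(seq_along(x), 3)             # all index triples, in lexicographic order
m <- match(1:6, y[1,])                  # first triple starting at i is (i, i+1, i+2)
combn(x, 3)[, m]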
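`combn()` enumerates all C(8, 3) = 56 three-letter combinations and then keeps only 6 of them. A direct sliding window is a possible alternative that skips the unused columns and does not hard-code `1:6`; the window size `k` is a parameter introduced here for illustration.

```r
x <- strsplit("computer", "")[[1]]
k <- 3                                   # window size (illustrative)

# Column i holds letters i, i+1, ..., i+k-1
windows <- sapply(seq_len(length(x) - k + 1), function(i) x[i:(i + k - 1)])

windows                                  # 3 x 6 character matrix
apply(windows, 2, paste, collapse = "")  # "com" "omp" "mpu" "put" "ute" "ter"
```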

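If "in the order of the original word" is read more loosely, so that letters may be skipped (e.g. "cmp" or "cope"), then every combination returned by `combn()` already preserves the original order. A sketch under that assumed interpretation: `combn()` passes extra arguments on to `FUN`, so `paste(..., collapse = "")` turns each combination into a string. For an 8-letter word this gives 2^8 - 1 = 255 strings, so the result grows quickly with word length.

```r
x <- strsplit("computer", "")[[1]]

# For each size k, paste every order-preserving k-letter combination into one string
all_subseqs <- unlist(lapply(seq_along(x), function(k) {
  combn(x, k, FUN = paste, collapse = "")
}))

length(all_subseqs)                           # 255
c("cmp", "puter", "tumpo") %in% all_subseqs   # TRUE TRUE FALSE
```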