Find all possible string combinations from input sentences

Time: 2018-07-01 15:24:07

Tags: r string split

To create all possible string combinations from the input sentences, I wrote the code below.

library(stringr)
text = c('I like you', 'I love you so much', 'she like it so much', 'she hate you', 'he hate you so much','I like him')
tex = data.frame(text)

library(splitstackshape)
# split each sentence into one column per word position (text_1 ... text_5)
pattern = data.frame(cSplit(tex, "text", " "))

n=ncol(pattern)

# collect the distinct words at each position as one space-separated string
dat = c()
for(i in 1:n){
  tt = unique(pattern[,i])
  g=paste0(tt,collapse = ' ')
  dat = c(dat,g)
}
SEQ = data.frame(dat)

SEQ = data.frame(cSplit(SEQ, "dat", " "))

This produces the following data frame:

  dat_1 dat_2 dat_3
1     I   she    he
2  like  love  hate
3   you    it   him
4  <NA>    so  <NA>
5  <NA>  much  <NA>

What I want is to create all possible combinations of these words (108 of them), as shown below:

I like you so NA 
I like you so much 
I like you NA NA 
I like you NA much 
...
he love him so much 
he love him NA NA 
he love him NA much 
he hate you so NA 
he hate you so much 
...
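The count of 108 follows from the table above: each word position contributes its distinct values, with NA counted as one value for sentences that are too short, so the total is 3 × 3 × 3 × 2 × 2 = 108. A quick base R check, sketched under the assumption that words are separated by single spaces:

```r
text <- c("I like you", "I love you so much", "she like it so much",
          "she hate you", "he hate you so much", "I like him")

# split each sentence on single spaces
words <- strsplit(text, " ", fixed = TRUE)
n_pos <- max(lengths(words))

# number of distinct words at each position; `[` returns NA past the end
# of a shorter sentence, so NA is counted as one extra value
n_distinct <- vapply(seq_len(n_pos), function(i) {
  length(unique(vapply(words, `[`, character(1), i)))
}, integer(1))

print(n_distinct)        # 3 3 3 2 2
print(prod(n_distinct))  # 108
```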

What should I do to produce this list?

2 answers:

Answer 0 (score: 2)

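I think data.table::tstrsplit is convenient here because it splits and transposes in one step. Then take the unique values of each list element (lapply(x, unique)) and generate all combinations with expand.grid: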

expand.grid(lapply(data.table::tstrsplit(text, split = " "), unique))

 #       Var1 Var2 Var3 Var4 Var5
 #   1      I like  you <NA> <NA>
 #   2    she like  you <NA> <NA>
 #   3     he like  you <NA> <NA>
 #   4      I love  you <NA> <NA>
 #   5    she love  you <NA> <NA>
 #   [snip]
 #   104  she love  him   so much
 #   105   he love  him   so much
 #   106    I hate  him   so much
 #   107  she hate  him   so much
 #   108   he hate  him   so much
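The same split, transpose, unique, expand.grid steps can also be sketched in base R without data.table. This is a minimal sketch assuming single-space separators; indexing past the end of a shorter sentence with `[` returns NA, which stands in for tstrsplit's NA padding:

```r
text <- c("I like you", "I love you so much", "she like it so much",
          "she hate you", "he hate you so much", "I like him")

words <- strsplit(text, " ", fixed = TRUE)
n_pos <- max(lengths(words))

# transpose: one vector per word position, NA where a sentence is too short
by_pos <- lapply(seq_len(n_pos), function(i) vapply(words, `[`, character(1), i))

# keep the distinct words at each position, then take the Cartesian product
combos <- expand.grid(lapply(by_pos, unique), stringsAsFactors = FALSE)

nrow(combos)  # 108
head(combos, 3)
```

Because expand.grid varies the first column fastest, the row order matches the output shown above.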

You can also use CJ, data.table's counterpart to expand.grid, with the argument unique = TRUE.
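A sketch of the CJ variant, assuming data.table is installed. Passing sorted = FALSE is an assumption of this sketch, not something the answer specifies: it keeps each column in first-seen word order and leaves the NA padding from tstrsplit untouched.

```r
library(data.table)

text <- c("I like you", "I love you so much", "she like it so much",
          "she hate you", "he hate you so much", "I like him")

# tstrsplit() splits and transposes: one vector per word position, NA-padded
parts <- tstrsplit(text, split = " ")

# CJ() builds the cross join; unique = TRUE de-duplicates each input first
combos <- do.call(CJ, c(parts, sorted = FALSE, unique = TRUE))

nrow(combos)  # 108
```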

Answer 1 (score: 2)

From the "pattern" data set, we can also use expand from tidyr:

library(tidyr)
expand(pattern, !!! rlang::syms(names(pattern)))

Or we can use separate together with expand.
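A minimal sketch of separate() combined with expand(), starting from the tex data frame instead of pattern and assuming tidyr is installed. The position column names w1, w2, ... are hypothetical, chosen only for illustration:

```r
library(tidyr)

text <- c("I like you", "I love you so much", "she like it so much",
          "she hate you", "he hate you so much", "I like him")
tex <- data.frame(text)

# one column per word position; the names w1, w2, ... are illustrative only
n_pos <- max(lengths(strsplit(text, " ", fixed = TRUE)))
cols <- paste0("w", seq_len(n_pos))

# split on single spaces; fill = "right" pads shorter sentences with NA
wide <- separate(tex, text, into = cols, sep = " ", fill = "right")

# expand() returns every combination of the distinct values in the given columns
combos <- expand(wide, !!!rlang::syms(cols))

nrow(combos)  # 108
```

Unlike expand.grid, expand() sorts the distinct values of each column, so the row order differs from Answer 0 even though the set of combinations is the same.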