To create all possible string combinations from the input sentences, I wrote the code below.
library(stringr)
text = c('I like you', 'I love you so much', 'she like it so much', 'she hate you', 'he hate you so much','I like him')
tex = data.frame(text)
library(splitstackshape)
pattern = data.frame(cSplit(tex, "text", " "))
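n = ncol(pattern)
dat = c()
# For each word position, collect its unique words into one space-separated string
for(i in 1:n){
  tt = unique(pattern[,i])
  g = paste0(tt, collapse = ' ')
  dat = c(dat, g)
}
# Build the data frame once, after the loop, then split the strings back into columns
SEQ = data.frame(dat)
SEQ = data.frame(cSplit(SEQ, "dat", " "))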
It produces this data frame.
dat_1 dat_2 dat_3
1 I she he
2 like love hate
3 you it him
4 <NA> so <NA>
5 <NA> much <NA>
What I want is to create all possible combinations of the words (108 of them), as shown below.
I like you so NA
I like you so much
I like you NA NA
I like you NA much
...
he love him so much
he love him NA NA
he love him NA much
he hate you so NA
he hate you so much
...
What should I do to produce this list?
Answer 0 (score: 2)
I think data.table::tstrsplit is convenient for splitting and transposing. Then take the unique values of each list element (lapply(x, unique)) and build all combinations (expand.grid):
expand.grid(lapply(data.table::tstrsplit(text, split = " "), unique))
# Var1 Var2 Var3 Var4 Var5
# 1 I like you <NA> <NA>
# 2 she like you <NA> <NA>
# 3 he like you <NA> <NA>
# 4 I love you <NA> <NA>
# 5 she love you <NA> <NA>
# [snip]
# 104 she love him so much
# 105 he love him so much
# 106 I hate him so much
# 107 she hate him so much
# 108 he hate him so much
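If data.table is not available, the same idea can be sketched in base R. This is only an illustration, not part of the original answer: the names words, n, mat, combos and sentences are my own, and it assumes every sentence is split on single spaces.

```r
text <- c('I like you', 'I love you so much', 'she like it so much',
          'she hate you', 'he hate you so much', 'I like him')

# Split each sentence into words
words <- strsplit(text, " ")

# Pad every word vector with NA up to the longest sentence
n <- max(lengths(words))
mat <- do.call(rbind, lapply(words, `length<-`, n))

# Unique values per word position, then every combination of them
combos <- expand.grid(lapply(seq_len(n), function(j) unique(mat[, j])),
                      stringsAsFactors = FALSE)
nrow(combos)  # 108

# Optionally collapse each row into one string, NA included
sentences <- apply(combos, 1, paste, collapse = " ")
```

The first element of sentences is "I like you NA NA", which matches the format asked for in the question.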
You can also use CJ, the data.table equivalent of expand.grid, with its unique argument.
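A minimal sketch of how the CJ variant might look, assuming the data.table package is installed. The variable names parts and combos_dt are mine, and the call is a reconstruction of the idea rather than the answer's original code.

```r
library(data.table)

text <- c('I like you', 'I love you so much', 'she like it so much',
          'she hate you', 'he hate you so much', 'I like him')

# tstrsplit() splits and transposes: one list element per word position,
# padded with NA where a sentence is shorter
parts <- tstrsplit(text, split = " ")

# CJ() cross-joins its arguments; unique = TRUE de-duplicates each one first,
# so no separate lapply(x, unique) step is needed
combos_dt <- do.call(CJ, c(parts, unique = TRUE))
nrow(combos_dt)
```

With the sample sentences this should give the same 108 rows, returned as a data.table and sorted by CJ's default sorted = TRUE.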
Answer 1 (score: 2)
From the pattern data set, we can also use expand from tidyr:
library(tidyr)
expand(pattern, !!! rlang::syms(names(pattern)))
Alternatively, we can use separate together with expand.