Question

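I am trying to work out how to use a preceding clue word to select the best candidate. This should run over a set of strings; I have tried to implement it several times but could not get it to work.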

The basic idea is that there are strings of the form '----- clueword (candidate1|candidate2...) ----', and I want a function that selects the most promising candidate based on the data.

clue =  c( 'a',    'to',    'a',   'to',   'to')
word = c('house','school','paper','water','schooling')
cooccur = c(100,    90,      83,    70,     61)
data = data.frame(clue,word,cooccur)

Suppose there are two strings:

S1 = 'I have a (house|water|paper) and car'
S2 = 'I need to go to (school|schooling) right now'

The clue word 'a' co-occurs most frequently with 'house', and 'to' co-occurs most frequently with 'school'. So, using THEFUNCTION, the result should be

S1
[1] 'I have a (house) and car'
S2
[2] 'I need to go to (school) right now'

You do not need to worry about removing the less promising candidates, because this code handles that.

library(gsubfn)
gsubfn("\\(([^)]+)", ~paste0("(", paste(THEFUNCTION(unlist(x)), collapse="|")), S1)

I know I could use which.max(), but tying it to the 'clue' is not easy at all. Is there any way to get through this?

Answer 1

This works:

THEFUNCTION <- function(x) { # dummy function, to be replaced by the one that selects w.r.t. co-occurrence frequency
  # this function receives its input without parentheses: e.g., 'house|water|paper'
  ifelse(grepl('house', x), 'house', 'school')
}

S1 = 'I have a (house|water|paper) and car'
S2 = 'I need to go to (school|schooling) right now'

library(gsubfn)
gsubfn("\\(([^\\)]+)\\)", ~paste0("(", paste(THEFUNCTION(unlist(x)), collapse="|"), ")"), S1)
#[1] "I have a (house) and car"
gsubfn("\\(([^\\)]+)\\)", ~paste0("(", paste(THEFUNCTION(unlist(x)), collapse="|"), ")"), S2)
#[1] "I need to go to (school) right now"
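
The dummy function above does not look at the clue word or at the co-occurrence counts. One possible way to tie which.max() to the clue is sketched below. It uses base R only instead of gsubfn, and it assumes the clue is the single word immediately before the parenthesised group, separated by one space. The helper names pick_best and select_candidates are hypothetical, not part of any package; the data frame is the one from the question.

```r
# Co-occurrence table from the question
data <- data.frame(
  clue    = c("a", "to", "a", "to", "to"),
  word    = c("house", "school", "paper", "water", "schooling"),
  cooccur = c(100, 90, 83, 70, 61),
  stringsAsFactors = FALSE
)

# Hypothetical helper: among the candidates, return the word with the
# highest co-occurrence count for the given clue.
pick_best <- function(clue, candidates, data) {
  hits <- data[data$clue == clue & data$word %in% candidates, ]
  if (nrow(hits) == 0) return(candidates)  # nothing known: keep all candidates
  hits$word[which.max(hits$cooccur)]
}

# Hypothetical helper: rewrite every "clue (a|b|c)" group in one string.
# Assumption: the clue is the word right before the opening parenthesis.
select_candidates <- function(s, data) {
  pattern <- "(\\w+) \\(([^)]+)\\)"
  m <- gregexpr(pattern, s)
  regmatches(s, m) <- lapply(regmatches(s, m), function(groups) {
    vapply(groups, function(g) {
      # parts[1] is the full match, parts[2] the clue, parts[3] the candidates
      parts <- regmatches(g, regexec(pattern, g))[[1]]
      clue <- parts[2]
      candidates <- strsplit(parts[3], "|", fixed = TRUE)[[1]]
      best <- pick_best(clue, candidates, data)
      paste0(clue, " (", paste(best, collapse = "|"), ")")
    }, character(1), USE.NAMES = FALSE)
  })
  s
}

S1 <- 'I have a (house|water|paper) and car'
S2 <- 'I need to go to (school|schooling) right now'

select_candidates(S1, data)
#[1] "I have a (house) and car"
select_candidates(S2, data)
#[1] "I need to go to (school) right now"
```

Filtering on both the clue and the candidate list before calling which.max() is what links the maximum to the clue: 'water' is dropped for S1 because it is only recorded with 'to'. If no row matches, the sketch leaves the group unchanged rather than guessing.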
