Question

entry="Where in the world is Carmen San Diego"
goal=["Where in the", "world is", "Carmen San Diego"]

I am trying to write a program that searches "entry" for chunks of words that are members of the "goal" list. I want to preserve word order within these chunks.

This is what I have so far. I'm not sure how to finish it, or whether I'm approaching it the right way.

span=1
words = entry.split(" ")
initial_list= [" ".join(words[i:i+span]) for i in range(0, len(words), span)]
x=len(initial_list)
initial_string= " ".join(initial_list)
def backtrack(A,k):
    if A in goal:
        print
    else:
        while A not in goal:
            k=k-1
            A= " ".join(initial_list[0:k])
            if A in goal:
                print A
                words=A.split(" ")
                firstmatch= [" ".join(words[i:i+span]) for i in range(0, len(words), span)]
                newList = []
                for item in initial_list:
                    if item not in firstmatch:
                        newList.append(item)
                nextchunk=" ".join(newList)             

backtrack(initial_string,x)

The output so far is just this:

"Where in the"

Desired output:

"Where in the"
"world is"
"Carmen San Diego"

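I have been trying to find a suitable algorithm. I think it needs backtracking or search pruning, but I'm not quite sure. Ideally the solution would work for any "entry" string and "goal" list. Any comments are much appreciated.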

Answer 1

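Here's an idea: put your goal list into a trie. Find the longest prefix of the current entry string that matches a key in the trie, and if one is found, add it to the output.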

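Then find the next space (word separator) in the current entry string, set the current entry string to the substring starting at the index after that space, and repeat until no spaces remain.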

Edit: here is some code.

import string
import datrie

entry="Where in the world is Carmen San Diego"
goal=["Where in the", "world is", "Carmen San Diego"]

dt = datrie.BaseTrie(string.printable)
for i, s in enumerate(goal):
    dt[s] = i

def find_prefix(current_entry):
    try:
        return dt.longest_prefix(current_entry)
    except KeyError:
        return None

def find_matches(entry):
    current_entry = entry

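    while True:
        # Report the longest goal phrase that starts at the current position.
        match = find_prefix(current_entry)
        if match:
            yield match
        # Advance to the start of the next word, or stop if none is left.
        space_index = current_entry.find(' ')
        if space_index > 0:
            current_entry = current_entry[space_index + 1:]
        else:
            return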

print(list(find_matches(entry)))
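
If installing datrie is not an option, the same idea can be roughly sketched with the standard library only. This is an assumption-laden variant, not the code above: `find_matches_stdlib` is a hypothetical name, it compares whole words rather than characters, and it skips past a phrase once matched instead of restarting at the next word.

```python
def find_matches_stdlib(entry, goal):
    """Yield goal phrases found in entry, scanning left to right by word."""
    words = entry.split()
    # Store each goal phrase as a tuple of words so only whole words match.
    phrases = {tuple(g.split()) for g in goal}
    max_len = max((len(p) for p in phrases), default=0)
    i = 0
    while i < len(words):
        # Try the longest candidate starting at word i first.
        for n in range(min(max_len, len(words) - i), 0, -1):
            if tuple(words[i:i + n]) in phrases:
                yield " ".join(words[i:i + n])
                i += n  # continue after the matched phrase
                break
        else:
            i += 1  # no phrase starts here; move on one word


entry = "Where in the world is Carmen San Diego"
goal = ["Where in the", "world is", "Carmen San Diego"]
print(list(find_matches_stdlib(entry, goal)))
```

A set lookup per candidate length is fine for a small goal list; a trie becomes more attractive when the list is large.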

Answer 2

Does this do what you want?

entry="Where in the world is Carmen San Diego"
goal=["Where in the", "world is", "Carmen San Diego"]


for word in goal:
    if word in entry:
        print(word)

It simply searches entry for each phrase in goal and prints the phrase when it is found.
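
Note that the loop above reports matches in the order of goal, not in the order they occur in entry. If the order within entry matters, one possible sketch is to record where each phrase is found and sort by that position. `matches_in_entry_order` is a hypothetical helper name, and like the `in` test it matches plain substrings, so word boundaries are not checked.

```python
def matches_in_entry_order(entry, goal):
    """Return the goal phrases that occur in entry, ordered by first occurrence."""
    positions = []
    for phrase in goal:
        idx = entry.find(phrase)  # -1 when the phrase is absent
        if idx != -1:
            positions.append((idx, phrase))
    # Sorting the (index, phrase) pairs orders the phrases by position in entry.
    return [phrase for _, phrase in sorted(positions)]


entry = "Where in the world is Carmen San Diego"
shuffled_goal = ["Carmen San Diego", "Where in the", "world is"]
print(matches_in_entry_order(entry, shuffled_goal))
```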

If you want to save them to a list or something similar, you can do the following:

entry="Where in the world is Carmen San Diego"
goal=["Where in the", "world is", "Carmen San Diego"]
foundwords = []

for word in goal:
    if word in entry:
        foundwords.append(word)
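
The question also asks whether backtracking is needed. For the sample data it is not, but it can matter when goal phrases share leading words and every word of entry has to be covered by some phrase. A rough sketch under that assumption follows; `segment` is a hypothetical helper, not part of either answer.

```python
def segment(words, phrases, start=0):
    """Return goal phrases that exactly cover words[start:], or None if impossible."""
    if start == len(words):
        return []  # nothing left to cover
    for phrase in phrases:
        n = len(phrase)
        if n == 0:
            continue  # ignore empty phrases so the recursion always advances
        if tuple(words[start:start + n]) == phrase:
            rest = segment(words, phrases, start + n)
            if rest is not None:
                return [" ".join(phrase)] + rest
            # Dead end: backtrack and try the next candidate phrase.
    return None


entry = "Where in the world is Carmen San Diego"
goal = ["Where in the", "world is", "Carmen San Diego"]
phrases = [tuple(g.split()) for g in goal]
print(segment(entry.split(), phrases))
```

With phrases such as "a b", "a" and "b c" and the entry "a b c", choosing "a b" first leaves "c" uncovered, so the function backs up and returns "a" followed by "b c".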

Backtracking/search pruning - combinatorial word search in Python