Counting the ways to build a string from a given set of words

Time: 2014-11-30 01:16:42

Tags: count substring

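I have a set of strings (a dictionary) and a string T, and I need to count the number of ways T can be built by concatenating words from the dictionary.

For example:

The dictionary contains: hello world llo he

and the string T is "helloworld".

The output should be 2, because "helloworld" can be built as hello + world and as he + llo + world.

Is there an efficient algorithm for this?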

2 answers:

Answer 0: (score: 1)

Here is a quick implementation in Python:

from collections import defaultdict

def count_constructions( words, string ):
    # First we're going to make a map with 
    # positions mapped to lists of words starting
    # at that position in the string
    words_at_index = defaultdict( list )
    for word in words:    
        i = string.find(word)
        while i >= 0:
            words_at_index[i].append(word)
            i = string.find(word, i + 1)
    # I know there's a more pythonic way to do this, 
    # but the point here is to be able to inc count within
    # the auxiliary function
    count = [ 0 ]

    # This will find all of the ways to cover the remaining string
    # starting at start
    def recurse( start ):
        for w in words_at_index[start]:
            # w matches the slice string[start:next_start]
            next_start = start + len(w)
            # see if we've covered the whole thing.        
            if next_start == len(string):
                count[0] += 1
                # we could also emit the words forming the string here
            else: 
                # otherwise, count the times we can cover it from
                # next_start on
                recurse(next_start)

    recurse(0)
    return count[0]


dictionary = [ 'hello', 'world', 'llo', 'he' ]
word = "helloworld"

print( count_constructions( dictionary, word ) )
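
For the example above this prints 2. One caveat: the recursion visits the same start position once for every way of reaching it, so the running time can grow exponentially with the length of the string (for instance with the words 'a' and 'aa' and a long run of a's). A possible variant, sketched here as an assumption and not as part of the original answer, stores the number of ways to build each suffix so every position is computed only once. The names count_constructions_dp and ways are made up for this illustration.

```python
def count_constructions_dp(words, string):
    n = len(string)
    # Skip empty words: they never advance the position.
    candidates = [w for w in words if w]
    # ways[i] = number of ways to build the suffix string[i:]
    ways = [0] * (n + 1)
    # Assumption: the empty suffix counts as built in exactly one way.
    ways[n] = 1
    # Fill the table from the end of the string towards the start.
    for i in range(n - 1, -1, -1):
        for w in candidates:
            # str.startswith accepts a start offset, so no slicing is needed.
            if string.startswith(w, i):
                ways[i] += ways[i + len(w)]
    return ways[0]


dictionary = ['hello', 'world', 'llo', 'he']
print(count_constructions_dp(dictionary, "helloworld"))
```

This does roughly len(string) * len(words) prefix checks. Note that, unlike the recursive version, it returns 1 for an empty target string because of the ways[n] = 1 convention.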

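Answer 1: (score: 0)

I would start by taking the subset of your dictionary that contains only the words that can actually be part of the string you are searching for. Then, with the remaining words, you can run a backtracking implementation. It shouldn't use too many resources, because the set you run the backtracking on will be very small.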
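
A minimal sketch of that idea, under the assumption that "can be part of the string" means "occurs somewhere in it as a substring". The function name count_by_backtracking and its helper are hypothetical, chosen only for this illustration.

```python
def count_by_backtracking(words, target):
    # Step 1: keep only non-empty words that occur somewhere in the target.
    usable = [w for w in words if w and w in target]

    # Step 2: plain backtracking over the reduced word list.
    def backtrack(pos):
        # Reaching the end of the target means one complete construction.
        if pos == len(target):
            return 1
        total = 0
        for w in usable:
            # Try every usable word that matches at the current position.
            if target.startswith(w, pos):
                total += backtrack(pos + len(w))
        return total

    return backtrack(0)


dictionary = ['hello', 'world', 'llo', 'he', 'foo', 'bar']
print(count_by_backtracking(dictionary, "helloworld"))
```

The filtering step shrinks the word list but does not change the worst case of the backtracking itself, so for long strings with many overlapping words it may still be worth caching the result per position.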