Suffix search - Python

Time: 2016-08-18 15:08:39

Tags: python string algorithm search

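Here is a problem: given a list of strings and a document, find the shortest substring of the document that contains all the strings in the list.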

So for

document = 'many google employees can program can google employees because google is a technology company that writes program'
searchTerms = ['google', 'program', 'can']

the output would be

can google employees because google is a technology company that writes program

Here is my approach: split the document into a suffix tree, check every suffix for all of the strings, and return the shortest one.

Here is my code:

[code missing from the source]

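This was an online submission, and it failed one test case. I don't know what that test case was. My question is whether there is a logic error in the code, and whether there is a more efficient approach.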
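The approach described in prose above can be sketched as follows. This is a reconstruction under assumptions, not the asker's actual code (which is not available here); the function name `shortest_suffix_with_all` is hypothetical. It hints at one possible logic issue: every suffix runs to the end of the document, so a shorter match that ends earlier can never be returned.

```python
def shortest_suffix_with_all(document, terms):
    """Return the shortest suffix of document containing every term, or None.

    Hypothetical helper: it only illustrates the suffix-based idea.
    """
    # Walk the suffixes from shortest to longest and stop at the first hit.
    for start in range(len(document) - 1, -1, -1):
        suffix = document[start:]
        if all(term in suffix for term in terms):
            return suffix
    return None


document = ('many google employees can program can google employees '
            'because google is a technology company that writes program')
searchTerms = ['google', 'program', 'can']

print(shortest_suffix_with_all(document, searchTerms))
# → can google employees because google is a technology company that writes program
```

The result matches the output quoted in the question, yet `'program can google'` also occurs in the document and is much shorter, which a suffix-only search cannot find.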

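3 answers:

Answer 0: (score: 2)

You can split this into two parts. First, find the shortest substring that satisfies some property. Let's pretend we already have a function that tests that property: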

def find_shortest_ss(document, some_property):
    # First level of looping gradually increases substring length
    for x in range(1, len(document) + 1):
        # Second level of looping tests current length at valid positions
        for y in range(len(document) - x + 1):
            if some_property(document[y:x+y]):
                return document[y:x+y]
    # How to handle the case of no match is undefined
    raise ValueError('No matching value found')

Now for the property we actually want to test:

def contains_all_terms(terms):
    return (lambda s: all(term in s for term in terms))

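This function takes a list of terms and returns a lambda which, when called on a string, returns True if and only if every term is in that string. It is basically a more concise version of a nested function definition, which you could write like this: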

def contains_all_terms(terms):
    def string_contains_them(s):
        return all(term in s for term in terms)
    return string_contains_them

So we are really just returning a handle to a function that is created on the fly inside contains_all_terms.

To piece it all together, we do this:

>>> find_shortest_ss(document, contains_all_terms(searchTerms))
'program can google'

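This code has a few efficiency advantages:

  1. The all built-in function short-circuits, which means it returns False as soon as it finds a term that the substring does not contain.

  2. It checks all the shortest substrings first, then increases the substring length one character at a time. As soon as it finds a satisfying substring, it exits and returns that value. So you are guaranteed that the returned value is never longer than necessary. It doesn't even do any work on substrings longer than necessary.

  3. 8 lines of code, which I think is not bad.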
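The short-circuit behaviour in point 1 can be observed directly. The sketch below is a variant of contains_all_terms that records each term it actually tests; the name `contains_all_terms_logged` and the `log` parameter are hypothetical additions for illustration only.

```python
def contains_all_terms_logged(terms, log):
    """Like contains_all_terms, but appends every tested term to log."""
    def string_contains_them(s):
        def contains(term):
            log.append(term)  # record that this term was actually checked
            return term in s
        # all() stops consuming the generator at the first False result
        return all(contains(term) for term in terms)
    return string_contains_them


log = []
check = contains_all_terms_logged(['google', 'program', 'can'], log)
result = check('can program')
print(result, log)
# → False ['google']
```

Only 'google' is tested: it is missing from the string, so all() returns False without looking at the remaining terms.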

Answer 1: (score: 0)

Brute force is O(n³), so why not:

[code missing from the source]

But you can do this faster. For example, any relevant substring can only end with one of the keywords.

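Answer 2: (score: 0)

Instead of brute-forcing all possible substrings, I brute-force all possible positions of the matching words. It should be a bit faster.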

import numpy as np
from itertools import product


document = 'many google employees can program can google employees because google is a technology company that writes program'
searchTerms = ['google', 'program']

word_lists = []

for word in searchTerms: 
    word_positions = []
    start = 0  #starting index of str.find()
    while True:
        start = document.find(word, start)  #search to the end of the document
        if start == -1:  #no more instances
            break
        word_positions.append([start, start+len(word)])  #beginning and ending index of search term
        start += 1  #increment starting search position
    word_lists.append(word_positions)  #add all search term positions to list of all search terms

minLen = len(document)
lower = 0
upper = len(document)
for p in product(*word_lists):  #unpack word_lists into word_positions
    indexes = np.array(p).flatten()  #take all indices into flat list
    lowerI = np.min(indexes)
    upperI = np.max(indexes)
    indexRange = upperI - lowerI  #determine length of substring
    if indexRange < minLen: 
        minLen = indexRange
        lower = lowerI
        upper = upperI

print(document[lower:upper])