Efficiently find the longest matching prefix string

Time: 2018-01-15 08:47:49

Tags: python string longest-substring

My current implementation is:

import re

def find_longest_matching_option(option, options):
    options = sorted(options, key=len)
    longest_matching_option = None
    for valid_option in options:
        # Don't want to treat "oreo" as matching "o",
        # match only if it's "o reo"
        if re.match(ur"^{}\s+".format(valid_option), option.strip()):
            longest_matching_option = valid_option
    return longest_matching_option

Some examples of what I'm trying to do:

"foo bar baz something", ["foo", "foo bar", "foo bar baz"]
# -> "foo bar baz"
"foo bar bazsomething", (same as above)
# -> "foo bar"
"hello world", ["hello", "something_else"]
# -> "hello"
"a b", ["a", "a b"]
# -> "a b" # Doesn't work in current impl.

Mostly, I'm looking for efficiency here. The current implementation works, but I've been told it is O(m^2 * n), which is pretty bad.

Thanks in advance!

2 answers:

Answer 0 (score: 2)

Let's start with foo.

def foo(x, y):
    x, y = x.strip(), y.strip()
    return x == y or x.startswith(y + " ")

foo returns True if the two strings are equal (after stripping surrounding whitespace), or if y followed by a space is a prefix of x.

Next, given a case string and a list of options, you can use filter to find all the options that are valid prefixes of the case string, and then apply max with key=len to find the longest one (see the tests below).

Here are some test cases for foo. For the purpose of demonstration, I'll use partial to curry foo into a higher-order function.

from functools import partial

cases = ["foo bar baz something", "foo bar bazsomething", "hello world", "a b", "a b"]
options = [
    ["foo", "foo bar", "foo bar baz"],
    ["foo", "foo bar", "foo bar baz"],
    ["hello", "something_else"],
    ["a", "a b"],
    ["a", "a b\t"]
]
p_list = [partial(foo, c) for c in cases]

for p, o in zip(p_list, options):
    print(max(filter(p, o), key=len))

foo bar baz
foo bar
hello
a b
a b
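
One caveat with the max(filter(...)) pattern above: when no option matches, max is called on an empty sequence and raises ValueError. A minimal sketch of one way around that, assuming Python 3.4+ (where max accepts a default argument). The wrapper name longest_match is hypothetical and not part of the original answer; foo is repeated so the block runs on its own.

```python
def foo(x, y):
    # True if y equals x, or y followed by a space is a prefix of x
    x, y = x.strip(), y.strip()
    return x == y or x.startswith(y + " ")


def longest_match(case, options):
    # Hypothetical helper: keep only the options that match on a word
    # boundary, then pick the longest one. default=None avoids the
    # ValueError that max raises when nothing matches.
    return max((o for o in options if foo(case, o)), key=len, default=None)
```

With this wrapper, a case such as "oreo" against ["o"] yields None instead of an exception.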

Answer 1 (score: 1)

A regular expression is overkill here; you can simply append a space to each string before comparing them to get the same result.

You also don't need to sort the data. Simply looping over each value once is more efficient.

def find_longest_matching_option(option, options):
    # append a space so that find_longest_matching_option("a b", ["a b"])
    # works as expected
    option += ' '
    longest = None

    for valid_option in options:
        # append a space to each option so that only complete
        # words are matched
        valid_option += ' '
        if option.startswith(valid_option):
            # remember the longest match
            if longest is None or len(longest) < len(valid_option):
                longest = valid_option

    if longest is not None:
        # remove the trailing space
        longest = longest[:-1]
    return longest
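
The same idea can be sketched without building new strings at all, by checking the character that follows the matched prefix. This is only an illustrative variant, not the answer's code: the name longest_prefix_option is hypothetical, and it treats any whitespace character (not just a space) as a word boundary, which is closer to the \s+ in the question's regex.

```python
def longest_prefix_option(option, options):
    # Hypothetical variant: one pass over the options, no sorting,
    # no regex, and no string concatenation.
    longest = None
    for candidate in options:
        if not option.startswith(candidate):
            continue
        end = len(candidate)
        # Accept only whole-word matches: the prefix must either consume
        # the entire input or be followed by a whitespace character.
        if end < len(option) and not option[end].isspace():
            continue
        if longest is None or len(candidate) > len(longest):
            longest = candidate
    return longest
```

Each startswith call compares at most len(candidate) characters, so the total work grows roughly with the combined length of the options rather than with a sort plus a regex compile per option.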