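Find common substring from a list of strings

Time: 2015-12-23 07:39:29

Tags: string list python-2.7

How can I extract just the common prefix from a list of strings? Note that I don't know the prefix in advance. I only find out what it is through this function.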


If the strings in the list have nothing in common, the result should be an empty string.

3 answers:

Answer 0 (score: 1)

Do it like this:

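def get_large_subset(lis):
    if not lis:
        return ''
    # The common prefix can be no longer than the shortest string.
    k = min(lis, key=len)
    # Every prefix of k, from '' up to k itself.
    j = [k[:i] for i in range(len(k) + 1)]
    # Keep candidates that start every string; the last one is the longest.
    return [y for y in j if all(w.startswith(y) for w in lis)][-1]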

>>> print get_large_subset(["test11", "test12", "test13"])
test1
>>> print get_large_subset(["test-a", "test-b", "test-c"])
test-
>>> print get_large_subset(["test1", "test1a", "test12"])
test1
>>> print get_large_subset(["testa-1", "testb-1", "testc-1"])
test
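Answer 0 builds every prefix of one string as a candidate and keeps the longest one that passes a membership test. Which test is used matters: y in w accepts y anywhere inside w, while w.startswith(y) accepts it only at the start of w. The sketch below uses a hypothetical helper, longest_passing_prefix (not part of the answer), that takes the test as a parameter so the two can be compared on a pair of strings where they disagree.

```python
def longest_passing_prefix(strings, test):
    """Return the longest prefix of the shortest string for which
    test(candidate, w) is true for every w in strings ('' if none)."""
    if not strings:
        return ''
    shortest = min(strings, key=len)
    best = ''
    for i in range(len(shortest) + 1):
        candidate = shortest[:i]
        if all(test(candidate, w) for w in strings):
            best = candidate
    return best


def is_substring(candidate, w):
    return candidate in w           # matches anywhere inside w


def is_prefix(candidate, w):
    return w.startswith(candidate)  # matches only at the start of w


words = ["abc", "xab"]
print(repr(longest_passing_prefix(words, is_substring)))  # 'ab'
print(repr(longest_passing_prefix(words, is_prefix)))     # ''
```

On the four lists from the question the two tests happen to give the same result, so the difference only appears with inputs like the one above.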

Answer 1 (score: 1)

Solution

This function works:

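def find_prefix(string_list):
    prefix = []
    # zip(*string_list) yields a tuple of the characters at each position,
    # stopping at the end of the shortest string.
    for chars in zip(*string_list):
        # All strings agree at this position exactly when the set has one element.
        if len(set(chars)) == 1:
            prefix.append(chars[0])
        else:
            # First mismatch: the common prefix ends here.
            break
    return ''.join(prefix)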
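find_prefix relies on zip(*string_list) turning the list of strings into columns: the i-th tuple holds the i-th character of every string, and zip stops at the end of the shortest string. The sketch below just prints those columns for a small illustrative list (not from the answer), together with the len(set(chars)) == 1 check for each column.

```python
string_list = ["test11", "test12", "tesla"]

# Each tuple is one "column": the characters at the same index in every string.
# zip stops after the shortest string ("tesla", 5 characters).
columns = list(zip(*string_list))
for chars in columns:
    # A column can be part of the common prefix only if all its characters
    # are equal, i.e. the set of characters has exactly one element.
    print(chars, len(set(chars)) == 1)
```

Here the fourth column is the first that fails the check, so find_prefix would return "tes".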

Test

string_lists = [["test11", "test12", "test13"],
                ["test-a", "test-b", "test-c"],
                ["test1", "test1a", "test12"],
                ["testa-1", "testb-1", "testc-1"]]


for string_list in string_lists:
    print(string_list)
    print(find_prefix(string_list))

Output:

['test11', 'test12', 'test13']
test1
['test-a', 'test-b', 'test-c']
test-
['test1', 'test1a', 'test12']
test1
['testa-1', 'testb-1', 'testc-1']
test
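The question's requirement that the result be an empty string when the strings share nothing is met by find_prefix without any special-casing. A few edge cases are sketched below. The inputs are illustrative, not from the answer, and the function is repeated unchanged so the snippet runs on its own.

```python
def find_prefix(string_list):
    prefix = []
    for chars in zip(*string_list):
        if len(set(chars)) == 1:
            prefix.append(chars[0])
        else:
            break
    return ''.join(prefix)

# Nothing in common: the first column already differs, so the result is ''.
print(repr(find_prefix(["abc", "xyz"])))  # ''
# Empty list: zip() yields no columns, so the result is also ''.
print(repr(find_prefix([])))              # ''
# One string: every column has a single character, so the whole string is returned.
print(repr(find_prefix(["alone"])))       # 'alone'
# One string is a prefix of another: zip stops at the shorter one.
print(repr(find_prefix(["ab", "abc"])))   # 'ab'
```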

Speed

Timings are always interesting:

string_list = ["test11", "test12", "test13"]

%timeit get_large_subset(string_list)
100000 loops, best of 3: 14.3 µs per loop

%timeit find_prefix(string_list)
100000 loops, best of 3: 6.19 µs per loop

long_string_list = ['test{}'.format(x) for x in range(int(1e4))]

%timeit get_large_subset(long_string_list)
100 loops, best of 3: 7.44 ms per loop

%timeit find_prefix(long_string_list)
100 loops, best of 3: 2.38 ms per loop

very_long_string_list = ['test{}'.format(x) for x in range(int(1e6))]

%timeit get_large_subset(very_long_string_list)
1 loops, best of 3: 761 ms per loop

%timeit find_prefix(very_long_string_list)
1 loops, best of 3: 354 ms per loop

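Conclusion: comparing the strings column by column with zip and a set is fast. In these measurements find_prefix was roughly 2 to 3 times faster than get_large_subset.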
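The %timeit lines above are IPython magics. A sketch of a similar measurement as a plain script, using the standard timeit module, is shown below. It compares find_prefix with the itertools.takewhile one-liner from the next answer (wrapped in a function whose name, takewhile_prefix, is only illustrative). Absolute numbers depend on the machine and Python version, so none are claimed here.

```python
import itertools as it
import timeit


def find_prefix(string_list):
    # Column-wise comparison with zip + set, as in the answer above.
    prefix = []
    for chars in zip(*string_list):
        if len(set(chars)) == 1:
            prefix.append(chars[0])
        else:
            break
    return ''.join(prefix)


def takewhile_prefix(string_list):
    # The itertools.takewhile one-liner, wrapped in a function.
    return ''.join(x[0] for x in it.takewhile(lambda x: len(set(x)) == 1,
                                              zip(*string_list)))


long_string_list = ['test{}'.format(x) for x in range(int(1e4))]

for func in (find_prefix, takewhile_prefix):
    # Best of 3 repeats with 20 calls each, reported as time per call in ms.
    best = min(timeit.repeat(lambda: func(long_string_list), repeat=3, number=20))
    print('{}: {:.3f} ms per call'.format(func.__name__, best / 20 * 1e3))
```

Taking the minimum of the repeats follows the same "best of 3" idea as %timeit, since the fastest run is the one least disturbed by other activity on the machine.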

Answer 2 (score: 1)

One-liner (after import itertools as it):

''.join(x[0] for x in it.takewhile(lambda x: len(set(x)) == 1, zip(*string_list)))

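This joins the leading characters that all members of string_list have in common, stopping at the first position where they differ.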
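The one-liner assumes string_list is already defined and itertools has been imported as it. A self-contained version, wrapped in a function whose name common_prefix is only illustrative, is sketched below and run on the four lists from the question. As a cross-check it is compared with os.path.commonprefix from the standard library, which, despite its name, works character by character rather than by path component.

```python
import itertools as it
import os.path


def common_prefix(string_list):
    # zip(*string_list) yields one tuple of characters per position;
    # takewhile keeps tuples while all their characters are equal
    # and stops at the first position where they differ.
    return ''.join(x[0] for x in it.takewhile(lambda x: len(set(x)) == 1,
                                              zip(*string_list)))


string_lists = [["test11", "test12", "test13"],
                ["test-a", "test-b", "test-c"],
                ["test1", "test1a", "test12"],
                ["testa-1", "testb-1", "testc-1"]]

for string_list in string_lists:
    # os.path.commonprefix also compares character by character,
    # so it should agree with the one-liner on plain strings.
    print(common_prefix(string_list), os.path.commonprefix(string_list))
```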