Find gappy sublists within a range

Time: 2018-05-21 08:00:37

Tags: python python-3.x list pattern-matching sublist

Recently I asked a question here about finding sublists inside a larger list. I now have a similar but slightly different problem. Suppose I have this list:

 [['she', 'is', 'a', 'student'],
 ['she', 'is', 'a', 'lawer'],
 ['she', 'is', 'a', 'great', 'student'],
 ['i', 'am', 'a', 'teacher'],
 ['she', 'is', 'a', 'very', 'very', 'exceptionally', 'good', 'student']] 

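I want to query with matches = ['she', 'is', 'student'] and get every sublist of the queried list that contains the elements of matches. The only difference from the linked question is that I want to add a range parameter to the find_gappy function, so that it skips sublists in which the gap between consecutive matched elements exceeds the given range. For the example above, I want a function like this: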

matches = ['she', 'is', 'student']
x = [i for i in x if find_gappy(i, matches, range=2)]

which would return:

[['she', 'is', 'a', 'student'], ['she', 'is', 'a', 'great', 'student']]

The last sublist is not returned because in the sentence "she is a very very exceptionally good student", the stretch from "a" to "good" (the words between "is" and "student") exceeds the range limit.
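To make "gap" concrete: one reading of the question is that the gap between two consecutive query words is the number of words strictly between their positions. The helper below is a hypothetical illustration, assuming each word occurs only once in the sentence.

```python
def gap_between(sentence, first, second):
    """Number of words strictly between `first` and `second` (assumes each occurs once)."""
    return sentence.index(second) - sentence.index(first) - 1


long_sentence = ['she', 'is', 'a', 'very', 'very', 'exceptionally', 'good', 'student']
short_sentence = ['she', 'is', 'a', 'great', 'student']

print(gap_between(long_sentence, 'is', 'student'))   # 5 -> exceeds range=2
print(gap_between(short_sentence, 'is', 'student'))  # 2 -> within range=2
```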

How can I write such a function?


2 answers:

Answer 0 (score: 2)

Here is one approach that also takes the order of the items in the matches list into account:

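def find_gappy(all_sets, matches, gap_range=2):
    # Consecutive pairs of query words, e.g. ('she', 'is'), ('is', 'student')
    zip_m = list(zip(matches, matches[1:]))
    for lst in all_sets:
        # Map each word to its index (the last index if the word repeats)
        indices = {j: i for i, j in enumerate(lst)}
        try:
            # Each pair must appear in order with at most gap_range words between them
            if all(0 <= indices[j] - indices[i] - 1 <= gap_range for i, j in zip_m):
                yield lst
        except KeyError:
            # A query word is missing from this sublist
            pass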

Demo:

lst = [['she', 'is', 'a', 'student'],
       ['student', 'she', 'is', 'a', 'lawer'],  # for order check
       ['she', 'is', 'a', 'great', 'student'],
       ['i', 'am', 'a', 'teacher'],
       ['she', 'is', 'a', 'very', 'very', 'exceptionally', 'good', 'student']]

list(find_gappy(lst, ['she', 'is', 'student'], gap_range=2))
# [['she', 'is', 'a', 'student'], ['she', 'is', 'a', 'great', 'student']]
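Because the dictionary comprehension keeps only the last index of each word, this version can reject a valid sentence when a query word repeats. A quick check, using a made-up sentence in which "she" appears twice:

```python
def find_gappy(all_sets, matches, gap_range=2):
    # The first version from above: each word maps to its LAST index only
    zip_m = list(zip(matches, matches[1:]))
    for lst in all_sets:
        indices = {j: i for i, j in enumerate(lst)}
        try:
            if all(0 <= indices[j] - indices[i] - 1 <= gap_range for i, j in zip_m):
                yield lst
        except KeyError:
            pass


# 'she' is stored at index 4, so the pair ('she', 'is') looks out of order
repeated = [['she', 'is', 'a', 'student', 'she']]
print(list(find_gappy(repeated, ['she', 'is', 'student'])))  # []
```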

If your sublists contain repeated words, you can use a defaultdict() to keep track of all indices of each word, and itertools.product to compare the gaps for every combination of positions of each ordered word pair.

from collections import defaultdict
from itertools import product

def find_gappy(all_sets, matches, gap_range=2):
    zip_m = list(zip(matches, matches[1:]))
    for lst in all_sets:
        # Map each word to ALL of its indices
        indices = defaultdict(list)
        for i, j in enumerate(lst):
            indices[j].append(i)
        # k runs over positions of the earlier word i, v over positions of the later word j.
        # A missing word yields an empty list, so any() is False (no KeyError can occur).
        if all(any(0 <= v - k - 1 <= gap_range for k, v in product(indices[i], indices[j]))
               for i, j in zip_m):
            yield lst
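Note that the defaultdict version checks each adjacent pair independently, so with repeated words it can accept a sentence whose pairs are satisfied by different occurrences that never form one left-to-right chain (for example matches ['a', 'b', 'c'] against ['b', 'c', 'a', 'b']). If you need a single consistent chain, one option, sketched here as a different technique from the answer's, is to carry the set of reachable positions forward word by word. The name find_gappy_chain is hypothetical, and the sketch assumes matches is non-empty.

```python
from collections import defaultdict


def find_gappy_chain(all_sets, matches, gap_range=2):
    """Yield sublists containing `matches` in order, with at most `gap_range` words
    between consecutive matched words, using ONE consistent chain of positions.
    Assumes `matches` is non-empty."""
    for lst in all_sets:
        positions = defaultdict(list)
        for idx, word in enumerate(lst):
            positions[word].append(idx)
        # Positions where the first query word can start a chain
        reachable = set(positions[matches[0]])
        for word in matches[1:]:
            # Keep positions of `word` that follow some current chain end within the gap
            reachable = {
                q for q in positions[word]
                if any(0 <= q - p - 1 <= gap_range for p in reachable)
            }
            if not reachable:
                break
        if reachable:
            yield lst


data = [['she', 'is', 'a', 'student'],
        ['she', 'is', 'a', 'lawer'],
        ['she', 'is', 'a', 'great', 'student'],
        ['i', 'am', 'a', 'teacher'],
        ['she', 'is', 'a', 'very', 'very', 'exceptionally', 'good', 'student']]

print(list(find_gappy_chain(data, ['she', 'is', 'student'])))
# [['she', 'is', 'a', 'student'], ['she', 'is', 'a', 'great', 'student']]
```

The set of reachable positions never exceeds the sentence length, so each step stays cheap while still rejecting chains that only exist pairwise.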

Answer 1 (score: 1)

The technique from the linked question works well enough. You only need to count the gap as you go, and since you don't want a global count, reset the counter every time you hit a match. Something like:

import collections

def find_gappy(source, matches, max_gap=-1):
    matches = collections.deque(matches)
    counter = max_gap  # initialize as -1 if you want to begin counting AFTER the first match
    for word in source:
        if word == matches[0]:
            counter = max_gap  # or remove this for global gap counting
            matches.popleft()
            if not matches:
                return True
        else:
            counter -= 1
            if counter == -1:
                return False
    return False

data = [['she', 'is', 'a', 'student'],
        ['she', 'is', 'a', 'lawer'],
        ['she', 'is', 'a', 'great', 'student'],
        ['i', 'am', 'a', 'teacher'],
        ['she', 'is', 'a', 'very', 'very', 'exceptionally', 'good', 'student']]

matches = ['she', 'is', 'student']
x = [i for i in data if find_gappy(i, matches, 2)]
# [['she', 'is', 'a', 'student'], ['she', 'is', 'a', 'great', 'student']]

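As a bonus, you can still use it as the original function: gap counting is only applied when you pass a non-negative number as max_gap (0 means the matched words must be adjacent).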
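To illustrate that bonus behavior, here is the same function run with the default max_gap=-1 (no gap limit), with max_gap=2, and with max_gap=0 (matched words must be adjacent). The sentences come from the question.

```python
import collections


def find_gappy(source, matches, max_gap=-1):
    # Answer 1's function, repeated so this snippet runs on its own
    matches = collections.deque(matches)
    counter = max_gap
    for word in source:
        if word == matches[0]:
            counter = max_gap  # reset the gap budget on every match
            matches.popleft()
            if not matches:
                return True
        else:
            counter -= 1
            if counter == -1:  # a negative max_gap never lands exactly on -1, so no limit
                return False
    return False


sentence = ['she', 'is', 'a', 'very', 'very', 'exceptionally', 'good', 'student']
print(find_gappy(sentence, ['she', 'is', 'student']))     # True: no gap limit by default
print(find_gappy(sentence, ['she', 'is', 'student'], 2))  # False: 5 words between 'is' and 'student'
print(find_gappy(['she', 'is', 'student'], ['she', 'is', 'student'], 0))  # True: adjacent words
```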