最近I asked a question here我希望在更大的列表ä¸æ‰¾åˆ°å列表。我有一个类似但略有ä¸åŒçš„问题。å‡è®¾æˆ‘有这个列表:
[['she', 'is', 'a', 'student'],
['she', 'is', 'a', 'lawer'],
['she', 'is', 'a', 'great', 'student'],
['i', 'am', 'a', 'teacher'],
['she', 'is', 'a', 'very', 'very', 'exceptionally', 'good', 'student']]
我希望使用matches = ['she', 'is', 'student']
进行查询,并打算从查询列表ä¸æŸ¥çœ‹åŒ…å«matches
å…ƒç´ çš„æ‰€æœ‰å列表。链接ä¸é—®é¢˜çš„唯一区别是我想å‘range
å‡½æ•°æ·»åŠ find_gappy
å‚æ•°ï¼Œå› æ¤å®ƒå°†é¿å…æ£€ç´¢å…ƒç´ ä¹‹é—´çš„é—´éš™è¶…å‡ºæŒ‡å®šèŒƒå›´çš„å列表。例如,在上é¢çš„例åä¸ï¼Œæˆ‘想è¦ä¸€ä¸ªåƒè¿™æ ·çš„函数:
matches = ['she', 'is', 'student']
x = [i for i in x if find_gappy(i, matches, range=2)]
会返回:
[['she', 'is', 'a', 'student'], ['she', 'is', 'a', 'great', 'student']]
最åŽä¸€ä¸ªå…ƒç´ æ²¡æœ‰æ˜¾ç¤ºï¼Œå› ä¸ºåœ¨she is a very very exceptionally good student
å¥ä¸ï¼Œa
å’Œgood
之间的è·ç¦»è¶…出了范围é™åˆ¶ã€‚
How can I write such a function?
之间的差è·ç”案 0 :(得分:2)
以下是将match
列表ä¸çš„项目顺åºè€ƒè™‘在内的一ç§æ–¹æ³•ï¼š
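def find_gappy(all_sets, matches, gap_range=2):
    # Consecutive pairs of query words, e.g. [('she', 'is'), ('is', 'student')]
    zip_m = list(zip(matches, matches[1:]))
    for lst in all_sets:
        # Map each word to its index (a repeated word keeps its last index)
        indices = {j: i for i, j in enumerate(lst)}
        try:
            # The later word must come after the earlier one,
            # with at most gap_range words in between
            if all(0 <= indices[j] - indices[i] - 1 <= gap_range for i, j in zip_m):
                yield lst
        except KeyError:
            # A query word does not occur in this sub-list
            pass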
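To see what the two helper structures hold, here is the same construction applied to a single sentence (the trailing 'she' is a hypothetical addition). Because the dict comprehension overwrites earlier entries, a repeated word keeps only its last index:

```python
matches = ['she', 'is', 'student']
sentence = ['she', 'is', 'a', 'great', 'student', 'she']  # hypothetical example

# Consecutive query pairs that the function checks.
zip_m = list(zip(matches, matches[1:]))

# word -> index; 'she' is overwritten by its LAST position.
indices = {word: i for i, word in enumerate(sentence)}

print(zip_m)    # [('she', 'is'), ('is', 'student')]
print(indices)  # {'she': 5, 'is': 1, 'a': 2, 'great': 3, 'student': 4}
```

Here indices['is'] - indices['she'] - 1 is 1 - 5 - 1 = -5, so this sentence would be rejected even though it starts with she is a great student.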
演示:
lst = [['she', 'is', 'a', 'student'],
       ['student', 'she', 'is', 'a', 'lawer'],  # for order check
       ['she', 'is', 'a', 'great', 'student'],
       ['i', 'am', 'a', 'teacher'],
       ['she', 'is', 'a', 'very', 'very', 'exceptionally', 'good', 'student']]

list(find_gappy(lst, ['she', 'is', 'student'], gap_range=2))
# [['she', 'is', 'a', 'student'], ['she', 'is', 'a', 'great', 'student']]
如果您的å列表ä¸æœ‰é‡å¤çš„å—è¯ï¼Œæ‚¨å¯ä»¥ä½¿ç”¨defaultdict()
æ¥è·Ÿè¸ªæ‰€æœ‰ç´¢å¼•ï¼Œå¹¶ä½¿ç”¨itertools.prodcut
æ¥æ¯”较所有已订è´å—对的产å“çš„å·®è·ã€‚
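As an illustration of that idea on a hypothetical sentence with a repeated word: defaultdict(list) keeps every position, and product yields every (position of the earlier word, position of the later word) combination for one pair:

```python
from collections import defaultdict
from itertools import product

sentence = ['she', 'is', 'a', 'great', 'student', 'she']  # hypothetical example

# word -> list of ALL positions where it occurs
indices = defaultdict(list)
for i, word in enumerate(sentence):
    indices[word].append(i)

print(dict(indices))
# {'she': [0, 5], 'is': [1], 'a': [2], 'great': [3], 'student': [4]}

# Every position combination for the ordered pair ('she', 'is')
pairs = list(product(indices['she'], indices['is']))
print(pairs)  # [(0, 1), (5, 1)]

# The pair passes if ANY combination has 0 <= gap <= 2
print(any(0 <= b - a - 1 <= 2 for a, b in pairs))  # True
```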
from collections import defaultdict
from itertools import product

def find_gappy(all_sets, matches, gap_range=2):
    # Consecutive pairs of query words
    zip_m = list(zip(matches, matches[1:]))
    for lst in all_sets:
        # Map each word to the list of ALL its indices
        indices = defaultdict(list)
        for i, word in enumerate(lst):
            indices[word].append(i)
        # For each pair (first, second), some occurrence of second must follow
        # some occurrence of first with at most gap_range words in between.
        # A missing word yields an empty list, so any() is False (no KeyError).
        if all(any(0 <= b - a - 1 <= gap_range
                   for a, b in product(indices[first], indices[second]))
               for first, second in zip_m):
            yield lst
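One caveat with the pairwise any(...) check: each consecutive pair is tested independently, so the occurrence of is used for the pair ('she', 'is') need not be the one used for ('is', 'student'). For example, ['is', 'x', 'student', 'she', 'is'] passes both pair checks although she, is, student never appear in that order. Below is a sketch of a stricter variant (the name find_gappy_chain is hypothetical, not from the answer) that requires one chain of increasing positions by tracking the set of positions where the match so far can end:

```python
def find_gappy_chain(all_sets, matches, gap_range=2):
    """Yield sub-lists containing `matches` in order, with at most
    `gap_range` words between consecutive matched words.
    Hypothetical variant: matched positions must form one increasing chain."""
    for lst in all_sets:
        if not matches:
            yield lst
            continue
        # Positions where the chain can currently end: start with the first word.
        ends = {i for i, word in enumerate(lst) if word == matches[0]}
        for target in matches[1:]:
            # Keep only positions of `target` reachable from some current end
            # with at most gap_range words in between.
            ends = {
                i for i, word in enumerate(lst)
                if word == target
                and any(0 <= i - e - 1 <= gap_range for e in ends)
            }
            if not ends:
                break  # the chain cannot be extended
        if ends:
            yield lst


data = [['she', 'is', 'a', 'student'],
        ['she', 'is', 'a', 'lawer'],
        ['she', 'is', 'a', 'great', 'student'],
        ['i', 'am', 'a', 'teacher'],
        ['she', 'is', 'a', 'very', 'very', 'exceptionally', 'good', 'student']]

print(list(find_gappy_chain(data, ['she', 'is', 'student'])))
# [['she', 'is', 'a', 'student'], ['she', 'is', 'a', 'great', 'student']]
print(list(find_gappy_chain([['is', 'x', 'student', 'she', 'is']], ['she', 'is', 'student'])))
# []
```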
Answer 1 (score: 1)
链接问题ä¸çš„æŠ€æœ¯è¶³å¤Ÿå¥½ï¼Œä½ åªéœ€è¦åœ¨é€”ä¸æ·»åŠ é—´éš™ï¼Œå¹¶ä¸”ç”±äºŽä½ ä¸æƒ³è¦å…¨å±€è®¡æ•°ï¼Œæ‰€ä»¥æ¯å½“é‡åˆ°åŒ¹é…æ—¶é‡ç½®è®¡æ•°å™¨ã€‚类似的东西:
import collections

def find_gappy(source, matches, max_gap=-1):
    matches = collections.deque(matches)
    counter = max_gap  # initialize as -1 if you want to begin counting AFTER the first match
    for word in source:
        if word == matches[0]:
            counter = max_gap  # or remove this for global gap counting
            matches.popleft()
            if not matches:
                return True
        else:
            counter -= 1
            if counter == -1:
                return False
    return False
data = [['she', 'is', 'a', 'student'],
['she', 'is', 'a', 'lawer'],
['she', 'is', 'a', 'great', 'student'],
['i', 'am', 'a', 'teacher'],
['she', 'is', 'a', 'very', 'very', 'exceptionally', 'good', 'student']]
matches = ['she', 'is', 'student']
x = [i for i in data if find_gappy(i, matches, 2)]
# [['she', 'is', 'a', 'student'], ['she', 'is', 'a', 'great', 'student']]
作为奖励,您å¯ä»¥å°†å…¶ç”¨ä½œåŽŸå§‹å‡½æ•°ï¼Œä»…当您将æ£æ•°ä¼ 递为max_gap
æ—¶æ‰åº”用间隙计数。
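A quick check of that bonus behaviour; the function is repeated unchanged so the snippet runs on its own. With the default max_gap=-1 the counter goes -2, -3, ... after a match and never hits -1 again, so only the order is enforced. Note also that because counter starts at max_gap, words before the first match count against the limit as well:

```python
import collections

def find_gappy(source, matches, max_gap=-1):
    # Same function as in the answer above.
    matches = collections.deque(matches)
    counter = max_gap
    for word in source:
        if word == matches[0]:
            counter = max_gap
            matches.popleft()
            if not matches:
                return True
        else:
            counter -= 1
            if counter == -1:
                return False
    return False

long_sentence = ['she', 'is', 'a', 'very', 'very', 'exceptionally', 'good', 'student']
matches = ['she', 'is', 'student']

print(find_gappy(long_sentence, matches))     # True  (no gap limit)
print(find_gappy(long_sentence, matches, 2))  # False (gap of 5 > 2)
print(find_gappy(long_sentence, matches, 5))  # True  (gap of 5 <= 5)
print(find_gappy(long_sentence, matches, 4))  # False

# Leading words count too: three words before 'she' exhaust max_gap=2.
print(find_gappy(['x', 'x', 'x', 'she', 'is', 'student'], matches, 2))  # False
```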