This is related to the following question: Searching for Unicode characters in Python
I have a string like this:
sentence = 'AASFG BBBSDC FEKGG SDFGF'
I split it and get the following list of words:
words = ['AASFG', 'BBBSDC', 'FEKGG', 'SDFGF']
I use the following code to search for part of a word and get the whole word:
[word for word in sentence.split() if word.endswith("GG")]
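This returns ['FEKGG'].
Now I need to find out what comes before and after that word.
For example, when I search for "GG" it returns ['FEKGG']. It should also give me: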
behind = 'BBBSDC'
infront = 'SDFGF'
Answer 0 (score: 3)
If you have the following string (edited from the original):
sentence = 'AASFG BBBSDC FEKGG SDFGF KETGG'
def neighborhood(iterable):
    """Yield (prev, item, nxt) for each item; None marks a missing neighbour."""
    iterator = iter(iterable)
    prev = None
    try:
        item = next(iterator)
    except StopIteration:
        return  # empty input: nothing to yield
    for nxt in iterator:
        yield (prev, item, nxt)
        prev = item
        item = nxt
    yield (prev, item, None)
matches = [word for word in sentence.split() if word.endswith("GG")]
results = []
for prev, item, nxt in neighborhood(sentence.split()):
    for match in matches:
        if match == item:
            results.append((prev, item, nxt))
results then contains:
[('BBBSDC', 'FEKGG', 'SDFGF'), ('SDFGF', 'KETGG', None)]
Answer 1 (score: 2)
Here is one possibility:
words = sentence.split()
[pos] = [i for (i, word) in enumerate(words) if word.endswith("GG") ]
behind = words[pos - 1]
infront = words[pos + 1]
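You may need to watch out for edge cases, such as "…GG" not appearing, appearing more than once, or being the first and/or last word. As written, a missing or repeated match raises ValueError at the [pos] unpacking, and a match on the last word raises IndexError, which may well be the right behaviour. A match on the first word does not raise, however: words[pos - 1] becomes words[-1] and silently returns the last word.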
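The edge cases listed above can be made explicit. What follows is only a minimal sketch: the helper name `neighbours`, the error message, and the choice to return None for a missing neighbour are assumptions for illustration, not part of the answer.

```python
def neighbours(sentence, suffix):
    """Return (behind, infront) for the single word ending in `suffix`.

    Hypothetical helper: raises ValueError unless exactly one word matches,
    and uses None when the match is the first or last word.
    """
    words = sentence.split()
    positions = [i for i, word in enumerate(words) if word.endswith(suffix)]
    if len(positions) != 1:
        raise ValueError(
            "expected exactly one word ending in %r, found %d"
            % (suffix, len(positions))
        )
    pos = positions[0]
    # Guard both ends explicitly; words[-1] would otherwise wrap around.
    behind = words[pos - 1] if pos > 0 else None
    infront = words[pos + 1] if pos + 1 < len(words) else None
    return behind, infront
```

With the string from the question, `neighbours('AASFG BBBSDC FEKGG SDFGF', 'GG')` gives `('BBBSDC', 'SDFGF')`.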
A completely different solution uses a regular expression, which avoids splitting the string into a list in the first place:
import re
match = re.search(r'\b(\w+)\s+(?:\w+GG)\s+(\w+)\b', sentence)
(behind, infront) = match.groups()
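Note that `re.search` returns None when nothing matches, and this pattern requires a word on both sides, so a "GG" word at the very start or end of the sentence does not match either. A small sketch of guarding against that; the names `PATTERN` and `neighbours_by_regex` are made up for illustration:

```python
import re

# Same pattern as in the answer, compiled once for reuse.
PATTERN = re.compile(r'\b(\w+)\s+(?:\w+GG)\s+(\w+)\b')

def neighbours_by_regex(sentence):
    """Return (behind, infront), or None if no word ending in GG
    has a neighbour on both sides."""
    match = PATTERN.search(sentence)
    if match is None:
        return None
    return match.groups()
```

For the question's string this returns `('BBBSDC', 'SDFGF')`; for `'FEKGG SDFGF'` it returns None, because there is no word before the match.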
Answer 2 (score: 1)
Here is one way. If the "GG" word is at the beginning or end of the sentence, the element before or after it will be None.
words = sentence.split()
[(behind, word, infront) for (behind, word, infront) in
     zip([None] + words[:-1], words, words[1:] + [None])
 if word.endswith("GG")]
Answer 3 (score: 1)
sentence = 'AASFG BBBSDC FEKGG SDFGF AAABGG FOOO EEEGG'
def make_trigrams(l):
    # Pad both ends with None so the first and last words get a full trigram.
    l = [None] + l + [None]
    for i in range(len(l) - 2):
        yield (l[i], l[i+1], l[i+2])
for result in [t for t in make_trigrams(sentence.split()) if t[1].endswith('GG')]:
    behind, match, infront = result
    print('Behind:', behind)
    print('Match:', match)
    print('Infront:', infront, '\n')
Output:
Behind: BBBSDC
Match: FEKGG
Infront: SDFGF
Behind: SDFGF
Match: AAABGG
Infront: FOOO
Behind: FOOO
Match: EEEGG
Infront: None
Answer 4 (score: 1)
Another option, based on itertools, which may be more memory-friendly for large datasets:
from itertools import tee
def sentence_targets(sentence, endstring):
    before, target, after = tee(sentence.split(), 3)
    # Offset the iterators so that zip yields (before, target, after) triples.
    next(target, None)
    next(after, None)
    next(after, None)
    # In Python 3 the built-in zip is lazy, so it replaces itertools.izip.
    for trigram in zip(before, target, after):
        if trigram[1].endswith(endstring):
            yield trigram
Edit: fixed a typo.
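As written, the tee-based generator never yields the first or last word as a target, because neither has a neighbour on both sides. One possible variation, shown only as a sketch, pads the stream with None using `itertools.chain`. The name `padded_targets` and the decision to accept any iterable of words are assumptions for illustration, not part of the answer.

```python
from itertools import chain, tee

def padded_targets(words, endstring):
    """Yield (before, target, after) for each word ending in `endstring`.

    `words` may be any iterable of strings; None marks a missing neighbour.
    """
    # Pad both ends so the first and last words also appear in the middle slot.
    padded = chain([None], words, [None])
    before, target, after = tee(padded, 3)
    next(target, None)   # target runs one step ahead of before
    next(after, None)    # after runs two steps ahead of before
    next(after, None)
    for trigram in zip(before, target, after):
        # trigram[1] is always a real word; only the outer slots can be None.
        if trigram[1].endswith(endstring):
            yield trigram
```

For example, `list(padded_targets('AASFG BBBSDC FEKGG SDFGF KETGG'.split(), 'GG'))` gives `[('BBBSDC', 'FEKGG', 'SDFGF'), ('SDFGF', 'KETGG', None)]`. Because the function takes an iterable rather than a string, the words could also come from a generator instead of a fully built list.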