Search for a sequence in text

Time: 2013-08-13 03:28:45

Tags: python

I have a logic problem.

I have two strings declared as follows:

fruits = "banana grapes apple"
vegetables = "potatoes cucumber carrot"

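Now, given some sentences, I have to find the word that comes before text of the form <vegetables> <fruits>:

I ate carrot grapes ice cream for dessert.

Answer: ate

Dad and mom brought banana cucumber and milk.

Answer: brought

My idea is to split the sentence into a list of words and then look for the sequence. I can split the sentence, but matching the sequence is the problem.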

wd = sentence.strip().split()
for x in wd:
    # now I will have to look for the sequence
    pass

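Now I have to find the word that precedes that pattern.

3 answers:

Answer 0 (score: 2)

You are using the wrong data structure here; convert the fruits and vegetables to sets. Then the problem is easy to solve: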

>>> from string import punctuation
>>> fruits = set("banana grapes apple".split())
>>> vegetables = set("potatoes cucumber carrot".split())
>>> fruits_vegs = fruits | vegetables
>>> # Python 3: translation table that deletes every punctuation character
>>> no_punct = str.maketrans('', '', punctuation)
>>> def solve(text):
...     spl = text.split()
...     # zip pairs each word with the next one (lazy in Python 3)
...     for x, y in zip(spl, spl[1:]):
...         # strip off punctuation marks
...         x, y = x.translate(no_punct), y.translate(no_punct)
...         if y in fruits_vegs and x not in fruits_vegs:
...             return x
...
>>> solve('I ate carrot grapes ice cream for dessert.')
'ate'
>>> solve('Dad and mom brought banana cucumber and milk.')
'brought'
>>> solve('banana cucumber and carrot.')
'and'
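
Note that solve returns the word before the first fruit or vegetable that follows an ordinary word; it does not require two of them in a row, which is why the last call gives 'and'. If the adjacent pair itself matters, a stricter check is possible. What follows is only a sketch for Python 3: the name word_before_pair is made up for illustration, and it assumes the pair may come in either order (vegetable then fruit, or fruit then vegetable), since the two example sentences in the question differ on this.

```python
from string import punctuation

fruits = set("banana grapes apple".split())
vegetables = set("potatoes cucumber carrot".split())

def word_before_pair(text):
    """Return the word right before an adjacent fruit/vegetable pair, or None.

    Hypothetical helper: the pair is accepted in either order.
    """
    # remove leading/trailing punctuation from every word
    words = [w.strip(punctuation) for w in text.split()]
    # slide a three-word window: candidate word, then the two pair words
    for before, first, second in zip(words, words[1:], words[2:]):
        veg_then_fruit = first in vegetables and second in fruits
        fruit_then_veg = first in fruits and second in vegetables
        if veg_then_fruit or fruit_then_veg:
            return before
    return None

print(word_before_pair('I ate carrot grapes ice cream for dessert.'))     # ate
print(word_before_pair('Dad and mom brought banana cucumber and milk.'))  # brought
print(word_before_pair('banana cucumber and carrot.'))                    # None
```

The last sentence gives None because its only pair, "banana cucumber", opens the sentence and has no word in front of it.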

Answer 1 (score: 1)

fruits = "banana grapes apple".split(" ")
vegetables = "potatoes cucumber carrot".split(" ")

sentence = 'Dad and mom brought banana cucumber and milk.'

wd = sentence.split(' ')
for i, x in enumerate(wd):
    # first fruit or vegetable that is not the first word of the sentence
    if (x in fruits or x in vegetables) and i > 0:
        print(wd[i-1])  # the word just before it
        break

Answer 2 (score: 1)

You can do this with a regular expression:

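import re

fruits = "banana grapes apple"
vegetables = "potatoes cucumber carrot"
sentence = 'I ate carrot grapes ice cream for dessert.'

def to_group(l):
    ''' make a regex group from a string of space-separated words '''
    return '(?:%s)' % ('|'.join(l.split()))

pattern = r'(\w+) %s %s' % (to_group(vegetables), to_group(fruits))
print(re.findall(pattern, sentence))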
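
The pattern above only matches a vegetable followed by a fruit, which is the order stated in the question. The second example sentence has the fruit first ("banana cucumber"), so this pattern would return an empty list for it. If both orders should count, one option is to put both orders into the pattern. A sketch under that assumption, where find_before is an illustrative name rather than part of the answer, and \b is added so that only whole words match:

```python
import re

fruits = "banana grapes apple"
vegetables = "potatoes cucumber carrot"

def to_group(words):
    """Build a non-capturing alternation from space-separated words."""
    return '(?:%s)' % '|'.join(re.escape(w) for w in words.split())

veg, fruit = to_group(vegetables), to_group(fruits)
# one captured word, then vegetable+fruit or fruit+vegetable, as whole words
pattern = re.compile(r'\b(\w+) (?:%s %s|%s %s)\b' % (veg, fruit, fruit, veg))

def find_before(sentence):
    """Hypothetical helper: every word that directly precedes a matching pair."""
    return pattern.findall(sentence)

print(find_before('I ate carrot grapes ice cream for dessert.'))     # ['ate']
print(find_before('Dad and mom brought banana cucumber and milk.'))  # ['brought']
```

Because the words are separated by a literal single space in the pattern, this sketch assumes the sentence uses single spaces between words.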