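Using a regular expression to filter a list of tuples

Date: 2017-04-13 14:18:16

Tags: python regex list tuples

Given a list of tuples pairing each word of a sentence with its part-of-speech (PoS) tag: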

[('We', 'PRP'),
 ('took', 'VBD'),
 ('advantage', 'NN'),
 ('of', 'IN'),
 ('the', 'DT'),
 ('half', 'JJ'),
 ('price', 'NN'),
 ('sushi', 'NN'),
 ('deal', 'NN'),
 ('on', 'IN'),
 ('saturday', 'NN')]

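I want to use a regular expression to extract the terms that match a certain sequence of PoS tags. The pattern is something like ('JJ')*('NN')+, so that I end up with the list [('advantage', 'half price sushi deal', 'saturday')]. What is the most appropriate way to do this, bearing in mind that I will be running it on hundreds of sentences?

Thanks!

1 answer:

Answer 0 (score: 1)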

I think something like this should do the trick:

a = [('We', 'PRP'),
 ('took', 'VBD'),
 ('advantage', 'NN'),
 ('of', 'IN'),
 ('the', 'DT'),
 ('half', 'JJ'),
 ('price', 'NN'),
 ('sushi', 'NN'),
 ('deal', 'NN'),
 ('on', 'IN'),
 ('saturday', 'NN')]

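# Iterator over the list shifted by one, so each item can be compared with its successor.
b = iter(a[1:])

my_list = []      # completed groups, each a list of words
inner_list = []   # group currently being built
accepted = ['JJ', 'NN']

for item in a:
    word = item[0]
    check = item[1]
    try:
        against = next(b)
        if check in accepted:
            inner_list.append(word)
            # The next tag is not accepted, so the current group ends here.
            if against[1] not in accepted:
                my_list.append(inner_list)
                inner_list = []
    except StopIteration:
        # Last item: there is no successor, so close any open group.
        if check in accepted:
            inner_list.append(word)
            my_list.append(inner_list)

# Note: this groups any run of JJ/NN tags; it does not enforce the JJ* NN+ order.
final = [' '.join(item) for item in my_list]
# final == ['advantage', 'half price sushi deal', 'saturday']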
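
The question asks for a regular expression over the tags, whereas the loop above collects any run of JJ/NN tags regardless of order. As a minimal sketch of a regex-based alternative, the tags can be joined into one string, matched with `re`, and the match positions mapped back to word indices. The names `extract_terms`, `TERM_PATTERN`, and `sentence` are hypothetical, and the sketch assumes that no tag contains a space.

```python
import re

# Compiled once so it can be reused across hundreds of sentences.
# Each tag is followed by a single space in the tag string, so "NN " cannot
# match inside longer tags such as "NNS" or "NNP".
TERM_PATTERN = re.compile(r"(?:\bJJ )*(?:\bNN )+")


def extract_terms(tagged, pattern=TERM_PATTERN):
    """Return the word sequences whose tags match `pattern` (here JJ* NN+)."""
    words = [word for word, _ in tagged]
    # Assumption: tags contain no spaces, so the number of spaces before a
    # character offset equals the number of complete tokens before it.
    tag_string = "".join(tag + " " for _, tag in tagged)
    terms = []
    for match in pattern.finditer(tag_string):
        start = tag_string.count(" ", 0, match.start())
        end = tag_string.count(" ", 0, match.end())
        terms.append(" ".join(words[start:end]))
    return terms


sentence = [('We', 'PRP'), ('took', 'VBD'), ('advantage', 'NN'), ('of', 'IN'),
            ('the', 'DT'), ('half', 'JJ'), ('price', 'NN'), ('sushi', 'NN'),
            ('deal', 'NN'), ('on', 'IN'), ('saturday', 'NN')]

print(extract_terms(sentence))
# ['advantage', 'half price sushi deal', 'saturday']
```

Unlike the loop, this version rejects an adjective that is not followed by a noun, because the pattern requires at least one NN after any JJ tags. Other tag sequences can be tried by passing a different compiled pattern, as long as every tag in it is written with its trailing space.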