Given a list of tuples pairing each word in a sentence with its part-of-speech tag:
[('We', 'PRP'),
 ('took', 'VBD'),
 ('advantage', 'NN'),
 ('of', 'IN'),
 ('the', 'DT'),
 ('half', 'JJ'),
 ('price', 'NN'),
 ('sushi', 'NN'),
 ('deal', 'NN'),
 ('on', 'IN'),
 ('saturday', 'NN')]
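I want to use a regular expression to extract terms that match certain PoS sequences, something like ('JJ')*('NN')+, so that I end up with the list [('advantage', 'half price sushi deal', 'saturday')]. What is the most appropriate way to do this, bearing in mind that I will be doing it for hundreds of sentences?
Thanks!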
Answer 0 (score: 1)
I think this might be something that will do the trick:
a = [('We', 'PRP'),
     ('took', 'VBD'),
     ('advantage', 'NN'),
     ('of', 'IN'),
     ('the', 'DT'),
     ('half', 'JJ'),
     ('price', 'NN'),
     ('sushi', 'NN'),
     ('deal', 'NN'),
     ('on', 'IN'),
     ('saturday', 'NN')]

# b runs one token ahead of the main loop, so `against` is always the next token
b = iter(a[1:])
my_list = []
inner_list = []
accepted = ['JJ', 'NN']

for item in a:
    word = item[0]
    check = item[1]
    try:
        against = next(b)
        if check in accepted:
            if against[1] not in accepted:
                # the next tag breaks the run: close the current phrase
                inner_list.append(word)
                my_list.append(inner_list)
                inner_list = []
            else:
                inner_list.append(word)
    except StopIteration:
        # last token of the sentence: there is nothing left to look ahead to
        if check in accepted:
            inner_list.append(word)
            my_list.append(inner_list)

final = [' '.join(item) for item in my_list]
# final == ['advantage', 'half price sushi deal', 'saturday']
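Since the question asks specifically for a regular expression, here is a rough sketch of one way that could work using only the standard `re` module. It is not the approach of the answer above: the idea is to encode the tag sequence as a single string such as `<PRP><VBD><NN>...`, run the pattern over that string, and map each match back to token positions. The names `TERM_PATTERN` and `extract_terms` are made up for this illustration. Unlike the loop above, which groups any run of JJ/NN tags, this pattern only accepts adjectives that come before the nouns.

```python
import re

# Hypothetical names for illustration. The pattern is the question's
# ('JJ')*('NN')+ rewritten over an encoded tag string, and it is compiled
# once so it can be reused across many sentences.
TERM_PATTERN = re.compile(r"(?:<JJ>)*(?:<NN>)+")


def extract_terms(tagged, pattern=TERM_PATTERN):
    """Return the phrases whose tag sequence matches `pattern`."""
    # Encode the tags as one string, e.g. "<PRP><VBD><NN><IN>..."
    tag_string = "".join(f"<{tag}>" for _, tag in tagged)
    terms = []
    for match in pattern.finditer(tag_string):
        # Every token contributes exactly one "<", so counting them
        # converts character offsets into token indices.
        start = tag_string.count("<", 0, match.start())
        end = start + match.group().count("<")
        terms.append(" ".join(word for word, _ in tagged[start:end]))
    return terms


sentence = [('We', 'PRP'), ('took', 'VBD'), ('advantage', 'NN'),
            ('of', 'IN'), ('the', 'DT'), ('half', 'JJ'),
            ('price', 'NN'), ('sushi', 'NN'), ('deal', 'NN'),
            ('on', 'IN'), ('saturday', 'NN')]

print(extract_terms(sentence))
# ['advantage', 'half price sushi deal', 'saturday']
```

For hundreds of sentences, calling `extract_terms` in a loop or list comprehension should be fine, because the pattern is compiled only once. Because each tag is wrapped in angle brackets, `<NN>` does not accidentally match tags such as `NNS` or `NNP`; if those should count as nouns, the pattern would need to be widened, for example to `<NN[A-Z]*>`.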