I'm having some trouble getting at the items in a tuple. I have a list of tuples that looks like this (each one holds a word and its tag):
[('An', 'DET'),
('autumn', 'NOUN'),
('evening', 'NOUN'),
('.', '.'),
('In', 'ADP'),
('an', 'DET'),
('old', 'ADJ'),
('woodshed', 'NOUN'),
('The', 'DET'),
('long', 'ADJ'),
('points', 'NOUN'),
('of', 'ADP'),
('icicles', 'NOUN'),
('Are', 'NOUN'),
('sharpening', 'VERB'),
('the', 'DET'),
('wind', 'NOUN'),
('.', '.')....]
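What I want to do is iterate over these tuples and work out the likelihood of the next word's tag given the previous word's tag. For example, to see how often 'DET' comes before 'NOUN', I would iterate over the tuples and count:
the number of times 'DET' appears immediately before 'NOUN'
So far I have tried this: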
prob = 0.0
for item in tuples:
    if item[1] == "DET" and item + 1[1] == "NOUN"
return prob
The if statement is obviously incorrect. Does anyone know how I can access the next item?
Answer 0 (score: 1)
The easiest way to pair up adjacent items is zip(seq, seq[1:]), as shown in the recipes section for the itertools module.
The easiest way to gather the counts is collections.Counter.
Putting it together looks like this:
>>> from collections import Counter
>>> Counter((f, s) for (_, f), (_, s) in zip(tuples, tuples[1:]))
Counter({('ADJ', 'NOUN'): 2, ('NOUN', 'ADP'): 2, ('NOUN', 'NOUN'): 2,
         ('DET', 'NOUN'): 2, ('DET', 'ADJ'): 2, ('ADP', 'NOUN'): 1,
         ('NOUN', 'VERB'): 1, ('NOUN', 'DET'): 1, ('VERB', 'DET'): 1,
         ('ADP', 'DET'): 1})
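The zip(seq, seq[1:]) trick needs a list (or other sliceable sequence). The sketch below uses the tee-based pairwise recipe from the itertools documentation instead, which also works on one-pass iterators; on Python 3.10+ itertools.pairwise provides the same thing built in. The sample data is the question's list with the trailing "...." dropped.

```python
from collections import Counter
from itertools import tee


def pairwise(iterable):
    """s -> (s0, s1), (s1, s2), (s2, s3), ...

    The pairwise recipe from the itertools docs: works on any iterable,
    not only on lists that support slicing.
    """
    a, b = tee(iterable)
    next(b, None)  # advance the second copy by one item
    return zip(a, b)


tuples = [('An', 'DET'), ('autumn', 'NOUN'), ('evening', 'NOUN'), ('.', '.'),
          ('In', 'ADP'), ('an', 'DET'), ('old', 'ADJ'), ('woodshed', 'NOUN'),
          ('The', 'DET'), ('long', 'ADJ'), ('points', 'NOUN'), ('of', 'ADP'),
          ('icicles', 'NOUN'), ('Are', 'NOUN'), ('sharpening', 'VERB'),
          ('the', 'DET'), ('wind', 'NOUN'), ('.', '.')]

# Count (previous tag, next tag) pairs without slicing the list.
tag_pairs = Counter((prev, nxt) for (_, prev), (_, nxt) in pairwise(tuples))
print(tag_pairs[('DET', 'NOUN')])  # -> 2
```

On this sample, which keeps the '.' entries, pairs such as ('NOUN', '.') and ('.', 'ADP') are counted too, so the result differs from the Counter output shown above; filter the punctuation out first if it should be ignored.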
Answer 1 (score: 0)
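Use enumerate() to get the index of the item you are looping over:
count = 0
# Stop one item short of the end so tuples[index + 1] always exists.
for index, item in enumerate(tuples[:-1]):
    if item[1] == 'DET' and tuples[index + 1][1] == 'NOUN':
        count += 1
print(count)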
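Both answers give raw counts. To get the likelihood the question asks about, the counts can be normalized by how often the previous tag starts a pair. The sketch below assumes a plain maximum-likelihood estimate, P(next | prev) = count(prev, next) / count(prev), with no smoothing; the helper name transition_probs is hypothetical.

```python
from collections import Counter


def transition_probs(tagged):
    """Estimate P(next_tag | prev_tag) from a list of (word, tag) tuples.

    Hypothetical helper: maximum-likelihood estimate with no smoothing.
    count(prev) only counts prev where another tag follows it.
    """
    tags = [tag for _, tag in tagged]
    pair_counts = Counter(zip(tags, tags[1:]))
    first_counts = Counter(tags[:-1])
    return {
        (prev, nxt): n / first_counts[prev]
        for (prev, nxt), n in pair_counts.items()
    }


tuples = [('An', 'DET'), ('autumn', 'NOUN'), ('evening', 'NOUN'), ('.', '.'),
          ('In', 'ADP'), ('an', 'DET'), ('old', 'ADJ'), ('woodshed', 'NOUN'),
          ('The', 'DET'), ('long', 'ADJ'), ('points', 'NOUN'), ('of', 'ADP'),
          ('icicles', 'NOUN'), ('Are', 'NOUN'), ('sharpening', 'VERB'),
          ('the', 'DET'), ('wind', 'NOUN'), ('.', '.')]

probs = transition_probs(tuples)
print(probs[('DET', 'NOUN')])  # -> 0.5 ('DET' starts 4 pairs, 2 of them end in 'NOUN')
```

Pairs that never occur are absent from the dictionary, so probs.get(pair, 0.0) is a safer lookup than indexing directly.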