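I have tokenized sentences separated by a blank line, with two columns holding the actual and the predicted tag for each token. I want to loop over the tokens and find the sentences that contain a wrong prediction, i.e. where the actual tag is not equal to the predicted tag.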
#word actual predicted
James PERSON PERSON
Washington PERSON LOCATION
went O O
home O LOCATION

He O O
took O O
Elsie PERSON PERSON
along O O
>James Washington went home: Incorrect
>He took Elsie along: Correct
Answer 0 (score: 0)
Python strings have powerful parsing capabilities that you can use here. I did this with Python 3.3, but it should work on other versions as well.
thistext = '''James PERSON PERSON
Washington PERSON LOCATION
went O O
home O LOCATION

He O O
took O O
Elsie PERSON PERSON
along O O
'''
def check_text(text):
    lines = text.split('\n')
    correct = [True]  # a bool wrapped in a list, so we can modify it from a nested function
    words = []

    def print_result():
        # Print the sentence collected so far, then reset the state
        if words:
            print('{}: {}'.format(' '.join(words), 'Correct' if correct[0] else 'Incorrect'))
        # words.clear()
        del words[:]
        correct[0] = True

    for line in lines:
        if line.strip():  # the line is not empty: it holds a token
            word, a, b = line.split()
            if a != b:
                correct[0] = False
            words.append(word)
        else:  # an empty line ends the current sentence
            print_result()
    print_result()  # flush the last sentence

check_text(thistext)
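If you also need to know which tokens were mispredicted, and not only whether the whole sentence is correct, the same loop can collect them. This is a minimal sketch, not part of the code above; the function name find_mismatches and its return format are assumptions made for illustration.

```python
def find_mismatches(text):
    """Return one (sentence, mismatches) pair per blank-line-separated sentence.

    mismatches is a list of (word, actual, predicted) tuples.
    find_mismatches is a hypothetical helper name, not from the original answer.
    """
    results = []
    words, mismatches = [], []
    # Append an empty line so that the last sentence is flushed as well
    for line in text.split('\n') + ['']:
        if line.strip():
            word, actual, predicted = line.split()
            words.append(word)
            if actual != predicted:
                mismatches.append((word, actual, predicted))
        elif words:
            results.append((' '.join(words), mismatches))
            words, mismatches = [], []
    return results


sample = '''James PERSON PERSON
Washington PERSON LOCATION
went O O
home O LOCATION

He O O
took O O
Elsie PERSON PERSON
along O O
'''

for sentence, mismatches in find_mismatches(sample):
    print('{}: {}'.format(sentence, 'Incorrect' if mismatches else 'Correct'))
    for word, actual, predicted in mismatches:
        print('  {}: actual {}, predicted {}'.format(word, actual, predicted))
```

Returning the data instead of printing it directly makes it easier to filter or count the wrong predictions afterwards.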
Answer 1 (score: 0)
In addition to my previous answer, here is a version that uses groupby, all() and a list comprehension:
from itertools import groupby

d = {True: 'Correct', False: 'Incorrect'}
with open('text1.txt') as f:
    # Group consecutive lines by whether they are blank (whitespace only)
    for k, g in groupby(f, key=str.isspace):
        if not k:
            # Split each line in the current group at whitespaces
            data = [line.split() for line in g]
            # If for each line the second column is equal to third then `all()` will
            # return True.
            predicts_matched = all(line[1] == line[2] for line in data)
            print('{}: {}'.format(' '.join(x[0] for x in data), d[predicts_matched]))
Output:
James Washington went home: Incorrect
He took Elsie along: Correct
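The same grouping idea can be applied to a string that is already in memory, which is convenient if you want the incorrect sentences as a list. The following is only a sketch under that assumption; label_sentences and sample_text are hypothetical names. Note that splitlines() removes the newline characters and str.isspace returns False for an empty string, so the key function strips the line instead.

```python
from itertools import groupby


def label_sentences(text):
    """Return a list of (sentence, is_correct) pairs.

    label_sentences is a hypothetical helper name used for illustration.
    """
    results = []
    # After splitlines() a blank line is '' and ''.isspace() is False,
    # so blank lines are detected by stripping instead.
    for is_blank, group in groupby(text.splitlines(), key=lambda line: not line.strip()):
        if is_blank:
            continue
        rows = [line.split() for line in group]
        # The sentence is correct only if every actual tag equals its predicted tag
        is_correct = all(actual == predicted for _, actual, predicted in rows)
        results.append((' '.join(word for word, _, _ in rows), is_correct))
    return results


sample_text = '''James PERSON PERSON
Washington PERSON LOCATION
went O O
home O LOCATION

He O O
took O O
Elsie PERSON PERSON
along O O
'''

# Keep only the sentences that contain at least one wrong prediction
incorrect = [sentence for sentence, ok in label_sentences(sample_text) if not ok]
print(incorrect)  # ['James Washington went home']
```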