I have two lists, as shown below:
pos_tag(word_tokenize('This shoe is of Blue color.'))
[('This', 'DT'),
('shoe', 'NN'),
('is', 'BEZ'),
('of', 'IN'),
('Blue', 'JJ-TL'),
('color', 'NN'),
('.', '.')]
custom_tags('This shoe is of Blue color.')
Out[125]:
[('This', 'Other'),
('shoe', 'Product'),
('is', 'Other'),
('of', 'Other'),
('Blue', 'Color'),
('color', 'Other'),
('.', 'Other')]
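Each is returned by a separate function. I now want to merge them into one list and then write the result to a text file in CoNLL format, like this: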
LEICESTERSHIRE NNP I-NP I-ORG
TAKE NNP I-NP O
OVER IN I-PP O
AT NNP I-NP O
TOP NNP I-NP O
AFTER NNP I-NP O
INNINGS NNP I-NP O
VICTORY NN I-NP O
Except that in my case the output would be:
This DT Other
shoe NN Product
is BEZ Other
of IN Other
Blue JJ-TL Color
color NN Other
I tried this:
list(zip(pos_tag(word_tokenize(sentence)),custom_tags(sentence)))
But this gives me:
[(('This', 'DT'), ('This', 'Other')),
(('footwear', 'NN'), ('footwear', 'Product')),
(('is', 'BEZ'), ('is', 'Other')),
(('of', 'IN'), ('of', 'Other')),
(('blue', 'JJ'), ('blue', 'Color')),
(('color', 'NN-HL'), ('color', 'Other'))]
Can someone help me get the desired output? I also need to write each output to a text file, with some separator inserted between the lines.
Answer 0 (score: 0)
Use a list comprehension:
l1=[('This', 'DT'), ('shoe', 'NN'), ('is', 'BEZ'), ('of', 'IN'), ('Blue', 'JJ-TL'), ('color', 'NN'), ('.', '.')]
l2=[('This', 'Other'), ('shoe', 'Product'), ('is', 'Other'), ('of', 'Other'), ('Blue', 'Color'), ('color', 'Other'), ('.', 'Other')]
l3=[(x[0][0],x[0][1],x[1][1]) for x in zip(l1, l2)]
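The triples in l3 can then be joined into space-separated lines. A minimal sketch, assuming both lists are token-aligned; the helper name merge_tags is hypothetical and not part of NLTK, and the mismatch check is an added safeguard rather than something the question asked for:

```python
def merge_tags(pos_tags, custom):
    """Merge two token-aligned (word, tag) lists into (word, pos, label) triples."""
    if len(pos_tags) != len(custom):
        raise ValueError("tag lists differ in length")
    merged = []
    for (word, pos), (word2, label) in zip(pos_tags, custom):
        # Guard against the two taggers tokenizing the sentence differently.
        if word != word2:
            raise ValueError(f"token mismatch: {word!r} vs {word2!r}")
        merged.append((word, pos, label))
    return merged


l1 = [('This', 'DT'), ('shoe', 'NN'), ('is', 'BEZ'), ('of', 'IN'),
      ('Blue', 'JJ-TL'), ('color', 'NN'), ('.', '.')]
l2 = [('This', 'Other'), ('shoe', 'Product'), ('is', 'Other'), ('of', 'Other'),
      ('Blue', 'Color'), ('color', 'Other'), ('.', 'Other')]

# One "word POS label" string per token.
lines = [' '.join(triple) for triple in merge_tags(l1, l2)]
print('\n'.join(lines))
```

Unpacking each pair as (word, pos), (word2, label) avoids the nested x[0][0] indexing and makes the alignment check cheap to add.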
Answer 1 (score: 0)
Why not try using append, even if it is not the most elegant way?
A = [('This', 'DT'),
     ('shoe', 'NN'),
     ('is', 'BEZ'),
     ('of', 'IN'),
     ('Blue', 'JJ-TL'),
     ('color', 'NN'),
     ('.', '.')]
B = [('This', 'Other'),
     ('shoe', 'Product'),
     ('is', 'Other'),
     ('of', 'Other'),
     ('Blue', 'Color'),
     ('color', 'Other'),
     ('.', 'Other')]
# One mutable list per token, seeded with the word itself
# (tuples have no append(), so lists are used here).
Title = [[word] for word, _ in A]
for j, item in enumerate(A):
    Title[j].append(item[1])   # POS tag from A
    Title[j].append(B[j][1])   # custom tag from B
for row in Title:
    # Build one "word POS custom-tag" line per token
    line = '{0[0]} {0[1]} {0[2]}'.format(row)
    print(line)
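To write to a file, use open(), for example:
with open('This/is/your/destination/file.txt', 'w') as f:
    # Write one "word POS custom-tag" line per token;
    # the with block closes the file automatically.
    for row in Title:
        f.write('{0[0]} {0[1]} {0[2]}\n'.format(row))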
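For several sentences, CoNLL-style files conventionally put one token per line and a blank line between sentences, which may serve as the separator the question asks for. A minimal sketch under that assumption; write_conll and the temporary output path are hypothetical names chosen for illustration:

```python
import os
import tempfile


def write_conll(sentences, path):
    """Write sentences (lists of (word, pos, label) triples), one token per
    line, with a blank line after each sentence."""
    with open(path, 'w', encoding='utf-8') as f:
        for sentence in sentences:
            for word, pos, label in sentence:
                f.write(f'{word} {pos} {label}\n')
            f.write('\n')  # blank line marks the sentence boundary


# Two short example sentences, already merged into triples.
sentences = [
    [('This', 'DT', 'Other'), ('shoe', 'NN', 'Product')],
    [('Blue', 'JJ-TL', 'Color'), ('color', 'NN', 'Other')],
]

# A temporary directory keeps the example runnable anywhere.
out_path = os.path.join(tempfile.mkdtemp(), 'tags.conll')
write_conll(sentences, out_path)

with open(out_path, encoding='utf-8') as f:
    content = f.read()
print(content)
```

If a different separator is needed, only the f.write('\n') line has to change.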