I have a list of sentences, each tokenized into words and then POS-tagged, so the result is a list whose elements look like this:
[(w1,pos_tag1),(w2,pos_tag2)]
[(w3,pos_tag3),(w4,pos_tag4),(w5,pos_tag5)]
[(w6,pos_tag6),(w7,pos_tag7)]
I need a list of the pos_tags in the same order in which they appear across all sentences. What I tried was iterating over the list:
tags = [x [1] for x in in element in list]
But this doesn't work. How can I collect all the tags from these lists?
Thanks
Answer 0 (score: 4)
You can use the zip(*list) idiom to unpack a list of tuples; see "Unpacking a list / tuple of pairs into two lists / tuples".
>>> from nltk import pos_tag
>>> tagged_sent = pos_tag('The quick brown fox jumps over the lazy dog'.split())
>>> tagged_sent
[('The', 'DT'), ('quick', 'JJ'), ('brown', 'NN'), ('fox', 'NN'), ('jumps', 'VBZ'), ('over', 'IN'), ('the', 'DT'), ('lazy', 'JJ'), ('dog', 'NN')]
>>> words, tags = zip(*tagged_sent)
>>> tags
('DT', 'JJ', 'NN', 'NN', 'VBZ', 'IN', 'DT', 'JJ', 'NN')
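The session above unpacks a single tagged sentence, while the question involves a list of tagged sentences. One possible way to extend the same idiom, shown here as a minimal sketch, is to flatten the sentences with itertools.chain.from_iterable before unzipping, or to unzip each sentence separately. The sample data and the variable names tagged_sentences and tags_per_sentence are made up for illustration and are not real tagger output.

```python
from itertools import chain

# Hypothetical sample data standing in for the output of a POS tagger.
tagged_sentences = [
    [("The", "DT"), ("dog", "NN")],
    [("It", "PRP"), ("barks", "VBZ"), ("loudly", "RB")],
    [("Cats", "NNS"), ("sleep", "VBP")],
]

# Flatten all sentences into one stream of (word, tag) pairs, then unzip it
# into a tuple of words and a tuple of tags, preserving the original order.
words, tags = zip(*chain.from_iterable(tagged_sentences))
print(tags)

# Alternatively, unzip each sentence on its own to keep one tuple per sentence.
tags_per_sentence = [list(zip(*sent))[1] for sent in tagged_sentences]
print(tags_per_sentence)
```

Note that both forms assume non-empty input: zip(*...) over an empty sequence yields nothing, so the two-name unpacking would raise a ValueError and the [1] index would raise an IndexError.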
Answer 1 (score: 0)
Suppose you have a list of lists of word-tag pairs:
tagged_sentences = [[(w1, t1), (w2, t2), ...], [(w5, t5), ...],...]
You can do the following to get a list of lists of tags:
> tags = [[tag for word, tag in sent] for sent in tagged_sentences]
# tags = [[x[1] for x in sent] for sent in tagged_sentences]
[[t1, t2, ...], [t5, ...], ...]
If you want to flatten the list, i.e. get a single flat list of all the tags from all sentences:
> tags = [tag for sent in tagged_sentences for word, tag in sent]
# tags = [x[1] for sent in tagged_sentences for x in sent]
[t1, t2, ..., t5, ...]
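For reference, here is a runnable sketch of both comprehensions on concrete data. The sample sentences and the names nested_tags, flat_tags and flat_tags_loop are invented for illustration. It also spells out the flat version as explicit loops, since the for clauses of a comprehension are read left to right in the same order as the equivalent nested loops (outer loop first), which is probably where the attempt in the question went wrong.

```python
# Hypothetical sample data standing in for the output of a POS tagger.
tagged_sentences = [
    [("The", "DT"), ("dog", "NN")],
    [("It", "PRP"), ("barks", "VBZ"), ("loudly", "RB")],
    [("Cats", "NNS"), ("sleep", "VBP")],
]

# One list of tags per sentence.
nested_tags = [[tag for _word, tag in sent] for sent in tagged_sentences]

# A single flat list of tags: the outer loop (over sentences) comes first,
# the inner loop (over pairs within a sentence) comes second.
flat_tags = [tag for sent in tagged_sentences for _word, tag in sent]

# The same flat list written as explicit nested loops, in the same order.
flat_tags_loop = []
for sent in tagged_sentences:
    for _word, tag in sent:
        flat_tags_loop.append(tag)

print(nested_tags)
print(flat_tags)
```

Unlike the zip(*...) idiom, these comprehensions also handle empty sentences and an empty outer list without raising.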