Question

I have two lists that come from a part-of-speech tagger, as follows:

pos_tags = [('This', u'DT'), ('is', u'VBZ'), ('a', u'DT'), ('test', u'NN'), ('sentence', u'NN'), ('.', u'.'), ('My', u"''"), ('name', u'NN'), ('is', u'VBZ'), ('John', u'NNP'), ('Murphy', u'NNP'), ('and', u'CC'), ('I', u'PRP'), ('live', u'VBP'), ('happily', u'RB'), ('on', u'IN'), ('Planet', u'JJ'), ('Earth', u'JJ'), ('!', u'.')]


pos_names = [('John', 'NNP'), ('Murphy', 'NNP')]

I want to create a final list in which pos_tags is updated with the items from pos_names. Basically, I need to find John and Murphy in pos_tags and replace their POS tags with NNP.

Answer 1

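You can build a dictionary from pos_names that acts as a lookup table. Then use get to look up a possible replacement for each word, keeping the original tag when no replacement is found.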

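d = dict(pos_names)  # word -> replacement tag
# Keep each word; use the replacement tag if one exists, otherwise the original tag
pos_tags = [(word, d.get(word, tag)) for word, tag in pos_tags]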
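In the question's data John and Murphy are already tagged NNP, so the lookup above is a no-op there. The self-contained sketch below uses a shortened, hypothetical tagger output in which the names were mis-tagged as NN, so the replacement is visible.

```python
# Hypothetical, shortened tagger output in which the names were mis-tagged.
pos_tags = [('My', 'PRP$'), ('name', 'NN'), ('is', 'VBZ'),
            ('John', 'NN'), ('Murphy', 'NN'), ('.', '.')]
pos_names = [('John', 'NNP'), ('Murphy', 'NNP')]

# Lookup table: word -> corrected tag.
d = dict(pos_names)

# Keep each word; take the tag from the table if present, else keep the original tag.
pos_tags = [(word, d.get(word, tag)) for word, tag in pos_tags]

print(pos_tags)
# [('My', 'PRP$'), ('name', 'NN'), ('is', 'VBZ'), ('John', 'NNP'), ('Murphy', 'NNP'), ('.', '.')]
```

Note that the lookup is by word only, so every occurrence of a word listed in pos_names is retagged, regardless of context.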

Answer 2

Given

pos_tags = [('This', u'DT'), ('is', u'VBZ'), ('a', u'DT'), ('test', u'NN'), ('sentence', u'NN'), ('.', u'.'), ('My', u"''"), ('name', u'NN'), ('is', u'VBZ'), ('John', u'NNP'), ('Murphy', u'NNP'), ('and', u'CC'), ('I', u'PRP'), ('live', u'VBP'), ('happily', u'RB'), ('on', u'IN'), ('Planet', u'JJ'), ('Earth', u'JJ'), ('!', u'.')]

and

names = ['John', 'Murphy']

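you can do this (comparing against the word, subl[0], so that a tag equal to a name cannot match by accident):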

[next(subl for subl in pos_tags if subl[0] == name) for name in names]

which gives you:

[('John', u'NNP'), ('Murphy', u'NNP')]
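As written, next() raises StopIteration if a name does not occur in pos_tags. A minimal sketch of a safer variant, assuming you would rather get None for a missing name ('Smith' below is a hypothetical absent name), passes a default as the second argument to next():

```python
pos_tags = [('My', 'PRP$'), ('name', 'NN'), ('is', 'VBZ'),
            ('John', 'NNP'), ('Murphy', 'NNP'), ('.', '.')]
names = ['John', 'Murphy', 'Smith']  # 'Smith' is deliberately absent

# Match on the word position only. The second argument to next() is returned
# instead of raising StopIteration when no pair matches; the generator
# expression needs its own parentheses because it is not the sole argument.
found = [next((pair for pair in pos_tags if pair[0] == name), None)
         for name in names]

print(found)
# [('John', 'NNP'), ('Murphy', 'NNP'), None]
```

This only extracts the matching pairs; it does not change pos_tags.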

Answer 3

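I agree that a dictionary is the more natural solution to this problem, but if you need pos_tags to stay in order, here is a more explicit solution that updates it in place: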

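for word, pos in pos_names:
    # Scan the whole tagged sentence for this word
    for i, (tagged_word, tagged_pos) in enumerate(pos_tags):
        if word == tagged_word:
            # Replace the pair at the same index, so the order is preserved
            pos_tags[i] = (word, pos)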

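(For a large number of words a dictionary will be faster, since the nested loops above compare every name against every tagged word. You may want to keep the word order in the list and use a dictionary only for the POS assignment.)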
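A minimal sketch of that combination, assuming "keep the order in a list" means keeping pos_tags itself and using a dictionary only for lookups: one pass over the list, updating it in place. The mis-tagged input below is hypothetical, as is the name corrections.

```python
pos_tags = [('My', 'PRP$'), ('name', 'NN'), ('is', 'VBZ'),
            ('John', 'NN'), ('Murphy', 'NN'), ('.', '.')]
pos_names = [('John', 'NNP'), ('Murphy', 'NNP')]
alias = pos_tags  # another reference to the same list object

# Build the lookup table once: word -> corrected tag.
corrections = dict(pos_names)

# Single pass over the ordered list, replacing pairs in place. This costs
# about len(pos_tags) dictionary lookups instead of
# len(pos_names) * len(pos_tags) comparisons with nested loops.
for i, (word, _) in enumerate(pos_tags):
    if word in corrections:
        pos_tags[i] = (word, corrections[word])
```

Because the list is mutated rather than rebuilt, other references to it (alias above) see the updated tags.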

Compare sub-items of lists and make changes in Python
