Question

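Hi there!

I'm trying to output all possible part-of-speech (POS) tags for each word in a text. However, I need the output as a "list of lists" or a "list of tuples" for further use.

Can anyone help? Many thanks!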

import nltk
from nltk.tokenize import word_tokenize

text = "I can answer those question ."     # original text
tokenized_text = word_tokenize(text)       # word tokenization
wsj = nltk.corpus.treebank.tagged_words()  
cfd1 = nltk.ConditionalFreqDist(wsj)       # find all possible pos of each word

i = 0
while i< len(tokenized_text):
    pos_only = list(cfd1[tokenized_text[i]])
    y = pos_only
    print(y)
    i+=1

My output is

['NNP', 'PRP']
['MD', 'NN']
['NN', 'VB']
['DT']
['NN', 'VBP', 'VB']
['.']

My expected output is

[['NNP', 'PRP'], ['MD', 'NN'], ['NN', 'VB'], ['DT'], ['NN', 'VBP', 'VB'], ['.']]

or

[('NNP', 'PRP'), ('MD', 'NN'), ('NN', 'VB'), ('DT'), ('NN', 'VBP', 'VB'), ('.')]

Answer 1

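I think you need to create an empty list and append to it as you iterate. I'm assuming print(y) outputs ['NNP', 'PRP'] and so on. You should then convert y to a tuple and append it to the list on each iteration (or append y itself if you want a list of lists). This code should do that.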

alist = []
i = 0
while i < len(tokenized_text):
    pos_only = list(cfd1[tokenized_text[i]])
    y = pos_only
    alist.append(tuple(y))
    i += 1
print(alist)
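
The same collect-into-a-list idea can also be written as a list comprehension. The sketch below is only an illustration: so that it runs without downloading the treebank corpus, it replaces nltk.corpus.treebank.tagged_words() with a tiny hand-made list (tagged_words and tag_counts are hypothetical names, and the sample data is not real corpus output), and it builds the word-to-tags mapping with collections.Counter instead of nltk.ConditionalFreqDist. With the real cfd1 from the question, the two comprehension lines should work the same way, assuming cfd1[word] can be iterated to get its tags.

```python
from collections import Counter, defaultdict

# Hypothetical stand-in for nltk.corpus.treebank.tagged_words();
# made-up sample data, not real corpus output.
tagged_words = [
    ("I", "PRP"), ("can", "MD"), ("can", "NN"),
    ("answer", "NN"), ("answer", "VB"), ("those", "DT"), (".", "."),
]

# Map each word to a Counter of the tags seen with it
# (similar in spirit to nltk.ConditionalFreqDist).
tag_counts = defaultdict(Counter)
for word, tag in tagged_words:
    tag_counts[word][tag] += 1

tokens = ["I", "can", "answer", "those", "."]

# One inner list (or tuple) of tags per token, collected in token order.
list_of_lists = [list(tag_counts[token]) for token in tokens]
list_of_tuples = [tuple(tag_counts[token]) for token in tokens]

print(list_of_lists)
print(list_of_tuples)
```

Note that Python prints a one-element tuple with a trailing comma, for example ('DT',), because ('DT') without the comma is just the string 'DT'.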
