Question

This is my code:

import os

def corpus_reading_pos(corpus_name, pos_tag, option="pos"):
    pos_tags = []
    words = []
    tokens_pos = {}
    file_count = 0
    for root, dirs, files in os.walk(corpus_name):
        for file in files:
            if file.endswith(".v4_gold_conll"):
                with open((os.path.join(root, file))) as f:
                    pos_tags += [line.split()[4] for line in f if line.strip() and not line.startswith("#")]
                with open((os.path.join(root, file))) as g:
                    words += [line.split()[3] for line in g if line.strip() and not line.startswith("#")]
                    file_count += 1
    for pos in pos_tags:
        tokens_pos[pos] = []
    words_pos = list(zip(words, pos_tags))
    for word in words_pos:
        tokens_pos[word[1]] = word[0]
    #print(words_pos)
    print(tokens_pos)
    #print(words)
    print("Token count:", len(tokens_pos))
    print("File count:", file_count)

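I am trying to create a dictionary that has every POS tag as a key, where each value is the list of all words that belong to that POS tag. I am stuck on the values: I need to build a list of words for each key, but I can't seem to get there.

In the code, the line tokens_pos[word[1]] = word[0] only stores one word per key, but if I try something like [].append(word[0]), the dictionary ends up with None for every value.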

Answer 1

You seem to be doing a lot of duplicate work, but to give a solution to your specific problem:

for word in words_pos:
    tokens_pos[word[1]].append(word[0])

should do what you are trying to achieve.
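As for the None values mentioned in the question: list.append changes the list in place and returns None, so assigning its result to a key stores None rather than a list. Here is a minimal sketch of the difference, using made-up (word, tag) pairs in place of the real corpus data:

```python
# Hypothetical sample data standing in for words_pos: (word, POS tag) pairs.
words_pos = [("dog", "NN"), ("runs", "VBZ"), ("cat", "NN")]

# Attempt from the question: list.append returns None, so None gets stored.
broken = {}
for word, pos in words_pos:
    broken[pos] = [].append(word)

# Working version: create one empty list per tag, then append in place.
tokens_pos = {pos: [] for _, pos in words_pos}
for word, pos in words_pos:
    tokens_pos[pos].append(word)

print(broken)      # {'NN': None, 'VBZ': None}
print(tokens_pos)  # {'NN': ['dog', 'cat'], 'VBZ': ['runs']}
```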

With

tokens_pos[word[1]] = word[0]

you are essentially overwriting the existing value stored under that key, so only the last word written for each key remains at the end.
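Regarding the duplicate work mentioned at the start of this answer, one possible way to avoid it is to read each file only once and append straight into a collections.defaultdict(list), which creates the empty list for a new key automatically. This is only a sketch: the function name words_by_pos and the suffix parameter are my own, and it assumes, as in the question, that column 3 holds the word and column 4 the POS tag.

```python
import os
from collections import defaultdict


def words_by_pos(corpus_name, suffix=".v4_gold_conll"):
    """Map each POS tag to the list of its words, reading every file once.

    Hypothetical helper, not the original function from the question.
    """
    tokens_pos = defaultdict(list)  # missing keys start out as an empty list
    file_count = 0
    for root, _dirs, files in os.walk(corpus_name):
        for name in files:
            if not name.endswith(suffix):
                continue
            file_count += 1
            with open(os.path.join(root, name)) as f:
                for line in f:
                    # Skip blank lines and comment lines, as in the question.
                    if not line.strip() or line.startswith("#"):
                        continue
                    columns = line.split()
                    # Assumed layout: columns[3] is the word, columns[4] the tag.
                    tokens_pos[columns[4]].append(columns[3])
    return dict(tokens_pos), file_count
```

Calling tokens_pos, file_count = words_by_pos(corpus_name) then gives the dictionary and the number of files in one pass, without the separate pos_tags and words lists.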
