Question

def dic_index(n):
    i = 0
    words = {}
    while i<lines:
        for word in n[i]:
            print word
            if duplicate(word, words, i+1)==True:
                break
            elif word in words:
                words[word].append(i+1)
            else:
                words[word]=[i+1]
        i+=1
    return words

This is my Python function for converting a list into a dictionary.

For some reason, if I pass this in as the argument:

[['brisk', 'blow', 'wind', 'blow'], ['north', 'north', 'youth'], ['wind', 'cold', 'cold'], ['wind', 'yesteryear'], []]

it returns a dictionary that looks like this:

{'blow': [1], 'north': [2], 'brisk': [1], 'cold': [3], 'yesteryear': [4], 'wind': [1, 3, 4]}

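It seems to be skipping the 'youth' entry in the second list of the argument, and I have no idea why.

For some reason, the for loop seems to be skipping over that word.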

Here is my definition of duplicate:

def duplicate(word, dic, line):
    if word not in dic:
        return False
    values = dic[word]
    length = len(values)
    if values[length-1] == line:
        return True
    else:
        return False

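The goal of my program is to take input from the user, strip out all the stop words and so on, and print out an index. Each inner list in my example argument represents a separate line. So in this case, the "lines" variable in my dic_index() function would be 4.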

Answer 1

I think you are confusing break with pass?

Try:

    if duplicate(word, words, i+1)==True:
        pass

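[edit - explain] If you use "break", the for loop stops at the first duplicate and everything remaining in that list is ignored. So in ['north', 'north', 'youth'], the first 'north' is fine, the second 'north' triggers the break, and the loop never even reaches 'youth'. If you use "pass" instead, the second 'north' is simply ignored and the loop moves on to the next word, reaching the i += 1 line only once the whole list has been processed.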
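To make the difference concrete, here is a minimal sketch. The helper name index_row and its on_duplicate parameter are made up for illustration and are not part of the question's code; it just walks one row and reacts to a repeated word with either break or pass.

```python
def index_row(row, on_duplicate):
    """Collect the distinct words of one row (hypothetical helper)."""
    kept = []
    for word in row:
        if word in kept:
            if on_duplicate == "break":
                break   # leaves the for loop; later words are never visited
            else:
                pass    # does nothing; the loop carries on with the next word
        else:
            kept.append(word)
    return kept

row = ['north', 'north', 'youth']
with_break = index_row(row, "break")  # ['north']
with_pass = index_row(row, "pass")    # ['north', 'youth']
```

With break, 'youth' is lost exactly as in the question; with pass it is kept.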

Note: I had to replace every x with i, and length with len(n), to get your program to run.
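Putting the two changes together (pass instead of break, and len(n) as the loop bound), the repaired function might look roughly like the sketch below. The len(n) bound is an assumption on my part, since the posted code reads a lines variable that is not defined inside the function. duplicate() is the question's helper, only shortened, and the debugging print is left out.

```python
def duplicate(word, dic, line):
    # True only if the word was already recorded for this same line number.
    if word not in dic:
        return False
    return dic[word][-1] == line

def dic_index(n):
    i = 0
    words = {}
    while i < len(n):          # assumed bound; the question used an outer "lines"
        for word in n[i]:
            if duplicate(word, words, i + 1):
                pass           # repeated word on this line: skip it, keep looping
            elif word in words:
                words[word].append(i + 1)
            else:
                words[word] = [i + 1]
        i += 1
    return words

sample = [['brisk', 'blow', 'wind', 'blow'], ['north', 'north', 'youth'],
          ['wind', 'cold', 'cold'], ['wind', 'yesteryear'], []]
result = dic_index(sample)
```

For the sample input, result should now contain 'youth': [2] alongside the other entries.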

Answer 2

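Your code uses break to end the per-line loop, which skips any words that come after a repeated word on that line. You probably wanted to use continue. That said, your code is needlessly complicated.

Use enumerate() to number the lines, collections.defaultdict for convenience, and a set() to keep track of the words already counted on each line: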

from collections import defaultdict

def dic_index(lines):
    indices = defaultdict(list)

    for i, line in enumerate(lines, 1):
        seen = set()
        for word in line:
            if word in seen:
                continue
            seen.add(word)
            indices[word].append(i)

    return indices

Demo:

>>> from collections import defaultdict
>>> sample = [['brisk', 'blow', 'wind', 'blow'], ['north', 'north', 'youth'], ['wind', 'cold', 'cold'], ['wind', 'yesteryear'], []]
>>> def dic_index(lines):
...     indices = defaultdict(list)
...     for i, line in enumerate(lines, 1):
...         seen = set()
...         for word in line:
...             if word in seen:
...                 continue
...             seen.add(word)
...             indices[word].append(i)
...     return indices
... 
>>> dic_index(sample)
defaultdict(<type 'list'>, {'blow': [1], 'north': [2], 'brisk': [1], 'youth': [2], 'cold': [3], 'yesteryear': [4], 'wind': [1, 3, 4]})
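Since the question says the end goal is to print an index, here is one possible way to turn the result into printable lines. This is only a sketch: format_index is a hypothetical helper name, and the "word: line, line" layout is my assumption about what the index should look like.

```python
from collections import defaultdict

def dic_index(lines):
    indices = defaultdict(list)
    for i, line in enumerate(lines, 1):
        seen = set()
        for word in line:
            if word in seen:
                continue
            seen.add(word)
            indices[word].append(i)
    return indices

def format_index(indices):
    """Return one 'word: n, n' string per word, in alphabetical order."""
    return ["%s: %s" % (word, ", ".join(str(n) for n in indices[word]))
            for word in sorted(indices)]

sample = [['brisk', 'blow', 'wind', 'blow'], ['north', 'north', 'youth'],
          ['wind', 'cold', 'cold'], ['wind', 'yesteryear'], []]
entries = format_index(dic_index(sample))
for entry in entries:
    print(entry)
```

Sorting the keys gives a stable order regardless of how the dictionary happens to store them.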

Python program for converting a list to a dictionary does not work properly
