Question

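I'm still new to Python 3 and have run into a problem... I'm trying to create a function that returns a dictionary where the keys are the lengths of the words and the values are the words in the string.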

For example, if my string is "The dogs run quickly forward to the park", my dictionary should return {2: ['to'], 3: ['The', 'run', 'the'], 4: ['dogs', 'park'], 7: ['quickly', 'forward']}

The problem is that when I loop through the items, only one of the words in the string gets appended for each length.

def word_len_dict(my_string):
    dictionary = {}
    input_list = my_string.split(" ")
    unique_list = []
    for item in input_list:
        if item.lower() not in unique_list:
            unique_list.append(item.lower())
    for word in unique_list:
        dictionary[len(word)] = []
        dictionary[len(word)].append(word)
    return (dictionary)

print (word_len_dict("The dogs run quickly forward to the park"))

The code returns

{2: ['to'], 3: ['run'], 4: ['park'], 7: ['forward']}

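Can someone point me in the right direction? Perhaps without giving me the answer outright, but what do I need to look at next to add the missing words to the list? I thought appending them to the list would do it, but apparently not.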

Thanks!

Answer 1

This will solve all of your problems:

def word_len_dict(my_string):
    input_list = my_string.split(" ")

    unique_set = set()
    dictionary = {}

    for item in input_list:
        word = item.lower()
        if word not in unique_set:
            unique_set.add(word)
            key = len(word)
            if key not in dictionary:
                dictionary[key] = []
            dictionary[key].append(word)

    return dictionary

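Every time you encountered a new word, you were wiping out the dict entry for its length. There was also an efficiency problem: testing membership in a list that keeps growing turns an O(n) task into an O(n**2) algorithm. Replacing the list membership test with a set membership test fixes that.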

The function gives the correct output for your sample sentence:

>>> print(word_len_dict("The dogs run quickly forward to the park"))
{2: ['to'], 3: ['the', 'run'], 4: ['dogs', 'park'], 7: ['quickly', 'forward']}

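I notice that some of the other posted solutions fail to map the words to lowercase and/or fail to remove duplicates, which is clearly what you want.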
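
As a rough illustration of the efficiency point in this answer, here is a minimal sketch that times both membership tests. The helper names and the 2000-word input are made up for the demo, and the timings will vary by machine, so treat the numbers as indicative only.

```python
import timeit

def dedupe_with_list(words):
    seen = []
    for w in words:
        if w not in seen:      # linear scan of a list that keeps growing
            seen.append(w)
    return seen

def dedupe_with_set(words):
    seen, ordered = set(), []
    for w in words:
        if w not in seen:      # hash lookup, constant time on average
            seen.add(w)
            ordered.append(w)
    return ordered

# Hypothetical input: every word is unique, the worst case for the list version.
words = [str(i) for i in range(2000)]

list_time = timeit.timeit(lambda: dedupe_with_list(words), number=3)
set_time = timeit.timeit(lambda: dedupe_with_set(words), number=3)
print(f"list: {list_time:.4f}s  set: {set_time:.4f}s")
```

Both helpers return the same words in the same order; only the cost of the `not in` test differs.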

Answer 2

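You can build the set of unique words up front like this, which avoids the first loop, and then populate the dictionary in a second step.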

unique_words = set("The dogs run quickly forward to the park".lower().split(" "))
lengths = {}  # not named `dict`, which would shadow the built-in type

for word in unique_words:
    key, value = len(word), word
    if key not in lengths:      # same as `key not in lengths.keys()`
        lengths[key] = [value]
    else:
        lengths[key].append(value)

print(lengths)

Answer 3

You are assigning an empty list to the dictionary entry right before appending the latest word, which wipes out all the previous words.
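
To make that concrete, here is a minimal sketch with just two words of the same length. The variable names are made up for the demo; the first loop mirrors the assignment pattern from the question, the second creates the list only when the key is missing.

```python
# Two words of the same length are enough to show the overwrite.
words = ["run", "the"]

overwritten = {}
for word in words:
    overwritten[len(word)] = []          # resets the list on every iteration
    overwritten[len(word)].append(word)

guarded = {}
for word in words:
    if len(word) not in guarded:         # create the list only once per key
        guarded[len(word)] = []
    guarded[len(word)].append(word)

print(overwritten)  # {3: ['the']}
print(guarded)      # {3: ['run', 'the']}
```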

Answer 4

Your code resets the key to an empty list every time, which is why you only get one value (the last one) in the list for each key.

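To make sure there are no duplicates, you can make the default value for each key a set, which is a collection that enforces uniqueness (in other words, a set cannot contain duplicates).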

def word_len_dict(my_string):
    dictionary = {}
    input_list = my_string.split(" ")
    for word in input_list:
        if len(word) not in dictionary:
            dictionary[len(word)] = set()
        dictionary[len(word)].add(word.lower())
    return dictionary

With that check added, you can also get rid of the first loop. Now it will work as expected.

You can also shorten the code further with the dictionary's setdefault method.

for word in input_list:
    dictionary.setdefault(len(word), set()).add(word.lower())
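
A closely related option, not used in the answer above, is collections.defaultdict from the standard library, which creates the empty set for you the first time a key is touched. This is only a sketch of that alternative; the name `groups` is made up for the example.

```python
from collections import defaultdict

def word_len_dict(my_string):
    # defaultdict(set) supplies an empty set the first time a key is accessed
    groups = defaultdict(set)
    for word in my_string.split():
        groups[len(word)].add(word.lower())
    return dict(groups)  # convert back to a plain dict for the caller

result = word_len_dict("The dogs run quickly forward to the park")
print(result)
```

Because the values are sets, the order of the words within each length is not guaranteed.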

Answer 5

The Pythonic way,

using itertools.groupby:

>>> import itertools
>>> my_str = "The dogs run quickly forward to the park"
>>> {x: list(y) for x, y in itertools.groupby(sorted(my_str.split(), key=len), key=len)}
{2: ['to'], 3: ['The', 'run', 'the'], 4: ['dogs', 'park'], 7: ['quickly', 'forward']}
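
Note that the one-liner above keeps the original case and any repeated words. If lowercase, de-duplicated words are wanted, one possible variation (a sketch, not part of the original answer) is to normalise the words first with dict.fromkeys, which keeps the first occurrence of each key in order:

```python
from itertools import groupby

my_str = "The dogs run quickly forward to the park"

# dict.fromkeys keeps the first occurrence of each lowercase word, in order
unique_words = list(dict.fromkeys(w.lower() for w in my_str.split()))

# groupby only merges adjacent items, so sort by length first (sorted is stable)
result = {n: list(g) for n, g in groupby(sorted(unique_words, key=len), key=len)}

print(result)
# {2: ['to'], 3: ['the', 'run'], 4: ['dogs', 'park'], 7: ['quickly', 'forward']}
```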

Answer 6

for word in unique_list:
    dictionary[len(word)] = [x for x in input_list if len(x) == len(word)]

Answer 7

This option first builds a set of unique lowercase words, then uses dict's setdefault to avoid looking up the dictionary key more than once.

>>> a = "The dogs run quickly forward to the park"
>>> b = set((word.lower() for word in a.split()))
>>> result = {}
>>> {result.setdefault(len(word), []).append(word.lower()) for word in b}
{None}
>>> result
{2: ['to'], 3: ['the', 'run'], 4: ['park', 'dogs'], 7: ['quickly', 'forward']}

Adding multiple values to a dictionary while looping over a string
