Return a dictionary keyed by the length of the words in a string

Time: 2016-03-14 11:27:40

Tags: python string dictionary

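I need to build a function that takes a string as input and returns a dictionary. The keys are numbers, and each value is a list of the unique words whose number of letters equals the key. For example, if the function is called like this: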

n_letter_dictionary("The way you see people is the way you treat them and the Way you treat them is what they become")

the function should return:

{2: ['is'], 3: ['and', 'see', 'the', 'way', 'you'], 4: ['them', 'they', 'what'], 5: ['treat'], 6: ['become', 'people']}

The code I wrote is as follows:

def n_letter_dictionary(my_string):
    my_string=my_string.lower().split()
    sample_dictionary={}
    for word in my_string:
        words=len(word)
        sample_dictionary[words]=word
    print(sample_dictionary)
    return sample_dictionary

The function returns the following dictionary:

{2: 'is', 3: 'you', 4: 'they', 5: 'treat', 6: 'become'}

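The dictionary does not contain all the words with the same number of letters; for each length it only keeps the last such word in the string.

7 Answers:

Answer 0: (score: 7)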

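Since you only want to store unique values in your lists, it actually makes more sense to use a set. Your code is almost right: you just need to create a set when words is not yet a key in your dictionary, and add to the existing set when words is already a key. The following shows this: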

def n_letter_dictionary(my_string):
    my_string=my_string.lower().split()
    sample_dictionary={}
    for word in my_string:
        words=len(word)
        if words in sample_dictionary:
            sample_dictionary[words].add(word)
        else:
            sample_dictionary[words] = {word}
    print(sample_dictionary)
    return sample_dictionary

n_letter_dictionary("The way you see people is the way you treat them and the Way you treat them is what they become")

Output:

{2: set(['is']), 3: set(['and', 'the', 'see', 'you', 'way']), 
 4: set(['them', 'what', 'they']), 5: set(['treat']), 6: set(['become', 'people'])}
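
If the exact format from the question is needed (sorted lists rather than sets), the sets can be converted once the dictionary is built. A minimal sketch, assuming a helper named `to_sorted_lists` (a hypothetical name, not part of the answer):

```python
def to_sorted_lists(length_to_words):
    """Convert {length: set_of_words} into {length: sorted list of words}."""
    # sorted() accepts any iterable and always returns a new list
    return {length: sorted(words) for length, words in length_to_words.items()}


# Example input shaped like the sets built by the function above
groups = {2: {"is"}, 3: {"way", "the", "you", "see", "and"}}
result = to_sorted_lists(groups)
print(result)
```

Building sets first and sorting once at the end avoids checking for duplicates on every insertion.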

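Answer 1: (score: 3)

The problem with your code is that you just put the latest word into the dictionary. Instead, you have to add that word to some collection of words that have the same length. In your example that is a list, but a set seems more appropriate, assuming order is not important.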

def n_letter_dictionary(my_string):
    my_string=my_string.lower().split()
    sample_dictionary={}
    for word in my_string:
        if len(word) not in sample_dictionary:
            sample_dictionary[len(word)] = set()
        sample_dictionary[len(word)].add(word)
    return sample_dictionary

This can be made shorter using collections.defaultdict(set):

import collections

def n_letter_dictionary(my_string):
    my_string=my_string.lower().split()
    sample_dictionary=collections.defaultdict(set)
    for word in my_string:
        sample_dictionary[len(word)].add(word)
    return dict(sample_dictionary)

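Or use itertools.groupby, but for this the words have to be sorted by length first:

import itertools

def n_letter_dictionary(my_string):
    words_sorted = sorted(my_string.lower().split(), key=len)
    return {k: set(g) for k, g in itertools.groupby(words_sorted, key=len)}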

Example (each of the three implementations gives the same result):

>>> n_letter_dictionary("The way you see people is the way you treat them and the Way you treat them is what they become")
{2: {'is'}, 3: {'way', 'the', 'you', 'see', 'and'}, 4: {'what', 'them', 'they'}, 5: {'treat'}, 6: {'become', 'people'}}

Answer 2: (score: 2)

Using sample_dictionary[words]=word overwrites whatever has been stored under that key so far. You need a list that you can append to.

Instead, you need:

if words in sample_dictionary.keys():
    sample_dictionary[words].append(word)
else:
    sample_dictionary[words]=[word]

So if there is already a value for this key, I append to it; otherwise I create a new list.
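
The same append-or-create logic can also be written with the built-in `dict.setdefault` method, which returns the value already stored under a key, or stores the given default and returns it. A sketch, with the function name `group_by_length` chosen only for illustration:

```python
def group_by_length(text):
    """Group words by length, keeping duplicates in order of appearance."""
    groups = {}
    for word in text.lower().split():
        # setdefault returns the list already stored under the key,
        # or inserts a new empty list and returns that
        groups.setdefault(len(word), []).append(word)
    return groups


result = group_by_length("a aa bb ccc a bb")
print(result)
```

Like the if/else version, this keeps duplicate words; it only shortens the bookkeeping.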

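Answer 3: (score: 2)

You can use defaultdict from the collections library. It lets you set a default type for the values of the dictionary, in this case a list, so you can simply append to it based on the length of the word.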

from collections import defaultdict

def n_letter_dictionary(my_string):
    my_dict = defaultdict(list)
    for word in my_string.split():
        my_dict[len(word)].append(word)

    return my_dict

You can do this without defaultdict, but it will be a little longer.

def n_letter_dictionary(my_string):
    my_dict = {}
    for word in my_string.split():
        word_length = len(word)
        if word_length in my_dict:
            my_dict[word_length].append(word)
        else:
            my_dict[word_length] = [word]

    return my_dict

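The following ensures there are no duplicates in the value lists without using set(). Be warned, though: if your value lists are large and your input data is mostly unique, you will take a performance hit, because checking whether a value already exists in a list only exits early when the value is found; otherwise the whole list is scanned.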

from collections import defaultdict

def n_letter_dictionary(my_string):
    my_dict = defaultdict(list)
    for word in my_string.split():
        if word not in my_dict[len(word)]:
            my_dict[len(word)].append(word)

    return my_dict

# without defaultdicts
def n_letter_dictionary(my_string):
    my_dict = {}                                  # Init an empty dict
    for word in my_string.split():                # Split the string and iterate over it
        word_length = len(word)                   # Get the length, also the key
        if word_length in my_dict:                # Check if the length is in the dict
            if word not in my_dict[word_length]:  # If the length exists as a key, but the word doesn't exist in the value list
                my_dict[word_length].append(word) # Add the word
        else:
            my_dict[word_length] = [word]         # The length/key doesn't exist, so you can safely add it without checking for its existence

    return my_dict

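So if you have a high frequency of duplicates and a short list of words to scan through, this approach is acceptable. If, for example, you had a list of randomly generated words made of permutations of alphabetic characters, the value lists would bloat and scanning through them would become expensive.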
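
One way to avoid the repeated list scans while still keeping lists in first-seen order is to track already-seen words in a separate set, where membership tests take constant time on average. This is a sketch under that assumption and not part of the original answer; the names `n_letter_dictionary_fast` and `seen` are illustrative:

```python
from collections import defaultdict


def n_letter_dictionary_fast(my_string):
    """Map word length -> list of unique words, in order of first appearance."""
    my_dict = defaultdict(list)
    seen = set()  # every word already appended to one of the lists
    for word in my_string.split():
        if word not in seen:
            seen.add(word)
            my_dict[len(word)].append(word)
    return dict(my_dict)


result = n_letter_dictionary_fast("a aa bb ccc a bb")
print(result)
```

A single shared set is enough here because a given word always has the same length, so it can only ever belong to one list. The trade-off is the extra memory for the set.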

Answer 4: (score: 1)

itertools groupby is the perfect tool for this.

from itertools import groupby
def n_letter_dictionary(string):
    result = {}
    for key, group in groupby(sorted(string.split(), key = lambda x: len(x)), lambda x: len(x)):
        result[key] = list(group)
    return result

print(n_letter_dictionary("The way you see people is the way you treat them and the Way you treat them is what they become"))

# {2: ['is', 'is'], 3: ['The', 'way', 'you', 'see', 'the', 'way', 'you', 'and', 'the', 'Way', 'you'], 4: ['them', 'them', 'what', 'they'], 5: ['treat', 'treat'], 6: ['people', 'become']}
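
As the output shows, this version keeps duplicates and treats 'The'/'the' and 'Way'/'way' as different words. To get the unique, lowercase, sorted lists requested in the question, the words can be lowercased and deduplicated before grouping. A sketch (the name `n_letter_dictionary_unique` is hypothetical):

```python
from itertools import groupby


def n_letter_dictionary_unique(string):
    """Map word length -> alphabetically sorted list of unique lowercase words."""
    unique_words = set(string.lower().split())
    # Sort by (length, word) so that groupby sees equal lengths side by side
    # and each group comes out in alphabetical order
    ordered = sorted(unique_words, key=lambda w: (len(w), w))
    return {length: list(group) for length, group in groupby(ordered, key=len)}


result = n_letter_dictionary_unique(
    "The way you see people is the way you treat them "
    "and the Way you treat them is what they become"
)
print(result)
```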

Answer 5: (score: 1)

The shortest solution I came up with uses a defaultdict:

from collections import defaultdict

sentence = ("The way you see people is the way you treat them"
            " and the Way you treat them is what they become")

Now the algorithm:

wordsOfLength = defaultdict(list)
for word in sentence.split():
    wordsOfLength[len(word)].append(word)

Now wordsOfLength holds the desired dictionary.
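
One thing to keep in mind is that a defaultdict inserts a new entry whenever a missing key is looked up with square brackets, so merely reading the entry for an unused length would add an empty list. If that is unwanted, the result can be converted to a plain dict once it is built. A small sketch with a shortened sentence (the variable names are illustrative):

```python
from collections import defaultdict

words_of_length = defaultdict(list)
for word in "the way you see people".split():
    words_of_length[len(word)].append(word)

# Convert to a regular dict so that missing keys raise KeyError
# instead of silently creating empty lists
plain = dict(words_of_length)
print(plain)
print(plain.get(10, []))  # safe lookup of a length that does not occur
```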

Answer 6: (score: 0)

my_string="a aa bb ccc a bb".lower().split()
sample_dictionary={}
for word in my_string:
    words=len(word)
    if words not in sample_dictionary:
        sample_dictionary[words] = []
    sample_dictionary[words].append(word)
print(sample_dictionary)