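Python: How do I group a list of objects by a characteristic or attribute?

Asked: 2016-08-31 15:18:29

Tags: python arrays algorithm sorting

I want to split a list of objects into sublists, where objects sharing the same attribute/characteristic end up in the same sublist.

Suppose we have a list of strings: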

["This", "is", "a", "sentence", "of", "seven", "words"]

We want to separate the strings by their length, like this:

[['sentence'], ['a'], ['is', 'of'], ['This'], ['seven', 'words']]

What I have come up with so far is:

sentence = ["This", "is", "a", "sentence", "of", "seven", "words"]
word_len_dict = {}
for word in sentence:
    if len(word) not in word_len_dict.keys():
        word_len_dict[len(word)] = [word]
    else:
        word_len_dict[len(word)].append(word)


print(list(word_len_dict.values()))

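Is there a better way to achieve this?

7 Answers:

Answer 0: (score: 5)

Take a look at itertools.groupby(). Note that your list must be sorted by the grouping key first (which makes this more expensive than your approach, OP).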

>>> from itertools import groupby
>>> l = ["This", "is", "a", "sentence", "of", "seven", "words"]
>>> print([list(g) for _, g in groupby(sorted(l, key=len), len)])
[['a'], ['is', 'of'], ['This'], ['seven', 'words'], ['sentence']]

Or, if you want a dictionary:

>>> {k:list(g) for k, g in groupby(sorted(l, key=len), len)}
{8: ['sentence'], 1: ['a'], 2: ['is', 'of'], 4: ['This'], 5: ['seven', 'words']}
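
To see why the sort matters: groupby only merges consecutive items that share a key, so on the unsorted list, equal-length words that are not adjacent land in separate groups. A minimal sketch, assuming Python 3 syntax (the variable names here are illustrative and not part of the answer):

```python
from itertools import groupby

words = ["This", "is", "a", "sentence", "of", "seven", "words"]

# Without sorting, only adjacent equal-length words are grouped together,
# so "is" and "of" end up in two separate groups.
unsorted_groups = [list(g) for _, g in groupby(words, key=len)]
print(unsorted_groups)
# [['This'], ['is'], ['a'], ['sentence'], ['of'], ['seven', 'words']]

# Sorting by the same key first places equal-length words next to each other.
sorted_groups = [list(g) for _, g in groupby(sorted(words, key=len), key=len)]
print(sorted_groups)
# [['a'], ['is', 'of'], ['This'], ['seven', 'words'], ['sentence']]
```

Because sorted() is stable, words of the same length keep their original relative order inside each group.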

Answer 1: (score: 2)

With defaultdict(list), you can omit the key existence check:

from collections import defaultdict

word_len_dict = defaultdict(list)

for word in sentence:
    word_len_dict[len(word)].append(word)
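
The same pattern extends to grouping arbitrary objects by an attribute, which is what the question title asks about. The sketch below is one possible way to wrap it up; the helper name `group_by` and the `Pet` example type are hypothetical and chosen only for illustration. The printed dictionary order assumes Python 3.7+, where dicts preserve insertion order.

```python
from collections import defaultdict, namedtuple
from operator import attrgetter


def group_by(items, key):
    """Group items into lists keyed by key(item), in a single pass."""
    groups = defaultdict(list)
    for item in items:
        groups[key(item)].append(item)
    return dict(groups)  # plain dict, so missing keys raise KeyError again


# Grouping the strings from the question by length.
sentence = ["This", "is", "a", "sentence", "of", "seven", "words"]
by_length = group_by(sentence, len)
print(by_length)
# {4: ['This'], 2: ['is', 'of'], 1: ['a'], 8: ['sentence'], 5: ['seven', 'words']}

# Grouping objects by an attribute (Pet is a made-up example type).
Pet = namedtuple("Pet", ["name", "species"])
pets = [Pet("Rex", "dog"), Pet("Tom", "cat"), Pet("Fido", "dog")]
by_species = group_by(pets, attrgetter("species"))
print([pet.name for pet in by_species["dog"]])
# ['Rex', 'Fido']
```

Unlike the groupby approach, this needs no sort and visits each item exactly once.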

Answer 2: (score: 1)

The documentation for itertools.groupby contains an example that matches exactly what you want.

from itertools import groupby

keyfunc = len
data = ["This", "is", "a", "sentence", "of", "seven", "words"]
# groupby only groups consecutive items, so sort by the same key first
data = sorted(data, key=keyfunc)
groups = []
for k, g in groupby(data, keyfunc):
    groups.append(list(g))
print(groups)

Answer 3: (score: 1)

You can do this using just the setdefault function:

sentence = ["This", "is", "a", "sentence", "of", "seven", "words"]
word_len_dict = {}
for word in sentence:
    word_len_dict.setdefault(len(word), []).append(word)

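What setdefault does is set the key len(word) in the dictionary if it does not exist yet, and otherwise simply retrieve the value already stored there. The second argument to setdefault is the default value you want stored under that key.

It is important to note that if the key already exists, the default value passed to setdefault does not replace the old value. This ensures that the list for each key is stored only once; after that, setdefault just returns that same list.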
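
The behaviour described above can be checked with a small sketch (the names `groups`, `first`, and `second` are illustrative only):

```python
groups = {}

# Key 2 is missing: the empty list is stored under it and returned.
first = groups.setdefault(2, [])
first.append("is")

# Key 2 now exists: the new [] passed here is ignored,
# and the list that is already stored is returned instead.
second = groups.setdefault(2, [])
second.append("of")

print(first is second)  # True
print(groups)           # {2: ['is', 'of']}
```

Both calls hand back the same list object, which is why chaining .append() onto setdefault accumulates words under one key.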

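Answer 4: (score: 0)

Now, I'm not saying this is better, unless you consider more compact code to be better. Your version (which is pretty good, imo) is more readable and maintainable.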

list_ = ["This", "is", "a", "sentence", "of", "seven", "words"]

# In Python 2, filter() returns a list
result = filter(None, [[x for x in list_ if len(x) == i] for i in range(len(max(list_, key=len)) + 1)])

# In Python 3, filter() returns an iterator, so wrap it in list()
result = list(filter(None, [[x for x in list_ if len(x) == i] for i in range(len(max(list_, key=len)) + 1)]))

Answer 5: (score: 0)

sentence = ["This", "is", "a", "sentence", "of", "seven", "words"]
getLength = sorted(list(set([len(data) for data in sentence])))

result = []

for length in getLength:
    result.append([data for data in sentence if length == len(data)])

print(result)

Answer 6: (score: 0)

If your goal is to do it in fewer lines, there are always comprehensions:

data = ["This", "is", "a", "sentence", "of", "seven", "words"]
# Get all unique length values
unique_length_vals = {len(word) for word in data}
# Get lists of same-length words; list() is needed in Python 3, where filter() is lazy
res = [list(filter(lambda x: len(x) == lval, data)) for lval in unique_length_vals]

It may be less clear, but it is very handy if you just want to write the code quickly.