Question

I have a plain list of objects, where each object is defined as

class MyRecord(object):

    def __init__(self, name, date, category, memo):
        self.name = name
        self.date = date
        self.category = category
        self.memo = memo.strip().split()

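When I create an object, the memo I pass in is usually a long sentence, for example "Hello world this is a new funny-memo", which the __init__ function turns into the list ['Hello', 'world', 'this', 'is', 'a', 'new', 'funny-memo'].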

Suppose the list holds 10,000 such records (with different memos). I would like to group them, as fast as possible, like this:

'Hello' : [all the records whose memo contains the word 'Hello']
'world' : [all the records whose memo contains the word 'world']
'is' : [all the records whose memo contains the word 'is']

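I know how to use group-by to group the records by name, date, or category (since each of those is a single value), but I am stuck on grouping them the way described above.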

Answer 1

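If you want the grouping to be fast, you should do it once and never recompute it. To do that, you can try caching the groups on the class as the objects are created: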

class MyRecord():

    __groups = dict()

    def __init__(self, name, date, category, memo):
        self.name = name
        self.date = date
        self.category = category
        self.memo = memo.strip().split()
        for word in self.memo:
            self.__groups.setdefault(word, set()).add(self)

    @classmethod
    def get_groups(cls):
        return cls.__groups


records = list()
for line in [
        'Hello world this is a new funny-memo',
        'Hello world this was a new funny-memo',
        'Hey world this is a new funny-memo']:
    records.append(MyRecord(1, 1, 1, line))


print({key: len(val) for key, val in MyRecord.get_groups().items()})

Output:

{'Hello': 2, 'world': 3, 'this': 3, 'is': 2, 'a': 3, 'new': 3, 'funny-memo': 3, 'was': 1, 'Hey': 1}
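If the records already exist in a list, or you would rather not keep class-level state, the same index can be built in a single pass over the list. The following is only a minimal sketch of that alternative, not part of the answer above: the helper name group_by_word is hypothetical, and it assumes the MyRecord definition from the question.

```python
from collections import defaultdict


class MyRecord:
    def __init__(self, name, date, category, memo):
        self.name = name
        self.date = date
        self.category = category
        self.memo = memo.strip().split()


def group_by_word(records):
    """Map each memo word to the list of records whose memo contains it.

    Hypothetical helper for illustration; one pass over all records.
    """
    groups = defaultdict(list)
    for record in records:
        # set() ensures a record with a repeated word is listed once per word
        for word in set(record.memo):
            groups[word].append(record)
    return dict(groups)


records = [
    MyRecord(1, 1, 1, line)
    for line in (
        "Hello world this is a new funny-memo",
        "Hello world this was a new funny-memo",
        "Hey world this is a new funny-memo",
    )
]

groups = group_by_word(records)
print(len(groups["Hello"]), len(groups["world"]), len(groups["was"]))  # prints: 2 3 1
```

The work is proportional to the total number of words across all memos, so 10,000 records with sentence-length memos should be cheap to index. Unlike the class-level cache, the result has to be rebuilt (or updated by hand) when records are added later.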

Python3: Group a list of objects by the words in their description
