Question

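I have a situation where I need to iterate over two lists of objects, find the ones that are equal, then loop over their fields and change some attributes. It looks like this: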

for new_product in products_and_articles['products']:
  for old_product in products_for_update:
    if new_product.article == old_product.article:
      for old_field in old_product._meta.get_all_field_names():
        for new_field in new_product._meta.get_all_field_names():
          if old_field == new_field and old_field != 'id' and old_field != 'slug':
            setattr(old_product, old_field, getattr(new_product, old_field))

Obviously this is far from good, or even acceptable. So I'm looking for advice on how to avoid so many loops and improve the algorithm.

Answer 1

It helps if you break the process down into logical, reusable parts.

for new_product in products_and_articles['products']:
  for old_product in products_for_update:
    if new_product.article == old_product.article:
      …

For example, what you are doing here is finding the product that matches a particular article. Since article is unique, we can write:

def find_products_by_article(products, article):
  '''Find the product that matches the given article.  Returns
  either a product or 'None' if it doesn't exist.'''
  for product in products:
    if product.article == article:
      return product
  return None

Then call it with:

for old_product in products_for_update:
  new_products = find_products_by_article(
                   products_and_articles['products'],
                   old_product.article)
  …

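However, this can be much more efficient if we take advantage of a data structure that is optimized for lookups, namely a dict (constant rather than linear complexity per lookup). So what we can do is: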

# build a dictionary that stores products indexed by article
products_by_article = dict((product.article, product) for product in
                           products_and_articles['products'])

for old_product in products_for_update:
  try:
    # look up product in the dictionary
    new_product = products_by_article[old_product.article]
  except KeyError:
    # silently ignore products that don't exist
    continue
  …

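If this kind of lookup is done often, it is best to reuse the products_by_article dictionary elsewhere rather than rebuilding it from scratch each time. Be careful though: if you keep multiple representations of the product records, you need to keep them in sync at all times!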
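To make the lookup idea concrete, here is a minimal self-contained sketch. The Product namedtuple, its price field, and the sample data are assumptions made purely for illustration; they stand in for the real model objects.

```python
from collections import namedtuple

# Hypothetical stand-in for the real product model (assumption).
Product = namedtuple('Product', ['article', 'price'])

new_products = [Product('A1', 10), Product('B2', 20), Product('C3', 30)]
old_products = [Product('B2', 15), Product('Z9', 99), Product('A1', 5)]

# Build the index once: O(n) time, then each lookup is O(1) on average.
products_by_article = {p.article: p for p in new_products}

# Pair every old product with its new counterpart, skipping missing ones.
matches = []
for old in old_products:
    new = products_by_article.get(old.article)  # None if no such article
    if new is not None:
        matches.append((old, new))

for old, new in matches:
    print(old.article, old.price, '->', new.price)
```

If two products ever shared an article, the later one would silently overwrite the earlier one in the dictionary, so this only works as intended when article really is unique.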

For the inner loops, note that new_field is only used here to check whether the field exists:

…
  for old_field in old_product._meta.get_all_field_names():
    for new_field in new_product._meta.get_all_field_names():
      if old_field == new_field and old_field != 'id' and old_field != 'slug':
        setattr(old_product, old_field, getattr(new_product, old_field))

(Note that this is a bit suspicious: any new fields that don't already exist in old_product are silently dropped. Is that intentional?)

This can be repackaged as follows:

def transfer_fields(old, new, exclusions=('id', 'slug')):
  '''Update all pre-existing fields in the old record to have
  the same values as the new record.  The 'exclusions' parameter
  can be used to exclude certain fields from being updated.'''
  # use a set here for efficiency reasons
  fields = set(old._meta.get_all_field_names())
  # keep only the fields that also exist on the new record
  fields.intersection_update(new._meta.get_all_field_names())
  fields.difference_update(exclusions)
  for field in fields:
    setattr(old, field, getattr(new, field))

Putting it all together:

# dictionary of products indexed by article
products_by_article = dict((product.article, product) for product in
                           products_and_articles['products'])

for old_product in products_for_update:
  try:
    new_product = products_by_article[old_product.article]
  except KeyError:
    continue          # ignore non-existent products
  transfer_fields(old_product, new_product)

The time complexity of this final code is O(n × k), where n is the number of products and k is the number of fields.
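If you want to try the field transfer without a database, the following is a rough sketch under stated assumptions. The Record class and the field_names helper are hypothetical stand-ins for a model instance and for _meta.get_all_field_names(); they are not part of any real API.

```python
class Record(object):
    """Hypothetical plain-object stand-in for a model instance (assumption)."""
    def __init__(self, **fields):
        self.__dict__.update(fields)


def field_names(record):
    # Stand-in for record._meta.get_all_field_names() in this sketch.
    return vars(record).keys()


def transfer_fields(old, new, exclusions=('id', 'slug')):
    # Only fields present on both records, minus the excluded ones.
    fields = set(field_names(old))
    fields.intersection_update(field_names(new))
    fields.difference_update(exclusions)
    for field in fields:
        setattr(old, field, getattr(new, field))


old = Record(id=1, slug='old-slug', article='A1', price=5)
new = Record(id=2, slug='new-slug', article='A1', price=10, color='red')
transfer_fields(old, new)
print(sorted(vars(old).items()))
```

Here id and slug keep their old values, price is updated, and color is dropped because it does not exist on the old record, which matches the behaviour of the original nested loops.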

Answer 2

Instead of looping over both lists and checking for equality, you can use a set to find the intersection:

set(products_and_articles['products']).intersection(set(products_for_update))

Example:

>>> l=[1,2,3]
>>> a=[2,3,4]
>>> set(l).intersection(set(a))
set([2, 3])
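One caveat worth sketching: a set intersection compares whole objects (they must be hashable and compare equal), whereas the question matches products on article only. The Product namedtuple and the data below are assumptions for illustration.

```python
from collections import namedtuple

Product = namedtuple('Product', ['article', 'price'])  # hypothetical stand-in

new_products = [Product('A1', 10), Product('B2', 20)]
old_products = [Product('B2', 15), Product('Z9', 99)]

# Intersecting the objects themselves finds nothing here, because the two
# B2 records differ in price and therefore do not compare equal.
same_objects = set(new_products) & set(old_products)

# Intersecting the article keys finds the shared article.
shared_articles = ({p.article for p in new_products}
                   & {p.article for p in old_products})

print(same_objects)
print(shared_articles)
```

So for this problem the intersection probably has to be taken over the article values, and you still need a lookup to get from an article back to the two objects.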

Answer 3

We start with four loops and a complexity of O(n^2*k^2), where n is the number of items and k is the number of attributes. Let's see what we can do.

First, get rid of the new_field loop, you don't need it. Change:

for old_field in old_product._meta.get_all_field_names():
    for new_field in new_product._meta.get_all_field_names():
        if old_field == new_field and old_field != 'id' and old_field != 'slug':
            setattr(old_product, old_field, getattr(new_product, old_field))

to:

for old_field in old_product._meta.get_all_field_names():
    if old_field != 'id' and old_field != 'slug':
        setattr(old_product, old_field, getattr(new_product, old_field))

That gives O(n^2*k). Now for the product-finding part.

First, sort both lists, then proceed as you would when merging the lists in a merge sort:

a = sorted(products_and_articles['products'], key=lambda x: x.article)
b = sorted(products_for_update, key=lambda x: x.article)
i = j = 0
while i < len(a) and j < len(b):
    if a[i].article < b[j].article:
        i += 1
        continue
    if a[i].article > b[j].article:
        j += 1
        continue
    ...logic...
    i += 1  # Maybe you want to get rid of this one, I'm not sure..
    j += 1

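Depending on the size of your database, this may be more or less adequate, because it requires you to create new sorted lists. It is not very heavy on memory (the lists only hold references anyway), but if you have really long lists and limited space, the big efficiency win may not make up for it.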

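That brings it down to O(n*logn*k), which is the best I could do. You could get it lower using dictionaries, but that requires you to change your database, so it takes more time and effort.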
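For reference, here is a runnable sketch of the sort-and-merge approach above. The Product namedtuple, the merge_join name, and the sample data are assumptions for illustration, and the sketch assumes articles are unique within each list.

```python
from collections import namedtuple

Product = namedtuple('Product', ['article', 'price'])  # hypothetical stand-in


def merge_join(new_products, old_products):
    """Yield (new, old) pairs with equal articles; assumes unique articles."""
    a = sorted(new_products, key=lambda x: x.article)
    b = sorted(old_products, key=lambda x: x.article)
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i].article < b[j].article:
            i += 1          # a is behind, advance it
        elif a[i].article > b[j].article:
            j += 1          # b is behind, advance it
        else:
            yield a[i], b[j]
            i += 1
            j += 1


pairs = list(merge_join(
    [Product('C3', 30), Product('A1', 10), Product('B2', 20)],
    [Product('B2', 15), Product('Z9', 99), Product('A1', 5)]))
print([(n.article, n.price, o.price) for n, o in pairs])
```

The two sorts dominate the cost; the merge itself is a single linear pass over both lists.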

Answer 4

The first two loops can be changed to:

from itertools import product


for new_product, old_product in product(list1, list2):
    # logic and other loops

You can do the same for the two inner loops:

for old_field in old_product._meta.get_all_field_names():
    for new_field in new_product._meta.get_all_field_names():

for old_field, new_field in product(list1, list2):
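Keep in mind that itertools.product only flattens the nesting; it still visits every pair. A small sketch with made-up article strings (list1 and list2 here are assumptions, not the real product lists) shows the count:

```python
from itertools import product

# Made-up article values standing in for the two product lists (assumption).
list1 = ['A1', 'B2', 'C3']
list2 = ['B2', 'Z9']

visited = 0
matches = []
for new_article, old_article in product(list1, list2):
    visited += 1  # one iteration per (new, old) pair
    if new_article == old_article:
        matches.append(new_article)

print(visited, matches)
```

With 3 and 2 items the loop body runs 6 times, so this is a readability improvement rather than a change in complexity.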

More efficient loops in Python
