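I have a situation where I need to iterate over two lists of objects, find the ones that match, and then loop over their fields and change some attributes. It looks like this: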
    for new_product in products_and_articles['products']:
        for old_product in products_for_update:
            if new_product.article == old_product.article:
                for old_field in old_product._meta.get_all_field_names():
                    for new_field in new_product._meta.get_all_field_names():
                        if old_field == new_field and old_field != 'id' and old_field != 'slug':
                            setattr(old_product, old_field, getattr(new_product, old_field))
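Obviously, this is far from good, or even acceptable. So I'm looking for advice on how to avoid so many loops and improve the algorithm.
Answer 0 (score: 5)
It helps if you break the process down into logical, reusable parts.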
    for new_product in products_and_articles['products']:
        for old_product in products_for_update:
            if new_product.article == old_product.article:
                …
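For example, what you are doing here is finding the product that matches a particular article. Since article is unique, we can write something like this: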
    def find_products_by_article(products, article):
        '''Find the product that matches the given article. Returns
        either a product or 'None' if it doesn't exist.'''
        for product in products:
            if product.article == article:
                return product
        return None
Then call it with:
    for old_product in products_for_update:
        new_product = find_products_by_article(
            products_and_articles['products'],
            old_product.article)
        …
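However, this can be made much more efficient if we take advantage of a data structure that is optimized for lookups, namely a dict (constant rather than linear complexity per lookup). So what we can do is: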
    # build a dictionary that stores products indexed by article
    products_by_article = dict((product.article, product) for product in
                               products_and_articles['products'])
    for old_product in products_for_update:
        try:
            # look up product in the dictionary
            new_product = products_by_article[old_product.article]
        except KeyError:
            # silently ignore products that don't exist
            continue
        …
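If you do this kind of lookup often, it is better to reuse the products_by_article dictionary elsewhere rather than rebuilding it from scratch every time. Be careful, though: if you keep multiple representations of the product records, you need to make sure they always stay in sync!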
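The lookup-by-dictionary idea can be tried out without a database. The following is only a minimal sketch: `Product` here is a hypothetical namedtuple standing in for the real model, and `index_by_article` is an illustrative helper name, not something from the original code.

```python
from collections import namedtuple

# Hypothetical stand-in for the real product model.
Product = namedtuple("Product", ["article", "name"])


def index_by_article(products):
    """Map each article to its product (a later duplicate overwrites an earlier one)."""
    return {p.article: p for p in products}


new_products = [Product("A1", "new kettle"), Product("B2", "new toaster")]
old_products = [Product("B2", "old toaster"), Product("C3", "old mixer")]

# Build the index once, then reuse it for every lookup.
products_by_article = index_by_article(new_products)

matches = []
for old in old_products:
    new = products_by_article.get(old.article)  # None when there is no match
    if new is None:
        continue
    matches.append((old.name, new.name))

print(matches)  # [('old toaster', 'new toaster')]
```

Using `dict.get` instead of `try`/`except KeyError` is a matter of taste; both skip products that have no counterpart.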
For the inner loops, note that new_field is only used here to check whether the field exists:
    …
    for old_field in old_product._meta.get_all_field_names():
        for new_field in new_product._meta.get_all_field_names():
            if old_field == new_field and old_field != 'id' and old_field != 'slug':
                setattr(old_product, old_field, getattr(new_product, old_field))
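(Note that this is somewhat suspicious: any new fields that don't already exist in old_product are silently dropped. Is that intentional?)
This can be repackaged as follows: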
    def transfer_fields(old, new, exclusions=('id', 'slug')):
        '''Update all pre-existing fields in the old record to have
        the same values as the new record. The 'exclusions' parameter
        can be used to exclude certain fields from being updated.'''
        # use a (mutable) set here for efficiency reasons
        fields = set(old._meta.get_all_field_names())
        # keep only the fields that also exist on the new record
        fields.intersection_update(new._meta.get_all_field_names())
        fields.difference_update(exclusions)
        for field in fields:
            setattr(old, field, getattr(new, field))
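The set arithmetic behind this helper can be checked on plain Python objects. This is a sketch under assumptions: `Record` is a hypothetical stand-in for a model instance, and `vars()` plays the role that the model's field-name listing plays in the real code.

```python
class Record:
    """Hypothetical plain-Python stand-in for a model instance."""

    def __init__(self, **fields):
        self.__dict__.update(fields)


def transfer_common_fields(old, new, exclusions=("id", "slug")):
    """Copy every attribute that both records have, except the excluded ones."""
    # Fields present on both records, minus the ones we must not touch.
    fields = (set(vars(old)) & set(vars(new))) - set(exclusions)
    for field in fields:
        setattr(old, field, getattr(new, field))
    return fields


old = Record(id=1, slug="kettle", name="Kettle", price=10)
new = Record(id=99, slug="kettle-v2", name="Kettle v2", price=12, color="red")

changed = transfer_common_fields(old, new)
print(sorted(changed))      # ['name', 'price']
print(old.id, old.slug)     # 1 kettle
print(old.name, old.price)  # Kettle v2 12
```

The `color` attribute exists only on the new record, so it is not copied, which mirrors the "silently dropped" behaviour questioned above.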
Putting all of this together:
    # dictionary of products indexed by article
    products_by_article = dict((product.article, product) for product in
                               products_and_articles['products'])
    for old_product in products_for_update:
        try:
            new_product = products_by_article[old_product.article]
        except KeyError:
            continue  # ignore non-existent products
        transfer_fields(old_product, new_product)
The time complexity of this final code is O(n × k), where n is the number of products and k is the number of fields.
Answer 1 (score: 2)
Instead of looping over both lists and checking for equality, you can use a set to find the intersection:
    set(products_and_articles['products']).intersection(set(products_for_update))
Example:
    >>> l=[1,2,3]
    >>> a=[2,3,4]
    >>> set(l).intersection(set(a))
    set([2, 3])
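Intersecting the objects themselves depends on how they compare and hash, whereas the question matches products by their article. A minimal sketch of intersecting on that key instead, assuming a hypothetical `Product` namedtuple in place of the real model:

```python
from collections import namedtuple

# Hypothetical stand-in for the real product model.
Product = namedtuple("Product", ["article", "name"])

new_products = [Product("A1", "new kettle"), Product("B2", "new toaster")]
old_products = [Product("B2", "old toaster"), Product("C3", "old mixer")]

# Intersect the article keys rather than the objects.
common_articles = ({p.article for p in new_products}
                   & {p.article for p in old_products})

print(sorted(common_articles))  # ['B2']
```

This only tells you which articles appear in both lists; to update the old records you still need a way to get from an article back to both objects, for example a dictionary keyed by article.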
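Answer 2 (score: 1)
We start with four loops and a complexity of O(n^2*k^2), where n is the number of items and k is the number of attributes. Let's see what we can do.
First, get rid of the inner loop over new_product's fields; you don't need it: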
    for old_field in old_product._meta.get_all_field_names():
        for new_field in new_product._meta.get_all_field_names():
            if old_field == new_field and old_field != 'id' and old_field != 'slug':
                setattr(old_product, old_field, getattr(new_product, old_field))
to:
    for old_field in old_product._meta.get_all_field_names():
        if old_field != 'id' and old_field != 'slug':
            setattr(old_product, old_field, getattr(new_product, old_field))
That gets us to O(n^2*k). Now for the product-finding part.
First, sort both lists, then proceed the way you would when merging lists in a merge sort:
    a = sorted(products_and_articles['products'], key=lambda x: x.article)
    b = sorted(products_for_update, key=lambda x: x.article)
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i].article < b[j].article:
            i += 1  # advance the index, not the list
            continue
        if a[i].article > b[j].article:
            j += 1
            continue
        # ...logic...
        i += 1  # Maybe you want to get rid of this one, I'm not sure..
        j += 1
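Depending on the size of your database, this may or may not be adequate, because it requires you to create new sorted lists. It isn't very heavy on memory (the lists only hold references anyway), but if you have very long lists and limited space, the big efficiency win may not make up for it.
It comes down to O(n*logn*k), which is the best I could do. You could use a dictionary to get it lower, but that would require you to change your database, so it takes more time and effort.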
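The merge-style walk can be written as a small standalone function. This is a sketch assuming articles are unique within each list and mutually comparable; `Product` and `matching_pairs` are hypothetical names used only for illustration.

```python
from collections import namedtuple

# Hypothetical stand-in for the real product model.
Product = namedtuple("Product", ["article", "name"])


def matching_pairs(new_products, old_products):
    """Return (new, old) pairs with equal articles using a sorted two-pointer walk."""
    a = sorted(new_products, key=lambda p: p.article)
    b = sorted(old_products, key=lambda p: p.article)
    i = j = 0
    pairs = []
    while i < len(a) and j < len(b):
        if a[i].article < b[j].article:
            i += 1  # a is behind, advance it
        elif a[i].article > b[j].article:
            j += 1  # b is behind, advance it
        else:
            pairs.append((a[i], b[j]))
            i += 1
            j += 1
    return pairs


new_products = [Product("C3", "new mixer"), Product("A1", "new kettle")]
old_products = [Product("B2", "old toaster"), Product("C3", "old mixer")]

pairs = matching_pairs(new_products, old_products)
print([(n.name, o.name) for n, o in pairs])  # [('new mixer', 'old mixer')]
```

Sorting dominates the cost of finding the matches, and the walk itself visits each element at most once.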
Answer 3 (score: 0)
The first two loops can be changed to:
    from itertools import product
    for new_product, old_product in product(list1, list2):
        # logic and other loops
You can do the same for the two inner loops:
    for old_field in old_product._meta.get_all_field_names():
        for new_field in new_product._meta.get_all_field_names():
becomes:
    for old_field, new_field in product(list1, list2):
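To see what `itertools.product` does here, the following small sketch (with made-up lists `xs` and `ys`) compares it with the equivalent nested loops. It flattens the nesting, but it still visits every pair, so the number of iterations does not change.

```python
from itertools import product

xs = [1, 2, 3]
ys = ["a", "b"]

# Nested loops, written as a comprehension.
nested = [(x, y) for x in xs for y in ys]

# The same pairs, in the same order, from itertools.product.
flat = list(product(xs, ys))

print(nested == flat)  # True
print(len(flat))       # 6, i.e. len(xs) * len(ys)
```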