In a Django project I maintain, users upload photos for others to view and comment on. I don't want anyone uploading a photo that has been seen recently, so I compare each uploaded photo against the 300 most recent photos. If I find a duplicate, I prompt the user to try something new. I want to optimize this process - that's the point of this question.
Currently, whenever a user tries to upload a photo, this code runs inside the form_valid method of the class-based view that handles photo uploads:
recent_photos = Photo.objects.order_by('-id')[:300]
recent_hashes = [photo.avg_hash for photo in recent_photos]
# some code to compare avg_hash values across images to flag duplication
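The comparison step itself is elided above. As an illustration only: assuming each avg_hash is stored as a hex string of equal length (as produced, for example, by the imagehash library's average_hash()), a near-duplicate check could use the Hamming distance between hash bits. The threshold value here is a made-up placeholder:

```python
HAMMING_THRESHOLD = 5  # assumed tolerance; 0 would mean exact matches only

def hamming_distance(hash_a, hash_b):
    """Number of differing bits between two equal-length hex hash strings."""
    return bin(int(hash_a, 16) ^ int(hash_b, 16)).count('1')

def is_recent_duplicate(new_hash, recent_hashes):
    """Flag the upload if any recent hash is within the threshold."""
    return any(hamming_distance(new_hash, h) <= HAMMING_THRESHOLD
               for h in recent_hashes)
```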
Instead of making this DB call every time a user uploads a photo, I would like to maintain a list of the 300 most recent avg_hash values in the cache. That way, I eliminate the DB call.
Right now, each image's average_hash value is saved to the database when the Photo object is created. I want to write an async task that takes each image's avg_hash value and inserts it into the cached list of avg_hash values, keeping the list at 300 entries with an infinite expiry time.
Could someone help me write such a task? I think it should look something like this:
@celery_app1.task(name='tasks.build_avg_hash_list')
def build_avg_hash_list(latest_avg_hash):
    cache_mem = get_cache('django.core.cache.backends.memcached.MemcachedCache', **{
        'LOCATION': '127.0.0.1:11211', 'TIMEOUT': None,
    })
    try:
        avg_hash_list = cache_mem.get('avg_hash_list')
        avg_hash_list.insert(0, latest_avg_hash)
        list_len = len(avg_hash_list)
        if list_len > 300:
            del avg_hash_list[301:list_len]
    except:
        avg_hash_list = [latest_avg_hash]
    cache_mem.set('avg_hash_list', avg_hash_list)
I'm missing a few things here. For example:
1) I'm on Django 1.5.1, and TIMEOUT: None doesn't work for setting an infinite expiry time.
2) I'm not sure the logic I've proposed reliably caps the list at 300 entries or fewer. Should I be using the MAX_ENTRIES cache parameter instead?
3) Finally, I also use cache_mem = get_cache('django.core.cache.backends.memcached.MemcachedCache', **{ 'LOCATION': '127.0.0.1:11211', 'TIMEOUT': 120 }) in another async task. I want to confirm that saving this new task's output at the same location won't overwrite the output of that other async task (which is also saved at the same location).
Thanks in advance.
Answer 0 (score: 1)
Looking at this question and having seen some of your other questions, it's high time that you invest in redis. This seemingly complex task is really simple if you use redis sets.
Each time someone adds a new image, you add a new entry to your redis set of avg_hashes. You don't even need to limit yourself to 300; you can have thousands of items in the set, and since they are persisted on disk, a server restart does not result in data loss.
When you want to see if the user is uploading a duplicate, you call sismember with the new image's hash.
If you do want to limit the number of members in the set, you can use srem to trim it when the length exceeds some previously chosen number.
Bonus feature: you don't need the celery task at all if you adopt redis. And since you are already using celery, you may already have redis installed, as it's one of celery's supported brokers.
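A minimal sketch of the set-based approach above, written against redis-py's set commands (sadd, sismember, scard, spop). The key name avg_hashes is an assumption, and the 300 cap is carried over from the question; the functions take the client as a parameter so they work with any object exposing those methods:

```python
# `client` is any object exposing redis's set commands, e.g.
# redis.StrictRedis(host='127.0.0.1', port=6379).
# The key name 'avg_hashes' and the 300-entry cap are assumptions.

MAX_HASHES = 300

def is_duplicate(client, avg_hash, key='avg_hashes'):
    # O(1) membership test against the whole set of stored hashes.
    return bool(client.sismember(key, avg_hash))

def record_hash(client, avg_hash, key='avg_hashes'):
    # Add the new hash, then evict members while over the cap.
    # Caveat: plain sets are unordered, so spop() removes an *arbitrary*
    # member, not the oldest. For true oldest-first eviction, a sorted set
    # (zadd with a timestamp score, trimmed via zremrangebyrank) fits better.
    client.sadd(key, avg_hash)
    while client.scard(key) > MAX_HASHES:
        client.spop(key)
```

Note the caveat in the comments: if oldest-first eviction matters, a redis sorted set keyed by upload timestamp is the closer fit than a plain set.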