Question

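I want to loop through a database, find the appropriate values, and insert them into the corresponding cells of a separate file. It could be CSV or any other human-readable format. In pseudocode: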

for item in huge_db:
   for obj in list(list_of_objects_to_match):  # iterate over a copy; objects are removed below
      if itemmatch(item, obj):
         matches[obj.id] += 1  # per-object match counter
         result = performoperationonitem(item)
         write_in_file(result, row=obj.id, col=matches[obj.id])
         if matches[obj.id] == 3:
            list_of_objects_to_match.remove(obj)

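Other than scanning the whole output file line by line every time, what can you suggest? I don't even know what to search for... Better still, is there a better way to find three matching objects in the database and get the results in real time? (The operation takes a while, but I'd like to see the results pop up in real time.)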
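One way to avoid re-reading the output file is to keep the per-object results in memory and write each row only once it is complete. This is a minimal sketch, assuming the objects to match can be given as a dict of row id to match target and that itemmatch and the operation are plain functions; stream_matches and its parameters are hypothetical names, not an existing library API. Each finished row is flushed immediately, so results appear in the file while the scan is still running.

```python
import csv
import io
from collections import defaultdict


def stream_matches(items, targets, itemmatch, operation, out, needed=3):
    """Scan `items` once and write a CSV row for each target as soon as it
    has `needed` matches. `targets` maps a row id to the value that
    itemmatch() compares items against (hypothetical interface)."""
    writer = csv.writer(out)
    pending = dict(targets)      # targets that still need matches
    results = defaultdict(list)  # row id -> results collected so far
    for item in items:
        if not pending:
            break  # every target is done; stop scanning the database
        # iterate over a snapshot so finished targets can be removed safely
        for row_id, target in list(pending.items()):
            if itemmatch(item, target):
                results[row_id].append(operation(item))
                if len(results[row_id]) == needed:
                    writer.writerow([row_id, *results[row_id]])
                    out.flush()  # make the finished row visible immediately
                    del pending[row_id]
    return pending  # targets that never reached `needed` matches


# Usage with made-up data: items are dicts that match on their "type" key.
items = [{"type": t, "value": v} for t, v in
         [("a", 1), ("b", 2), ("a", 3), ("a", 5), ("b", 7), ("a", 9), ("b", 11)]]
demo_out = io.StringIO()
unmatched = stream_matches(
    items,
    {"row_a": "a", "row_b": "b", "row_c": "c"},
    itemmatch=lambda item, target: item["type"] == target,
    operation=lambda item: item["value"] * 10,
    out=demo_out,
)
print(demo_out.getvalue())
```

With the sample data, rows for row_a and row_b are written and row_c is returned as unmatched. To write to a real file, pass something like open("results.csv", "w", newline="") as out. Rows come out in the order objects reach three matches, not in id order; the id in the first column lets you sort the finished file afterwards if needed.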

Answer 1

Assuming itemmatch() is a fairly simple function, this should do what I think you want, and more efficiently than your pseudocode:

for match_obj in list_of_objects_to_match:
    # fetch at most three matching documents for this object
    db_objects = list(query_db_for_matches(match_obj).limit(3))
    if len(db_objects) >= 3:
        for col, item in enumerate(db_objects, start=1):
            result = performoperationonitem(item)
            write_in_file(result, row=match_obj.id, col=col)
    else:
        write_blank_line(row=match_obj.id)  # if you want
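The answer leaves write_in_file() and write_blank_line() undefined. Here is a minimal sketch of one option, assuming CSV output and objects that are dicts with "id" and "type" keys; process, write_row, fake_db and fake_query are all hypothetical names. Because each match_obj is handled completely before the next one, its whole row can be written in one go, so there is no need to seek to individual cells, and flushing after each row lets you watch the file fill up in real time.

```python
import csv
import io


def write_row(writer, out, row_id, results):
    """Hypothetical stand-in for write_in_file()/write_blank_line(): writes a
    whole row at once (id first, then the results) and flushes it."""
    writer.writerow([row_id, *results])
    out.flush()


def process(objects_to_match, query_db_for_matches, operation, out, needed=3):
    writer = csv.writer(out)
    for match_obj in objects_to_match:
        # With pymongo you would call .limit(needed) on the cursor instead of
        # slicing; here the query is assumed to return a plain list.
        found = list(query_db_for_matches(match_obj))[:needed]
        if len(found) >= needed:
            write_row(writer, out, match_obj["id"], [operation(doc) for doc in found])
        else:
            write_row(writer, out, match_obj["id"], [])  # blank row, if you want


# Usage with an in-memory list standing in for the database.
fake_db = [{"type": "x", "v": 1}, {"type": "y", "v": 2}, {"type": "x", "v": 3},
           {"type": "x", "v": 4}, {"type": "x", "v": 5}]


def fake_query(match_obj):
    return [doc for doc in fake_db if doc["type"] == match_obj["type"]]


table = io.StringIO()
process([{"id": 1, "type": "x"}, {"id": 2, "type": "y"}], fake_query,
        lambda doc: doc["v"] * 2, table)
```

For a real run, replace table with an open file and fake_query with the pymongo query.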

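The trick, then, is writing the query_db_for_matches() function. Without more details, I'll assume you are looking for objects that match on one particular field; call it type. In pymongo such a query would look like: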

def query_db_for_matches(match_obj):
    return pymongo_collection.find({"type":match_obj.type})

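To make this run efficiently, first make sure your database has an index on the field you are querying:

pymongo_collection.create_index([("type", 1)])

The first create_index call may take a long time on a large collection, because it has to build the index. After that, calling it again is cheap (MongoDB does nothing if an identical index already exists), cheap enough that you could even put it inside query_db_for_matches, right before the find call, and it would be fine. (Older code uses ensure_index for this; it is deprecated and was removed in PyMongo 4.)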
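To see why the index pays off, here is a rough in-memory analogy in plain Python. It is not how MongoDB implements indexes (MongoDB maintains a B-tree on the server), and build_index and query_with_index are hypothetical helpers. Building the index costs one pass over all documents, much like the first index build, and afterwards each lookup by type avoids scanning the whole collection.

```python
from collections import defaultdict


def build_index(docs, field):
    """One full O(n) pass over the data, like the first (slow) index build."""
    index = defaultdict(list)
    for doc in docs:
        index[doc.get(field)].append(doc)
    return index


def query_with_index(index, value, limit=3):
    """Each lookup is a dict access plus a slice instead of a full scan.
    index.get() is used so that misses do not add empty entries."""
    return index.get(value, [])[:limit]


docs = [{"type": "a", "n": i} for i in range(5)] + [{"type": "b", "n": 99}]
idx = build_index(docs, "type")
print([d["n"] for d in query_with_index(idx, "a")])  # prints [0, 1, 2]
```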

Writing table cells in real time (python)
