Question

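I want to write a generator function that will run on a memory-constrained system and that uses PyMySql (or MySQLDb) to return the results of a select query one row at a time. The following works: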

# execute a select query and return results as a generator
def SQLSelectGenerator(self, stmt):
    # error handling code removed
    self.cur.execute(stmt)

    # fetch the first row up front so the trailing None is never yielded
    row = self.cur.fetchone()
    while row is not None:
        yield row
        row = self.cur.fetchone()

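However, the following also seems to work, but whether it performs a fetchall() under the hood is a mystery to me. I can't find anything in the Python DB API about what happens when you iterate over a cursor object as if it were a list: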

# execute a select query and return results as a generator
def SQLSelectGenerator(self, stmt):
    # error handling code removed
    self.cur.execute(stmt)

    for row in self.cur:
        yield row

In both cases, the following successfully prints all rows:

stmt = "select * from ..."
for l in SQLSelectGenerator(stmt):
    print(l)

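So I'd like to know whether the second implementation is better or worse, and whether it calls fetchall or does something tricky with fetchone. fetchall would blow up the system this will run on, because there are millions of rows.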

Answer 1

According to the PyMySql source, doing

for row in self.cur:
    yield row

means you are repeatedly calling fetchone() internally, just like in your first example:

class Cursor(object):
    '''
    This is the object you use to interact with the database.
    '''
    ...
    def __iter__(self):
        return iter(self.fetchone, None)
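
To make the two-argument form of iter() concrete, here is a minimal sketch. FakeCursor is a hypothetical stand-in I made up for illustration, not PyMySql code; it only copies the __iter__ line quoted above. iter(callable, sentinel) keeps calling the callable and stops as soon as it returns the sentinel, so the final None is never handed to the loop:

```python
class FakeCursor:
    """Hypothetical stand-in for a DB-API cursor (not part of PyMySql)."""

    def __init__(self, rows):
        self._rows = list(rows)
        self._pos = 0
        self.fetchone_calls = 0  # counts calls so the behaviour is visible

    def fetchone(self):
        # Return the next row, or None once the rows are exhausted.
        self.fetchone_calls += 1
        if self._pos >= len(self._rows):
            return None
        row = self._rows[self._pos]
        self._pos += 1
        return row

    def __iter__(self):
        # Same idiom as the quoted source: call fetchone() until it returns None.
        return iter(self.fetchone, None)


cur = FakeCursor([(1, "a"), (2, "b"), (3, "c")])
rows = list(cur)  # iterating the cursor drives fetchone() one row at a time
```

Three rows come out, and fetchone() is called four times: once per row, plus the last call that returns None and ends the iteration.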

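So I'd expect the two approaches to be essentially the same in terms of memory usage and performance. You may as well use the second one, since it's cleaner and simpler.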
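
One caveat for the memory-constrained case, offered as a sketch and not as a definitive recipe. As far as I know, PyMySql's default Cursor reads the whole result set into client memory during execute(), so fetchone() and iteration only walk that buffer. pymysql.cursors.SSCursor is the unbuffered variant that fetches rows as needed. The helper below, stream_rows, and its cursor_class parameter are names I made up. The demo uses the standard-library sqlite3 module purely so it runs without a MySQL server:

```python
import sqlite3


def stream_rows(connection, stmt, cursor_class=None):
    """Yield rows of a query one at a time from any DB-API connection.

    cursor_class is a hypothetical parameter of this sketch. With PyMySql
    you would pass pymysql.cursors.SSCursor here, on the assumption that
    you want unbuffered reads.
    """
    if cursor_class is None:
        cur = connection.cursor()
    else:
        cur = connection.cursor(cursor_class)
    try:
        cur.execute(stmt)
        for row in cur:  # relies on the cursor being iterable
            yield row
    finally:
        cur.close()  # runs when the generator is exhausted or closed


# Demo against an in-memory SQLite database (stand-in for MySQL).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(1, "a"), (2, "b"), (3, "c")])
rows = list(stream_rows(conn, "SELECT id, name FROM t ORDER BY id"))
```

With PyMySql the call would look something like stream_rows(conn, stmt, pymysql.cursors.SSCursor). I have not measured the memory difference myself, so check it against your own data before relying on it.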
