Question

假设我有大量数据，我正在通过块加载到数据帧中;？例如：我有一个超过40 Gb的表，选择3列可能是2 - 3 gb假设，记录是1000万（行数）

c = pd.read_sql("select a,b,c from table;", con=db, chunksize=10**2):
b = c['a']

因为它正在按块读取表块，这是否意味着它不会立即将整个3 gb加载到内存中并且一次只能运行10 ^ 2 mb然后自动转到下一个块？

如果没有，如何让它表现得像这样？

Answer 1

引用文档

chunksize : int, default None
    If specified, return an iterator where chunksize is the number of rows
    to include in each chunk.

首先，chunksize表示行数而不是mb中的大小。提供chunksize也会产生返回迭代器而不是数据帧的效果。所以你需要循环它。鉴于此，在python方面，你只需要10 ^ 2行的内存。