Question

I have written a rake task that executes a PostgreSQL query. The task returns an object of the Result class.

Here is my task:

task export_products: :environment do
  results = execute "SELECT smth IN somewhere"
  if results.present?
    results
  else
    nil
  end
end

def execute sql
  ActiveRecord::Base.connection.execute sql
end

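My further plan is to split the output into batches and save those batches one by one to .csv files. This is where I got stuck. I can't figure out how to call the find_in_batches method of the ActiveRecord::Batches module on a PG::Result.

How should I proceed?

Edit: I have a legacy SQL query against a legacy database.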

Answer 1

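If you look at how find_in_batches is implemented, you'll see that the algorithm is essentially:

1. Force the query to be ordered by the primary key.
2. Add a LIMIT clause to the query to match the batch size.
3. Execute the modified query from (2) to get a batch.
4. Do whatever needs to be done with the batch.
5. If the batch is smaller than the batch size, the unlimited query has been exhausted, so we're done.
6. Get the maximum primary key value (last_max) from the batch obtained in (3).
7. Add primary_key_column > last_max to the WHERE clause of the query from (2), run the query again, and go to step (4).

This is pretty straightforward and could be implemented with something like this:

def in_batches_of(batch_size)
  last_max = 0 # This should be safe for any normal integer primary key.
  # %{last_max} is filled in later via String#%, #{batch_size} is interpolated now.
  query = %Q{
    select whatever
    from table
    where what_you_have_now
      and primary_key_column > %{last_max}
    order by primary_key_column
    limit #{batch_size}
  }

  results = execute(query % { last_max: last_max }).to_a
  while results.any?
    yield results
    break if results.length < batch_size
    last_max = results.last['primary_key_column']
    results = execute(query % { last_max: last_max }).to_a
  end
end

in_batches_of(1000) do |batch|
  # Do whatever needs to be done with the `batch` array here
end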

Here, of course, primary_key_column and friends would be replaced with real values.
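Since the question's goal is to save each batch to its own .csv file, the block passed to in_batches_of could hand every batch to a small helper. The following is only a minimal sketch, assuming each batch is an array of hashes with string keys (which is what PG::Result#to_a yields); the helper name write_batch_to_csv and the file naming in the usage comment are hypothetical, not part of the original answer.

```ruby
require "csv"

# Hypothetical helper: writes one batch (an array of hashes) to its own CSV
# file, using the keys of the first row as the header line.
# Returns the path, or nil if the batch is empty and nothing was written.
def write_batch_to_csv(batch, path)
  return if batch.empty?

  CSV.open(path, "w") do |csv|
    csv << batch.first.keys
    batch.each { |row| csv << row.values }
  end
  path
end

# Possible usage together with in_batches_of (file names are made up):
#
#   batch_number = 0
#   in_batches_of(1000) do |batch|
#     batch_number += 1
#     write_batch_to_csv(batch, "products_#{batch_number}.csv")
#   end
```

Letting the CSV library do the writing means values containing commas or quotes are escaped for you, which hand-built strings would not guarantee.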

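If you don't have a primary key in your query, you can use some other column that sorts nicely and is unique enough for your needs. You could also use an OFFSET clause instead of the primary key, but that can get expensive with large result sets.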
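For comparison, the OFFSET approach might look roughly like the sketch below. It is an illustration under assumptions, not the answer's method: in_offset_batches_of is a hypothetical name, base_sql is assumed to already contain a stable ORDER BY, and run_query is assumed to be any callable that takes an SQL string and returns an array of rows. The cost mentioned above comes from the database having to walk past and discard all skipped rows on every query, so later batches get progressively slower.

```ruby
# Hypothetical OFFSET-based variant. `run_query` is any callable that takes
# an SQL string and returns an array of rows, for example:
#   ->(sql) { ActiveRecord::Base.connection.execute(sql).to_a }
# `base_sql` is assumed to include a stable ORDER BY clause; without one,
# rows may be skipped or repeated between batches.
def in_offset_batches_of(batch_size, base_sql, run_query)
  offset = 0
  loop do
    rows = run_query.call("#{base_sql} limit #{batch_size} offset #{offset}")
    break if rows.empty?

    yield rows
    # A short batch means the result set is exhausted.
    break if rows.length < batch_size

    offset += batch_size
  end
end
```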

Batch processing of pgSQL query results
