Question

I have a file with more than 3 million pipe-delimited lines that I want to insert into a database. It is a simple table (no normalization required).

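Setting up a route that watches for the file, reads it in streaming mode, and splits the lines is easy. Inserting the rows into the table is also a simple wiring job.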

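The question is: how do I do this with batch inserts? Let's say 1000 rows per batch is optimal. Given that the file is streamed, how does the SQL component know that the stream has finished? Say the file has 3,000,001 records. How do I set up Camel to insert the last stray record?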

Inserting one row at a time is possible, but it would be terribly slow.

Answer 1

I would recommend something like this:

from("file:....")
    .split(body().tokenize("\n")).streaming()
        .to("any work for individual level")
        .aggregate(constant(true), new MyAggregationStrategy())
            .completionSize(1000)
            .completionTimeout(50)
            .to("sql:......")
        .end()
    .end();

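I haven't verified all of the syntax, but the plan is to pick up the file, split it as a stream, then aggregate the lines into groups of 1000, with a timeout to catch the last, smaller group. The aggregation strategy can simply make the body of each group a list of strings, or whatever format your batch SQL insert requires.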
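To make the grouping idea concrete, here is a minimal plain-Java sketch of the same logic outside Camel: read lines one at a time, emit a batch every 1000 lines, and flush whatever is left when the input ends. The class name `BatchingSketch`, the method `forEachBatch`, and the sample data are hypothetical and only for illustration. In a Camel route the aggregator is not told that the stream has ended, which is why the answer relies on `completionTimeout` to play the role of the final flush.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class BatchingSketch {

    /**
     * Reads lines one at a time and hands them to the sink in groups of
     * at most batchSize. Returns the number of batches emitted.
     */
    static int forEachBatch(BufferedReader reader, int batchSize,
                            Consumer<List<String>> sink) throws IOException {
        List<String> batch = new ArrayList<>(batchSize);
        int emitted = 0;
        String line;
        while ((line = reader.readLine()) != null) {
            batch.add(line);
            if (batch.size() == batchSize) {
                // Full batch: hand it over and start a fresh list.
                sink.accept(batch);
                batch = new ArrayList<>(batchSize);
                emitted++;
            }
        }
        if (!batch.isEmpty()) {
            // End of input: flush the stray rows that did not fill a batch.
            sink.accept(batch);
            emitted++;
        }
        return emitted;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample input: 3001 pipe-delimited lines.
        StringBuilder data = new StringBuilder();
        for (int i = 1; i <= 3001; i++) {
            data.append(i).append("|name").append(i).append('\n');
        }
        List<Integer> sizes = new ArrayList<>();
        int batches = forEachBatch(
                new BufferedReader(new StringReader(data.toString())),
                1000,
                b -> sizes.add(b.size()));
        // prints "4 batches, sizes [1000, 1000, 1000, 1]"
        System.out.println(batches + " batches, sizes " + sizes);
    }
}
```

In a real job the sink would perform the batch insert instead of recording sizes.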

Answer 2

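This can be done with the Camel-Spring-batch component: http://camel.apache.org/springbatch.html. The number of items committed per step is defined by commitInterval, and the orchestration of the job is defined in the Spring configuration. It works well for use cases similar to yours. Here is a good example on GitHub: https://github.com/hekonsek/fuse-pocs/tree/master/fuse-pocs-springdm-springbatch/fuse-pocs-springdm-springbatch-bundle/src/main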

Answer 3

Here is a more accurate example:

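import java.util.LinkedList;
import java.util.List;

import lombok.extern.slf4j.Slf4j;
// Camel 3.x package; in Camel 2.x the interface is
// org.apache.camel.processor.aggregate.AggregationStrategy.
import org.apache.camel.AggregationStrategy;
import org.apache.camel.Exchange;
import org.apache.camel.builder.RouteBuilder;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Component;

@Component
@Slf4j
public class SQLRoute extends RouteBuilder {

  @Autowired
  ListAggregationStrategy aggregationStrategy;

  @Override
  public void configure() throws Exception {
    from("timer://runOnce?repeatCount=1&delay=0")
        .to("sql:classpath:sql/orders.sql?outputType=StreamList")
        .split(body()).streaming()
          // Constant correlation key: every row goes into the same group, which
          // completes after 1000 rows or after 500 ms without a new row.
          .aggregate(constant(1), aggregationStrategy).completionSize(1000).completionTimeout(500)
            .to("log:batch")
            .to("google-bigquery:google_project:import:orders")
          .end()
        .end();
  }

  // Static, so that Spring component scanning can instantiate it as a bean.
  @Component
  static class ListAggregationStrategy implements AggregationStrategy {

    @Override
    @SuppressWarnings("unchecked")
    public Exchange aggregate(Exchange oldExchange, Exchange newExchange) {
      Object newRow = newExchange.getMessage().getBody();

      if (oldExchange == null) {
        // First row of a batch: start a new list and carry it in this exchange.
        List<Object> rows = new LinkedList<>();
        rows.add(newRow);
        newExchange.getMessage().setBody(rows);
        return newExchange;
      }

      List<Object> rows = oldExchange.getMessage().getBody(List.class);

      log.debug("Current rows count: {} ", rows.size());
      log.debug("Adding new row: {}", newRow);

      // The list is already the body of oldExchange, so adding to it is enough.
      rows.add(newRow);
      return oldExchange;
    }
  }
}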

How to set up batched streaming SQL inserts in Apache Camel
