Question

I have a table with the following schema and indexes:

    create table stop_times
    (
    stop_times_id integer primary key generated by default as identity (start with 1, increment by 1),
    feed_id integer not null,
    trip_id varchar(256) not null,
    arrival_time char(8) not null,
    departure_time char(8) not null,
    stop_id varchar(256) not null,
    stop_sequence integer not null,
    stop_headsign varchar(256),
    pickup_type char(1) check (pickup_type in ('','0','1','2','3')) not null default '0',
    drop_off_type char(1) check (drop_off_type in ('','0','1','2','3')) not null default '0',
    shape_dist_traveled varchar(256),
    timepoint char(1) check (timepoint in ('','0','1'))
    );

    create index stop_times_trip_id on stop_times (trip_id,feed_id);
    create index stop_times_stop_id on stop_times (stop_id,feed_id);

Every few days a new feed needs to be loaded into the table. Once it has loaded successfully, the old feed is deleted:

    delete from stop_times where feed_id=old_feed_identifier;

and the indexes are rebuilt:

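    // Compress the table, which also rebuilds its indexes
    cs = db.prepareCall("CALL SYSCS_UTIL.SYSCS_COMPRESS_TABLE(?, ?, ?)");
    cs.setString(1, "APP");
    cs.setString(2, "stop_times");
    // Non-zero third argument: compress sequentially
    cs.setShort(3, (short) 1);
    cs.execute();
    cs.close();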

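The new feed is inserted by reading a comma-separated text file line by line. Each line is cleaned by removing unescaped quotes and unrecognized columns and by adding the feed_id manually. The cleaned row is then inserted into the stop_times table using a prepared insert statement. Auto-commit is turned off at the start of the insert, and I commit manually every 100 inserts.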
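
A minimal sketch of the load loop described above, for reference. The class name `StopTimesLoader`, the methods `commitDue` and `insertRows`, and the row layout are illustrative assumptions, not the actual code; the cleaning step is assumed to have already produced the string arrays, and only the required columns are inserted to keep it short.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Hypothetical helper; names and row layout are assumptions for illustration.
class StopTimesLoader {
    // Only the NOT NULL columns without defaults are listed to keep the sketch short.
    static final String INSERT_SQL =
        "insert into stop_times "
        + "(feed_id, trip_id, arrival_time, departure_time, stop_id, stop_sequence) "
        + "values (?, ?, ?, ?, ?, ?)";

    // True when a manual commit is due after `inserted` rows.
    static boolean commitDue(long inserted, int commitInterval) {
        return inserted > 0 && inserted % commitInterval == 0;
    }

    // Inserts already-cleaned rows, committing every `commitInterval` rows.
    // Each row is assumed to be
    // {trip_id, arrival_time, departure_time, stop_id, stop_sequence}.
    static long insertRows(Connection db, int feedId, Iterable<String[]> rows,
                           int commitInterval) throws SQLException {
        db.setAutoCommit(false); // commits are issued manually below
        long inserted = 0;
        try (PreparedStatement ps = db.prepareStatement(INSERT_SQL)) {
            for (String[] row : rows) {
                ps.setInt(1, feedId); // feed_id is added by the loader, not the file
                ps.setString(2, row[0]);
                ps.setString(3, row[1]);
                ps.setString(4, row[2]);
                ps.setString(5, row[3]);
                ps.setInt(6, Integer.parseInt(row[4]));
                ps.executeUpdate();
                inserted++;
                if (commitDue(inserted, commitInterval)) {
                    db.commit();
                }
            }
        }
        db.commit(); // commit the final partial batch
        return inserted;
    }
}
```

With the setup described in the question, this would be called as `StopTimesLoader.insertRows(db, feedId, cleanedRows, 100)`, where `cleanedRows` stands in for the output of the cleaning step.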

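This is done in the middle of the night and works for most feeds, since the whole process takes only a few minutes. However, one feed I want to consume has 20 million records. The insert speed I see on an Ubuntu VM is typically between 1,000 and 5,000 records per second. For 20,000,000 records that works out to roughly 1.1 to 5.6 hours.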

Are there any good tweaks to speed this up?

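I did try dropping the two indexes above and recreating them afterwards. That improved performance by about 50%, which helps but is not the gain I was hoping for.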
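
The drop-and-recreate approach can be sketched roughly as follows. The class `StopTimesIndexes` and its methods are hypothetical names; the index definitions are copied from the schema above, and the load itself is passed in as a `Runnable` placeholder. The sketch assumes the DDL runs with auto-commit on, or that the caller commits afterwards.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical helper; the index names and columns come from the schema above.
class StopTimesIndexes {
    static final String[] DROP = {
        "drop index stop_times_trip_id",
        "drop index stop_times_stop_id"
    };
    static final String[] CREATE = {
        "create index stop_times_trip_id on stop_times (trip_id,feed_id)",
        "create index stop_times_stop_id on stop_times (stop_id,feed_id)"
    };

    // Executes each statement in order on a single Statement object.
    static void runAll(Connection db, String[] statements) throws SQLException {
        try (Statement st = db.createStatement()) {
            for (String sql : statements) {
                st.execute(sql);
            }
        }
    }

    // Drops both indexes, runs the load, then recreates the indexes
    // even if the load throws, so the table is never left unindexed.
    static void loadWithoutIndexes(Connection db, Runnable bulkLoad) throws SQLException {
        runAll(db, DROP);
        try {
            bulkLoad.run();
        } finally {
            runAll(db, CREATE);
        }
    }
}
```

Recreating the indexes in a `finally` block is a design choice of this sketch: a failed load still leaves the queries against the existing feeds with their indexes in place.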

How to improve Derby DB insert time?

0 Answers: