Question

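I wrote some PostgreSQL database client code to update a central database with a table of IP addresses and host names collected from multiple clients. There are two tables: one holds the mappings between IP addresses and host names, and the other holds a queue of IP addresses that have not yet been resolved to host names.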

Here is the IP-address-to-host-name mapping table:

CREATE TABLE g_hostmap(
    appliance_id     INTEGER,
    ip               INET,
    fqdn             TEXT,
    resolve_time     TIMESTAMP, 
    expire_time      TIMESTAMP,
    UNIQUE(appliance_id, ip))

Here is the work queue table:

CREATE TABLE g_hostmap_work(
    ip               INET,
    input_table      TEXT)

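The database clients each pull requests from a single work queue table. Each request contains a private IPv4 address for which a host name is requested.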

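The workflow is as follows: each client periodically queries the central database work queue for a list of IP addresses that need host names, performs a reverse DNS lookup on each address, and then updates the host name table with (IP address, host name) pairs, one at a time. I want to minimize the chance that multiple clients duplicate effort by trying to resolve the same IP addresses at the same time.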

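I limit each batch of updates to the larger of 10 rows or 10% of the size of the work queue in rows. The clients' timing is somewhat independent. How can I further reduce contention for the DNS name server and the host name table during the update process? My customer is concerned that there will be a lot of duplicated work.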
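
The batch-size rule described above could be sketched as a single query. This is only an illustration: the `batch_size` alias is hypothetical, and rounding the 10% figure up with CEIL is an assumption the question does not state.

```sql
-- Batch size: the larger of 10 rows or 10% of the current queue length.
-- Rounding up (CEIL) is an assumption; the question does not say how
-- a fractional 10% should be handled.
SELECT GREATEST(10, CEIL(COUNT(*) * 0.10)::integer) AS batch_size
       FROM g_hostmap_work;
```

With 250 queued rows this gives 25. With 40 queued rows the 10-row floor applies.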

Here is the initial query for a count of the items in the work queue:

SELECT COUNT(*)
       FROM g_hostmap_work queued
       LEFT JOIN g_hostmap cached
            ON queued.ip = cached.ip
            AND now() < cached.expire_time

Here is the query that returns a subset of the items in the work queue:

SELECT queued.ip, queued.input_table, cached.expire_time
       FROM g_hostmap_work queued
       LEFT JOIN g_hostmap cached
            ON queued.ip = cached.ip
            AND now() < cached.expire_time
       LIMIT 10
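
Because the expiry test sits in the ON clause of a LEFT JOIN, the query above can return any queued row, including addresses that already have an unexpired cache entry. For rows with no live entry, `cached.expire_time` is simply NULL. If the intent is to fetch only addresses that still need resolving (an assumption about the goal, not something the question states), an anti-join variant might look like this:

```sql
-- Sketch: return only queued addresses with no unexpired cache entry.
SELECT queued.ip, queued.input_table
       FROM g_hostmap_work queued
       LEFT JOIN g_hostmap cached
            ON queued.ip = cached.ip
            AND now() < cached.expire_time
       WHERE cached.ip IS NULL   -- no matching live row in g_hostmap
       LIMIT 10;
```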

Here is an example of a single INSERT statement that updates the database with a new IP address/host name mapping:

INSERT INTO g_hostmap_20131230 VALUES
(NULL, '192.168.54.133', 'powwow.site', now(), now() + 900 * INTERVAL '1 SECOND')

Answer 1

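I'm going to make a strange-sounding suggestion. Add an auto-incrementing bigint column to the source table, and create a set of 10 partial indexes using modulo division. Here's a simple test case: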

create table queue (id bigserial, input text);
create index q0 on queue (id) where id%10=0;
create index q1 on queue (id) where id%10=1;
create index q2 on queue (id) where id%10=2;
create index q3 on queue (id) where id%10=3;
create index q4 on queue (id) where id%10=4;
create index q5 on queue (id) where id%10=5;
create index q6 on queue (id) where id%10=6;
create index q7 on queue (id) where id%10=7;
create index q8 on queue (id) where id%10=8;
create index q9 on queue (id) where id%10=9;
insert into queue select generate_series(1,50000),'this';

What we've done here is create a set of indexes that each cover one tenth of the table. Next, we select a chunk of one of those ranges to work on:

begin;
select * from queue where id%10=0 limit 100 for update;
  id  | input 
------+-------
   10 | this
   20 | this
   30 | this
-- do work here --
commit;

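Now the fun part. If you have more than 10 workers with this setup, you just cycle them through the bucket numbers. Any worker beyond the tenth has to share a bucket, so it waits while another worker's SELECT ... FOR UPDATE on that bucket is running (as with bucket 0 above). Workers using any other number (1 through 9) still proceed: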

begin;
select * from queue where id%10=1 limit 100 for update;
 id  | input 
-----+-------
   1 | this
  11 | this
  21 | this
  31 | this
-- do work here
commit;

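This way all the work is divided into 10 buckets. Want more buckets? Change the number after the % and increase the number of indexes to match.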
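
When the bucket count changes, the partial indexes can be generated in a loop instead of typed out one by one. The following is a minimal sketch against the `queue` test table above. The bucket count of 20 and the index names `queue_mod20_0` through `queue_mod20_19` are hypothetical choices for illustration only.

```sql
-- Sketch: create one partial index per bucket for a chosen bucket count.
DO $$
DECLARE
    n_buckets CONSTANT integer := 20;  -- hypothetical bucket count
BEGIN
    FOR b IN 0 .. n_buckets - 1 LOOP
        -- %I quotes the index name if needed; %% emits a literal %.
        EXECUTE format(
            'CREATE INDEX %I ON queue (id) WHERE id %% %s = %s',
            format('queue_mod%s_%s', n_buckets, b),
            n_buckets,
            b);
    END LOOP;
END
$$;
```

Workers would then need to use the same modulus in their queries, for example `where id%20=3`, so that the predicate matches one of the partial indexes.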

How to minimize the likelihood of database contention during updates
