使用索引进行MonetDB慢速查询

时间:2014-03-15 21:59:24

标签: indexing monetdb

我创建了一个10GB(60M记录)的表,在数据插入后手动添加了唯一索引(hidden_​​id)。 我有最简单的查询,但需要一分钟才能完成。

select hidden_id from netflow where hidden_id = 350000;

而且查询花了几十分钟而且#34; select * from netflow order by hidden_id limit 12500 offset 212500;"。 这真让我困惑。

我发布了下面第一个查询的分析。有什么线索为什么这么慢?

trace select hidden_id from netflow where hidden_id = 350000;

+----------+------------------------------------------------------------------+
| ticks    | stmt                                                             |
+==========+==================================================================+
|        3 | X_3 := sql.mvc();                                                |
|       15 | X_7=<tmp_2510>[69396995] := sql.bind(X_3=0,"sys","netflow","hidd |
:          : en_id",0);                                                       :
|      227 | X_4:bat[:oid,:oid] =<tmp_13332>[69396995] := sql.tid(X_3=0,"sys" |
:          : ,"netflow");                                                     :
| 72978741 | X_36=<tmp_4053>[1] := algebra.subselect(X_7=<tmp_2510>[69396995] |
:          : ,X_4=<tmp_13332>:bat[:oid,:oid][69396995],A0=350000:lng,A0=35000 :
:          : 0:lng,true,true,false);                                          :
|       17 | (X_10=<tmp_2175>[0],r1_10=<tmp_3416>[0]) := sql.bind(X_3=0,"sys" |
:          : ,"netflow","hidden_id",2);                                       :
|       14 | X_37=<tmp_13337>[0] := algebra.subselect(r1_10=<tmp_3416>[0],A0= |
:          : 350000:lng,A0=350000:lng,true,true,false);                       :
|        6 | X_13=<tmp_3416>[0] := sql.bind(X_3=0,"sys","netflow","hidden_id" |
:          : ,1);                                                             :
|       15 | X_38=<tmp_11053>[0] := algebra.subselect(X_13=<tmp_3416>[0],X_4= |
:          : <tmp_13332>:bat[:oid,:oid][69396995],A0=350000:lng,A0=350000:lng :
:          : ,true,true,false);                                               :
|        4 | X_15=<tmp_4053>[1] := sql.subdelta(X_36=<tmp_4053>[1],X_4=<tmp_1 |
:          : 3332>:bat[:oid,:oid][69396995],X_10=<tmp_2175>[0],X_37=<tmp_1333 :
:          : 7>[0],X_38=<tmp_11053>[0]);                                      :
|       20 | X_17=<tmp_11053>[1] := sql.projectdelta(X_15=<tmp_4053>[1],X_7=< |
:          : tmp_2510>[69396995],X_10=<tmp_2175>[0],r1_10=<tmp_3416>[0],X_13= :
:          : <tmp_3416>[0]);                                                  :
|        6 | X_18 := sql.resultSet(1,1,X_17=<tmp_11053>[1]);                  |
|        7 | sql.rsColumn(X_18=1,"sys.netflow","hidden_id","bigint",64,0,X_17 |
:          : =<tmp_11053>[1]);                                                :
|        2 | X_23 := io.stdout();                                             |
|       25 | sql.exportResult(X_23=="104d2":streams,X_18=1);                  |
|        1 | end s1_3;                                                        |
| 73011629 | X_5:void  := user.s1_3(350000:lng);                              |
+----------+------------------------------------------------------------------+

这是正在创建的表。

CREATE TABLE "netflow" (
    "time_seconds" double DEFAULT NULL,
    "parsed_date" timestamp DEFAULT NULL,
    "date_time_str" varchar(45) DEFAULT NULL,
    "ip_layer_protocol" bigint DEFAULT NULL,
    "ip_layer_protocol_code" varchar(45) DEFAULT NULL,
    "first_seen_src_ip" varchar(45) DEFAULT NULL,
    "first_seen_dest_ip" varchar(45) DEFAULT NULL,
    "first_seen_src_port" bigint DEFAULT NULL,
    "first_seen_dest_port" bigint DEFAULT NULL,
    "more_fragments" varchar(45) DEFAULT NULL,
    "cont_fragments" varchar(45) DEFAULT NULL,
    "duration_seconds" bigint DEFAULT NULL,
    "first_seen_src_payload_bytes" bigint DEFAULT NULL,
    "first_seen_dest_payload_bytes" bigint DEFAULT NULL,
    "first_seen_src_total_bytes" bigint DEFAULT NULL,
    "first_seen_dest_total_bytes" bigint DEFAULT NULL,
    "first_seen_src_packet_count" bigint DEFAULT NULL,
    "first_seen_dest_packet_count" bigint DEFAULT NULL,
    "record_force_out" varchar(45) DEFAULT NULL
);

更新

平台:没有并行的Windows 7

MonetDB版本:MonetDB 5服务器v11.15.19&#34; 2013年2月-SP6&#34;

存储中表格的描述:

select * from storage() where "table"='netflow';
+--------+---------+-------------------------------+-----------+----------+-----
-----+-----------+------------+------------+---------+--------+
| schema | table   | column                        | type      | location | count    | typewidth | columnsize | heapsize   | indices | sorted |
+========+=========+===============================+===========+==========+==========+===========+============+============+=========+========+
| sys    | netflow | time_seconds                  | double    | 17\1711  | 69396995 |         8 |  555175960 |          0 |       0 | false  |
| sys    | netflow | parsed_date                   | timestamp | 20\2054  | 69396995 |         8 |  555175960 |          0 |       0 | false  |
| sys    | netflow | date_time_str                 | varchar   | 07\734   | 69396995 |        21 |  277587980 | 2684354560 |       0 | false  |
| sys    | netflow | ip_layer_protocol             | bigint    | 62\6261  | 69396995 |         8 |  555175960 |          0 |       0 | false  |
| sys    | netflow | ip_layer_protocol_code        | varchar   | 62\6213  | 69396995 |         3 |   69396995 |     524288 |       0 | false  |
| sys    | netflow | first_seen_src_ip             | varchar   | 63\6342  | 69396995 |        11 |  138793990 |     524288 |       0 | false  |
| sys    | netflow | first_seen_dest_ip            | varchar   | 23\2324  | 69396995 |         8 |  138793990 |     524288 |       0 | false  |
| sys    | netflow | first_seen_src_port           | bigint    | 15\1574  | 69396995 |         8 |  555175960 |          0 |       0 | false  |
| sys    | netflow | first_seen_dest_port          | bigint    | 23\2370  | 69396995 |         8 |  555175960 |          0 |       0 | false  |
| sys    | netflow | more_fragments                | varchar   | 65\6521  | 69396995 |         1 |   69396995 |     524288 |       0 | false  |
| sys    | netflow | cont_fragments                | varchar   | 65\6524  | 69396995 |         1 |   69396995 |     524288 |       0 | false  |
| sys    | netflow | duration_seconds              | bigint    | 65\6560  | 69396995 |         8 |  555175960 |          0 |       0 | false  |
| sys    | netflow | first_seen_src_payload_bytes  | bigint    | 65\6561  | 69396995 |         8 |  555175960 |          0 |       0 | false  |
| sys    | netflow | first_seen_dest_payload_bytes | bigint    | 65\6562  | 69396995 |         8 |  555175960 |          0 |       0 | false  |
| sys    | netflow | first_seen_src_total_bytes    | bigint    | 65\6563  | 69396995 |         8 |  555175960 |          0 |       0 | false  |
| sys    | netflow | first_seen_dest_total_bytes   | bigint    | 65\6564  | 69396995 |         8 |  555175960 |          0 |       0 | false  |
| sys    | netflow | first_seen_src_packet_count   | bigint    | 65\6565  | 69396995 |         8 |  555175960 |          0 |       0 | false  |
| sys    | netflow | first_seen_dest_packet_count  | bigint    | 65\6566  | 69396995 |         8 |  555175960 |          0 |       0 | false  |
| sys    | netflow | record_force_out              | varchar   | 65\6567  | 69396995 |         1 |   69396995 |     524288 |       0 | false  |
| sys    | netflow | hidden_id                     | bigint    | 25\2510  | 69396995 |         8 |  555175960 |          0 |       0 | false  |
| sys    | netflow | index_id                      | oid       | 73\7375  | 69396995 |         8 |  555175960 |          0 |       0 | true   |
+--------+---------+-------------------------------+-----------+----------+----------+-----------+------------+------------+---------+--------+

1 个答案:

答案 0 :(得分:2)

请注意,在MonetDB中默认忽略SQL语句CREATE INDEX。 它将根据需要创建必要的哈希索引。 您的第一个查询会触发此操作。

我还注意到你似乎没有并行运行。 请始终指出所使用的平台和MonetDB版本。