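When I execute a SQL query in Impala, I get this message:
Cannot process row that is bigger than the IO size (row_size=13.42 MB, null_indicators_size=0). To run this query, increase the IO size (--read_size option).
The explain output is as follows: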
06:SORT
| order by: count(*) DESC
| hosts=1 per-host-mem=unavailable
| tuple-ids=7 row-size=24B cardinality=30000000
|
05:AGGREGATE [FINALIZE]
| output: count(*)
| group by: group_concat(host)
| having: count(*) > 10
| hosts=1 per-host-mem=unavailable
| tuple-ids=6 row-size=24B cardinality=30000000
|
04:AGGREGATE [FINALIZE]
| output: group_concat(host)
| group by: gridsum_id
| hosts=1 per-host-mem=unavailable
| tuple-ids=4 row-size=31B cardinality=30000000
|
08:MERGING-EXCHANGE [UNPARTITIONED]
| order by: g_id ASC, server_time ASC, session_order ASC
| limit: 30000000
| hosts=1 per-host-mem=unavailable
| tuple-ids=2 row-size=46B cardinality=30000000
|
03:TOP-N [LIMIT=30000000]
| order by: g_id ASC, server_time ASC, session_order ASC
| hosts=1 per-host-mem=1.29GB
| tuple-ids=2 row-size=46B cardinality=30000000
|
02:HASH JOIN [INNER JOIN, BROADCAST]
| hash predicates: b.g_id = r.g_id
| runtime filters: RF000 <- r.g_id
| hosts=1 per-host-mem=2.00GB
| tuple-ids=1,0 row-size=65B cardinality=unavailable
|
|--07:EXCHANGE [BROADCAST]
| | hosts=18 per-host-mem=0B
| | tuple-ids=0 row-size=46B cardinality=unavailable
| |
| 00:SCAN HDFS [u_g.botao_route_all r, RANDOM]
|   partitions=1/1 files=18 size=213.24MB
|   predicates: r.host NOT IN ('-', '(lost)'), r.session_order > 0
|   table stats: unavailable
|   column stats: unavailable
|   hosts=18 per-host-mem=96.00MB
|   tuple-ids=0 row-size=46B cardinality=unavailable
|
01:SCAN HDFS [u_g.botao_id b, RANDOM]
  partitions=1/1 files=1 size=5.53MB
  predicates: b.profile_id = 2473
  runtime filters: RF000 -> b.g_id
  table stats: 160891 rows total
  column stats: unavailable
  hosts=1 per-host-mem=32.00MB
  tuple-ids=1 row-size=19B cardinality=16089
----------------
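Can anyone help me? Thanks a lot.
Answer 0 (score: 0)
You are hitting this error because of insufficient memory combined with a limited IO buffer size for spilling.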
Status BufferedTupleStream::NewBlockForWrite(int min_size, bool* got_block) {
  DCHECK(!closed_);
  if (min_size > block_mgr_->max_block_size()) {
    return Status(Substitute("Cannot process row that is bigger than the IO size "
        "(row_size=$0). To run this query, increase the io size (--read_size option).",
        PrettyPrinter::Print(min_size, TCounterType::BYTES)));
  }
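When spilling occurs, Impala has to write the intermediate tuples out one row at a time, which requires the IO buffer to be large enough to hold at least one row. In your case that condition is not met (the row is 13.42 MB), which leads to the error above.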
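As a rough illustration of the check in the snippet above, here is a minimal Python sketch. It is not Impala code: the function name row_fits_in_block is made up for this example, the 8 MB block size is an assumption about the configured --read_size on your cluster, and "MB" is treated as 1024 * 1024 bytes.

```python
MIB = 1024 * 1024

def row_fits_in_block(row_size_bytes, max_block_size_bytes):
    """Model of the quoted check: the write fails only when the row
    is strictly larger than one IO block (min_size > max_block_size)."""
    return row_size_bytes <= max_block_size_bytes

# Row size taken from the error message (13.42 MB), converted to bytes.
row_size = int(13.42 * MIB)

# Assumption: the block size in effect is 8 MB. Check your own
# --read_size setting; this value is only for illustration.
assumed_block = 8 * MIB

print(row_fits_in_block(row_size, assumed_block))  # False
print(row_fits_in_block(row_size, 16 * MIB))       # True
```

Under these assumptions, a 13.42 MB row cannot fit in a single 8 MB block, so the stream cannot spill it, while a block of 16 MB would be large enough for this particular row.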
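You can either run the query with more memory or adjust the block size through the --read_size option, although the latter is counterintuitive in this case.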
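One thing that may explain the mismatch between the plan's row-size=31B estimate and the 13.42 MB row in the error: group_concat(host) builds one string per gridsum_id group, so a single very large group produces a single very large row. This is my reading of the plan, not something the error message states. The sketch below estimates the concatenated size for a hypothetical group; the group size, the value length, and the 2-byte separator are all assumptions for illustration.

```python
def group_concat_size(value_lengths, separator_len=2):
    """Estimated byte length of the string built for one group:
    all values plus one separator between each adjacent pair."""
    if not value_lengths:
        return 0
    return sum(value_lengths) + separator_len * (len(value_lengths) - 1)

# Hypothetical group: 500,000 host values of 25 bytes each.
hypothetical_group = [25] * 500_000
size_bytes = group_concat_size(hypothetical_group)

print(size_bytes)                  # total bytes for this one row
print(size_bytes / (1024 * 1024))  # same figure in MB
```

If this is what is happening, reducing the amount of data concatenated per group (for example by filtering or pre-aggregating before the group_concat) could be worth trying before raising --read_size.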