Question

I have been working with the business.json dataset. I am extracting the tables I need into .parquet files:

0: jdbc:drill:zk=local> use dfs.tmp;
0: jdbc:drill:zk=local> ALTER SESSION SET `store.format` = 'parquet';

After running my command:

+-----------+----------------------------+
| Fragment  | Number of records written  |
+-----------+----------------------------+
| 0_0       | 3221419                    |
+-----------+----------------------------+
1 row selected (276.773 seconds)

I end up with the output split across several .parquet files: 0_0_0.parquet, 0_0_1.parquet, 0_0_2.parquet

How can I get a single .parquet file, 0_0_0.parquet, without any splitting?

Answer 1

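Because you have a large number of rows, Drill executes the query in parallel. Consider tuning the following configuration options [1]: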

planner.slice_target
planner.width.max_per_node
planner.width.max_per_query

[1] https://drill.apache.org/docs/configuration-options-introduction/
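
As a rough sketch, these options can be set in the same sqlline session before running the conversion. The values below are illustrative assumptions, not recommendations; suitable settings depend on your data and cluster. Also note that the output in the question reports a single fragment (0_0), so the extra files may come from the Parquet writer starting a new file once its block size is reached. In that case `store.parquet.block-size` would be the relevant option; this is an assumption that goes beyond the options listed above.

```sql
-- Illustrative values only (assumptions); adjust for your environment.

-- Limit each node, and the query as a whole, to one minor fragment.
ALTER SESSION SET `planner.width.max_per_node` = 1;
ALTER SESSION SET `planner.width.max_per_query` = 1;

-- Raise the row threshold at which Drill starts to parallelize;
-- the value here is simply larger than the 3221419 rows written above.
ALTER SESSION SET `planner.slice_target` = 5000000;

-- Assumption: if the split is caused by the Parquet block size rather than
-- by parallelism, a larger block size (in bytes) may keep the output in one file.
ALTER SESSION SET `store.parquet.block-size` = 1073741824;
```

After changing the settings, rerun the conversion and check the output directory under dfs.tmp to see how many files were written.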
