I have been working with the business.json dataset. I'm extracting the tables I need into .parquet files:
0: jdbc:drill:zk=local> use dfs.tmp;
0: jdbc:drill:zk=local> ALTER SESSION SET `store.format` = 'parquet';
After running my command, I get:
+-----------+----------------------------+
| Fragment | Number of records written |
+-----------+----------------------------+
| 0_0 | 3221419 |
+-----------+----------------------------+
1 row selected (276.773 seconds)
The output is split across several .parquet files: 0_0_0.parquet, 0_0_1.parquet, 0_0_2.parquet.
How can I get a single .parquet file (0_0_0.parquet) without any partitioning?
Answer (score: 3)
Because you have many rows, Drill executes the query in parallel. Consider tuning the following configuration options [1]:
planner.slice_target
planner.width.max_per_node
planner.width.max_per_query
[1] https://drill.apache.org/docs/configuration-options-introduction/
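A minimal sketch of setting those options for the current session before re-running the CREATE TABLE AS, assuming the goal is a single writer fragment. The values are illustrative assumptions, not tuned recommendations, and limiting the width to 1 also makes the query slower. Note that the output above reports only one fragment (0_0), so the three files may instead come from the Parquet writer starting a new file each time it fills `store.parquet.block-size` (512 MB by default). Raising that option is a second knob worth trying, at the cost of more writer memory.

```sql
-- Illustrative values (assumptions): run with a single writer fragment.
ALTER SESSION SET `planner.width.max_per_node` = 1;
ALTER SESSION SET `planner.width.max_per_query` = 1;
-- Only parallelize above ~10M estimated rows (the default is 100000).
ALTER SESSION SET `planner.slice_target` = 10000000;
-- Optional: a 1 GiB Parquet block size, so one writer rolls over to a new file less often.
ALTER SESSION SET `store.parquet.block-size` = 1073741824;
```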