Number of partitions scanned (=32767) exceeds limit

Date: 2017-11-03 17:10:34

Tags: hadoop hive partition

I am trying to stream data into Hive using the eel-sdk.

val sink = HiveSink(testDBName, testTableName)
.withPartitionStrategy(new DynamicPartitionStrategy)

val hiveOps:HiveOps = ...
val schema = new StructType(Vector(Field("name", StringType), Field("pk", StringType), Field("pk1", StringType)))

hiveOps.createTable( 
  testDBName,
  testTableName,
  schema,
  partitionKeys = Seq("pk", "pk1"),
  dialect = ParquetHiveDialect(),
  tableType = TableType.EXTERNAL_TABLE,
  overwrite = true
)
val items = Seq.tabulate(100)(i => TestData(i.toString, "42", "apple"))
val ds = DataStream(items)
ds.to(sink)
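For reference, `TestData` is not defined in the snippet above; a minimal sketch of the case class it implies (hypothetical, with field names matching the schema declared earlier) would be:

```scala
// Hypothetical definition: fields mirror the StructType above
// ("name", "pk", "pk1"), in the order used by Seq.tabulate.
case class TestData(name: String, pk: String, pk1: String)
```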

I get the error: Number of partitions scanned (=32767) exceeds limit (=10000). The number 32767 is 2^15 − 1 (Short.MaxValue), which looks suspicious, but I still can't figure out what is going wrong. Any ideas?

1 Answer:

Answer 0 (score: 1)

This looks like the same issue as "Spark + Hive : Number of partitions scanned exceeds limit (=4000)". Try launching the job with metastore ORC conversion and metastore partition pruning disabled:

--conf "spark.sql.hive.convertMetastoreOrc=false"
--conf "spark.sql.hive.metastorePartitionPruning=false"
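Assuming the job is submitted with spark-submit, those flags would be passed like this (the jar name and master are placeholders, not from the original post):

```shell
# Placeholder invocation: substitute your actual master and application jar.
spark-submit \
  --master yarn \
  --conf "spark.sql.hive.convertMetastoreOrc=false" \
  --conf "spark.sql.hive.metastorePartitionPruning=false" \
  my-eel-job.jar
```

Separately, the 10000 cap in the error message typically comes from the Hive metastore setting `hive.metastore.limit.partition.request`, which can be raised in hive-site.xml if scanning that many partitions is genuinely intended.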