Calling INSERT OVERWRITE produces the following warning:
2018-08-29 13:52:00 WARN TrashPolicyDefault:141 - Can't create trash directory: hdfs://nameservice1/user/XXXXX/.Trash/Current/data/table_1/key1=2
org.apache.hadoop.security.AccessControlException: Permission denied: user=XXXXX, access=EXECUTE, inode="/user/XXXXX/.Trash/Current/data/table_1/key1=2":hdfs:hdfs:drwx
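Problem: the partition is not overwritten; the new row is appended to it, as the result further down shows.
Code example
Create the table: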
spark.sql("CREATE EXTERNAL TABLE table_1 (id string, name string) PARTITIONED BY (key1 int) stored as parquet location 'hdfs://nameservice1/data/table_1'")
spark.sql("insert into table_1 values('a','a1', 1)").collect()
spark.sql("insert into table_1 values ('b','b2', 2)").collect()
spark.sql("select * from table_1").collect()
Overwrite the partition:
spark.sql("insert OVERWRITE table table_1 values ('b','b3', 2)").collect()
Result:
[Row(id=u'a', name=u'a1', key1=1),
Row(id=u'b', name=u'b2', key1=2),
Row(id=u'b', name=u'b3', key1=2)]
Answer 0 (score: 1)
Add PARTITION(column) to the INSERT OVERWRITE statement, naming the partition column.
val spark = SparkSession.builder.appName("test").config("hive.exec.dynamic.partition", "true").config("hive.exec.dynamic.partition.mode", "nonstrict").enableHiveSupport().getOrCreate
spark.sql("drop table table_1")
spark.sql("CREATE EXTERNAL TABLE table_1 (id string, name string) PARTITIONED BY (key1 int) stored as parquet location '/directory/your location/'")
spark.sql("insert into table_1 values('a','a1', 1)")
spark.sql("insert into table_1 values ('b','b2', 2)")
spark.sql("select * from table_1").show()
spark.sql("insert OVERWRITE table table_1 PARTITION(key1) values ('b','b3', 2)")
spark.sql("select * from table_1").show()
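As a possible variant of the PARTITION(key1) form above, the partition can also be named with a fixed value (a static partition spec), in which case the partition column is left out of the VALUES list. The sketch below only builds the statement text in Python, matching the PySpark style of the question. The helper name overwrite_partition_sql is hypothetical and not part of Spark, and it assumes string values contain no quote characters.

```python
def overwrite_partition_sql(table, partition_col, partition_value, row):
    """Build an INSERT OVERWRITE statement for one static partition.

    Hypothetical helper for illustration only; not a Spark API.
    Assumes string values contain no quote characters.
    """
    def lit(v):
        # Quote strings, leave numbers bare.
        return "'" + v + "'" if isinstance(v, str) else str(v)

    values = ", ".join(lit(v) for v in row)
    return (
        f"INSERT OVERWRITE TABLE {table} "
        f"PARTITION ({partition_col}={lit(partition_value)}) "
        f"VALUES ({values})"
    )


# The partition column key1 is given in the PARTITION clause,
# so the row carries only the non-partition columns (id, name).
stmt = overwrite_partition_sql("table_1", "key1", 2, ("b", "b3"))
print(stmt)
```

With an active session the statement would be passed to spark.sql(stmt). Whether this replaces the existing rows in key1=2 on a given cluster still depends on the same permissions and settings discussed in the question.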