Question

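I am facing this issue on an HDP setup: a transactional table needs at least one compaction before Spark SQL can fetch its records. On a plain Apache setup, by contrast, no compaction is needed at all.

Maybe something gets triggered on the metastore after compaction, and only then does Spark SQL start recognizing the delta files.

Let me know if any other details are needed to find the root cause.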

Try this out.

Here is the complete scenario:

hive> create table default.foo(id int) clustered by (id) into 2 buckets STORED AS ORC TBLPROPERTIES ('transactional'='true');
hive> insert into default.foo values(10);

scala> sqlContext.table("default.foo").count // Gives 0, which is wrong because data is still in delta files

#Now run major compaction:

hive> ALTER TABLE default.foo COMPACT 'MAJOR';

scala> sqlContext.table("default.foo").count // Gives 1

hive> insert into foo values(20);

scala> sqlContext.table("default.foo").count // Gives 2, no further compaction required

Answer 1

Spark does not support any of the features of Hive transactional tables.

Please check: https://issues.apache.org/jira/browse/SPARK-15348
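
To see what this means for the scenario in the question, it can help to look at what is actually on disk. Hive writes each insert into a transactional table to a `delta_...` directory, and a major compaction rewrites the data into a `base_...` directory. The following is only a minimal sketch for inspecting that layout, not a fix: `AcidLayout` and `classify` are hypothetical helper names, and the warehouse path in the usage comment is an assumption (the real location is shown by `DESCRIBE FORMATTED default.foo`).

```scala
// Hypothetical helper: groups the sub-directory names of a Hive ACID table
// location into base, delta and other entries.
object AcidLayout {
  final case class Layout(base: Seq[String], deltas: Seq[String], other: Seq[String]) {
    // True once at least one major compaction has produced a base_ directory.
    def compacted: Boolean = base.nonEmpty
  }

  def classify(dirNames: Seq[String]): Layout = {
    val (base, rest) = dirNames.partition(_.startsWith("base_"))
    val (deltas, other) =
      rest.partition(n => n.startsWith("delta_") || n.startsWith("delete_delta_"))
    Layout(base, deltas, other)
  }
}

// Possible use from spark-shell (the path below is an assumption, adjust it):
// import org.apache.hadoop.fs.{FileSystem, Path}
// val fs = FileSystem.get(sc.hadoopConfiguration)
// val names = fs.listStatus(new Path("/apps/hive/warehouse/foo")).map(_.getPath.getName).toSeq
// println(AcidLayout.classify(names))
```

In the scenario above one would expect only a `delta_...` directory right after the first insert, and a `base_...` directory to show up once the major compaction has finished.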

Spark SQL does not return records for Hive transactional tables on HDP
