Question

Current setup:
1. HDInsight for Spark
2. Azure Data Lake Store for HDFS storage

I am trying to load data into a Hive table from partitions located on Azure DLS. To write to the partition location I am using this code:

from pyspark import SparkContext
from pyspark.sql.context import SQLContext
from pyspark.sql.types import *
from pyspark.sql.functions import *
from pyspark.sql import HiveContext
# Create the Spark context; the constructor already returns the context, so no getOrCreate() call is needed
sc = SparkContext(appName="Transformation")
sql_context = SQLContext(sc)
sql_context.setConf("hive.exec.dynamic.partition", "true")
sql_context.setConf("hive.exec.dynamic.partition.mode", "nonstrict")
sql_context.setConf("hive.exec.max.dynamic.partitions.pernode", "400")
sql_context.setConf("spark.sql.hive.convertMetastoreParquet", "false")

df = sql_context.read.orc("adl://LOCATION_NAME") # current data in ORC format stored on ADLS
df.write.format("parquet").partitionBy('ReportId', 'ReportDeliveryDate').save('adl://LOCATION_NAME') # writing the data

I can see the partitioned data at that location:

Now I create the table in Hive with the following command:

DROP TABLE IF EXISTS default.test_tb;
CREATE EXTERNAL TABLE IF NOT EXISTS default.test_tb
(
A string,
B string
)
PARTITIONED BY (ReportId STRING, ReportDeliveryDate STRING)
STORED AS PARQUET
LOCATION 'adl://LOCATION_NAME';

When I run the query select * from test_tb, I get no output. I have tried many approaches, such as the one suggested here. I am not sure what I am doing wrong.

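Update: the problem turned out to be something else. After a long discussion with the Azure team, the issue is that Hive does not support uppercase partition names. If you create the partitions in lowercase from Spark and then run the repair table command, Hive picks them up.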
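The workaround described in the update can be sketched roughly as follows. This is only an illustration under stated assumptions: the helper names (lowercase_partition_columns, repair_table_sql, write_and_register) are hypothetical, spark is assumed to be a Hive-enabled session or HiveContext, and the external table is assumed to exist already with lowercase partition columns.

```python
def lowercase_partition_columns(df, partition_cols):
    """Rename each partition column to lowercase.

    df is expected to be a pyspark.sql.DataFrame; only withColumnRenamed
    is used. Returns the renamed DataFrame and the lowercase column names.
    """
    lowered = []
    for name in partition_cols:
        df = df.withColumnRenamed(name, name.lower())
        lowered.append(name.lower())
    return df, lowered


def repair_table_sql(table_name):
    """Build the Hive statement that registers partition directories in the metastore."""
    return "MSCK REPAIR TABLE {}".format(table_name)


def write_and_register(spark, df, path, table_name, partition_cols):
    """Hypothetical end-to-end helper (assumption, not from the original post).

    Writes Parquet files under lowercase partition directories such as
    reportid=.../reportdeliverydate=..., then asks Hive to discover them.
    """
    df, cols = lowercase_partition_columns(df, partition_cols)
    df.write.format("parquet").partitionBy(*cols).save(path)
    spark.sql(repair_table_sql(table_name))
```

With the question's names, a call might look like write_and_register(spark, df, 'adl://LOCATION_NAME', 'default.test_tb', ['ReportId', 'ReportDeliveryDate']), where the location stays the placeholder used in the post.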

Hive cannot read data from partitions created by Spark

0 answers: