Question

Spark version: 1.6.1
HDP: 2.4.2

I have a Hive table stored in ORC format.

CREATE TABLE `test01_orc`(
  `c1` int, 
  `c2` string)
ROW FORMAT SERDE 
  'org.apache.hadoop.hive.ql.io.orc.OrcSerde' 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
 .......

Code used:

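import org.apache.spark.SparkConf;
import org.apache.spark.SparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.hive.HiveContext;

SparkConf conf = new SparkConf(true).setMaster("yarn-cluster").setAppName("DCA_HIVE_HDFS");
SparkContext sc = new SparkContext(conf);
HiveContext hc = new HiveContext(sc);
DataFrame df = hc.table("testdb.test01_orc");
df.printSchema();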

Output:

 root
  |-- _col0: integer (nullable = true)
  |-- _col1: string (nullable = true)

The columns come back as _col0 and _col1, even though I declared them as c1 and c2.

If the Hive table uses any other storage format, such as TEXTFILE, SEQUENCEFILE, RCFILE, PARQUET, or AVRO, the schema is correct:

root
  |-- c1: integer (nullable = true)
  |-- c2: string (nullable = true)

Is this a bug, or is some configuration change needed?
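Until the cause is known, one possible workaround is to rename the positional _colN names back to the declared names. This is a sketch under stated assumptions: the DataFrame columns are assumed to come back in the same order as in the CREATE TABLE statement, and the table is assumed to have no partition columns. The helper restoreNames is hypothetical, not part of Spark; it is written in plain Java so the renaming logic can be checked without a cluster:

```java
import java.util.Arrays;

public class OrcColumnRename {
    // Hypothetical helper: maps positional ORC names (_col0, _col1, ...) back to the
    // names declared in the Hive DDL. Assumes both arrays list columns in the same order.
    static String[] restoreNames(String[] actual, String[] declared) {
        if (actual.length != declared.length) {
            throw new IllegalArgumentException(
                "Column count mismatch: " + actual.length + " vs " + declared.length);
        }
        String[] result = new String[actual.length];
        for (int i = 0; i < actual.length; i++) {
            // Replace only names that match Hive's positional placeholder for this index;
            // any column that already has a real name is kept unchanged.
            result[i] = actual[i].equals("_col" + i) ? declared[i] : actual[i];
        }
        return result;
    }

    public static void main(String[] args) {
        String[] actual = {"_col0", "_col1"};   // what df.columns() reports for the ORC table
        String[] declared = {"c1", "c2"};       // names from the CREATE TABLE statement
        System.out.println(Arrays.toString(restoreNames(actual, declared)));
    }
}
```

In the Spark 1.6 Java API, DataFrame.columns() returns the current names as a String[] and DataFrame.toDF(String...) returns a copy with new column names, so the result could be applied as df.toDF(restoreNames(df.columns(), new String[]{"c1", "c2"})). This only relabels the columns; it does not explain why the ORC path reports _col0 and _col1 in the first place.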

Wrong schema for a DataFrame obtained from an ORC Hive table
