虽然在csv格式com.databricks.spark.csv
中使用spark在蜂巢中创建带有分区的外部表时效果很好,但是我无法从蜂巢壳中打开.csv
格式的蜂巢中创建的表< / p>
错误
hive> select * from output.candidatelist;
Failed with exception java.io.IOException:java.io.IOException: hdfs://10.19.2.190:8020/biometric/event=ABCD/LabName=500098A/part-00000-de39bb3d-0548-4db6-b8b7-bb57739327b4.c000.csv not a SequenceFile
代码:
val sparkDf = spark.read.format("com.databricks.spark.csv").option("header", "true").option("nullValue", "null").schema(StructType(Array(StructField("RollNo/SeatNo", StringType, true), StructField("LabName", StringType, true)))).option("multiLine", "true").option("mode", "DROPMALFORMED").load("hdfs://10.19.2.190:8020/biometric/SheduleData_3007_2018.csv")
sparkDf.write.mode(SaveMode.Overwrite).option("path", "hdfs://10.19.2.190:8020/biometric/event=ABCD/").partitionBy("LabName").format("com.databricks.spark.csv").saveAsTable("output.candidateList")
如何在csv中使用表格格式访问Hive shell中的表格
SHOW CREATE TABLE候选列表;
CREATE EXTERNAL TABLE `candidatelist`(
`col` array<string> COMMENT 'from deserializer')
PARTITIONED BY (
`centercode` string,
`examdate` date)
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES ('path'='hdfs://10.19.2.190:8020/biometric/output')
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.SequenceFileInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat'
LOCATION
'hdfs://nnuat.iot.com:8020/apps/hive/warehouse/sify_cvs_output.db/candidatelist-__PLACEHOLDER__'TBLPROPERTIES (
'spark.sql.create.version'='2.3.0.2.6.5.0-292',
'spark.sql.partitionProvider'='catalog',
'spark.sql.sources.provider'='com.databricks.spark.csv',
'spark.sql.sources.schema.numPartCols'='2',
'spark.sql.sources.schema.numParts'='1',
'spark.sql.sources.schema.part.0'='{\"type\":\"struct\",\"fields\":[{\"name\":\"RollNo/SeatNo\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"LabName\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"Student_Name\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"ExamName\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"ExamTime\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"Center\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"CenterCode\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"ExamDate\",\"type\":\"date\",\"nullable\":true,\"metadata\":{}}]}',
'spark.sql.sources.schema.partCol.0'='CenterCode',
'spark.sql.sources.schema.partCol.1'='ExamDate',
'transient_lastDdlTime'='1535692379')