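If it is an Avro, ORC, or Parquet table, I can use the corresponding library to get the schema. But if the input/output format is text and the data is stored in CSV files, how can I get the schema programmatically?
Thanks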
Answer 0 (score: 0)
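You can use the DESCRIBE statement to display metadata about a table, such as the column names and their data types.
DESCRIBE FORMATTED displays additional information, in a format familiar to Apache Hive users.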
Example:
I created a table as follows.
CREATE TABLE IF NOT EXISTS Employee_Local (
  EmployeeId INT, Name STRING, Designation STRING,
  State STRING, Number STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;
DESCRIBE statement
You can abbreviate DESCRIBE as DESC.
hive> DESCRIBE Employee_Local;
OK
employeeid int
name string
designation string
state string
number string
DESCRIBE FORMATTED statement
hive> describe formatted Employee_Local;
OK
# col_name data_type comment
employeeid int
name string
designation string
state string
number string
# Detailed Table Information
Database: default
Owner: cloudera
CreateTime: Fri Mar 15 10:53:35 PDT 2019
LastAccessTime: UNKNOWN
Protect Mode: None
Retention: 0
Location: hdfs://quickstart.cloudera:8020/user/hive/warehouse/employee_test
Table Type: MANAGED_TABLE
Table Parameters:
transient_lastDdlTime 1552672415
# Storage Information
SerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat: org.apache.hadoop.mapred.TextInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Compressed: No
Num Buckets: -1
Bucket Columns: []
Sort Columns: []
Storage Desc Params:
field.delim ,
serialization.format ,
Time taken: 0.544 seconds, Fetched: 31 row(s)
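Since the question asks for a programmatic route, here is a minimal sketch of turning the CLI text above into a list of (column, type) pairs. It assumes you have already captured the output as a string (for example from `hive -e "DESCRIBE FORMATTED Employee_Local"`), that the name and type are separated by whitespace, and that types contain no spaces. The function name `parse_describe_formatted` is made up for illustration and is not part of Hive.

```python
def parse_describe_formatted(text):
    """Extract (column, type) pairs from Hive CLI DESCRIBE [FORMATTED] output.

    Assumption: one column per line, with name and type separated by whitespace.
    """
    columns = []
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, the CLI status line, and the trailing timing line.
        if not line or line == "OK" or line.startswith("Time taken"):
            continue
        # Skip the "# col_name data_type comment" header row.
        if line.startswith("# col_name"):
            continue
        # Any other "#" section header (e.g. "# Detailed Table Information")
        # marks the end of the column list.
        if line.startswith("#"):
            break
        parts = line.split()
        if len(parts) >= 2:
            columns.append((parts[0], parts[1]))
    return columns


# Sample text copied from the output shown in this answer.
sample = """OK
# col_name data_type comment
employeeid int
name string
designation string
state string
number string
# Detailed Table Information
Database: default
"""

print(parse_describe_formatted(sample))
```

The same function also handles the plain DESCRIBE output, because that output has no "#" section headers and the loop simply runs to the end.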
You can also get the schema of a Hive table from the Spark shell, as shown below:
scala> spark.sql("desc formatted test_loop").collect().foreach(println)
[policyid,bigint,null]
[statecode,string,null]
[county,string,null]
[eq_site_limit,bigint,null]
[hu_site_limit,bigint,null]
[fl_site_limit,bigint,null]
[fr_site_limit,bigint,null]
[tiv_2011,bigint,null]
[tiv_2012,double,null]
[eq_site_deductible,double,null]
[hu_site_deductible,double,null]
[fl_site_deductible,double,null]
[fr_site_deductible,double,null]
[point_latitude,double,null]
[point_longitude,double,null]
[line,string,null]
[construction,string,null]
[point_granularity,bigint,null]
[,,]
[# Detailed Table Information,,]
[Database:,default,]
[Owner:,mapr,]
[Create Time:,Fri May 26 17:56:04 EDT 2017,]
[Last Access Time:,Wed Dec 31 19:00:00 EST 1969,]
[Location:,maprfs:/user/hv2/warehouse/test_loop,]
[Table Type:,MANAGED,]
[Table Parameters:,,]
[ rawDataSize,254192494,]
[ numFiles,1,]
[ transient_lastDdlTime,1495845784,]
[ totalSize,251167564,]
[ numRows,3024360,]
[,,]
[# Storage Information,,]
[SerDe Library:,org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe,]
[InputFormat:,org.apache.hadoop.mapred.TextInputFormat,]
[OutputFormat:,org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat,]
[Compressed:,No,]
[Storage Desc Parameters:,,]
[ serialization.format,1,]
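To keep only the column part of a result like the one above, you can stop at the first empty or "#" row. Below is a rough Python sketch of that idea. The rows are hard-coded tuples copied from the output so it runs without Spark; in PySpark they would presumably come from `spark.sql("desc formatted test_loop").collect()`, whose Row objects can be indexed like tuples. The helper name `columns_from_rows` is made up for illustration.

```python
from itertools import takewhile


def columns_from_rows(rows):
    """Return (name, type) pairs for the leading column rows.

    Assumption: the column rows come first and are followed by an empty
    separator row or a "#" section header, as in the output shown above.
    """
    def is_column(row):
        name = row[0]
        return bool(name) and not name.startswith("#")

    return [(row[0], row[1]) for row in takewhile(is_column, rows)]


# A few rows copied from the Spark output above, as (col_name, data_type, comment).
rows = [
    ("policyid", "bigint", None),
    ("statecode", "string", None),
    ("", "", ""),
    ("# Detailed Table Information", "", ""),
    ("Database:", "default", ""),
]

print(columns_from_rows(rows))
```

If you are in Spark anyway, `spark.table("test_loop").schema` gives you the schema as a StructType directly, so you only need this kind of parsing when all you have is the DESCRIBE result.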