Spark: unable to get CREATE TABLE working with partitions, but it works without partitions

Time: 2018-07-12 19:18:28

Tags: sql apache-spark hive apache-spark-sql databricks

My data is laid out like this:

/mnt/path/db/table/keya=01/keyb=123
/mnt/path/db/table/keya=01/keyb=124
/mnt/path/db/table/keya=02/keyb=123

This table is created successfully:

CREATE EXTERNAL TABLE `test_table_a`(
..irrelevant schema..
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\u0001'
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  '/mnt/path/db/table/keya=0101/keyb=123'

Then:

select count(*) from test_table_a;
-- returns 1876 (correct)

I can query the data, but I want a partitioned table.

I have tried this:

CREATE EXTERNAL TABLE `test_table_a`(
..irrelevant schema..
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\u0001'
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
PARTITIONED BY (
  `keya` string,
  `keyb` string)
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  '/mnt/path/db/table'

And this:

CREATE EXTERNAL TABLE `test_table_a`(
..irrelevant schema..
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\u0001'
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
PARTITIONED BY (
  `keya` string,
  `keyb` string)
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  '/mnt/path/db/table/keya=*/keyb=*'

However, with both of them I get this result:

select count(*) from test_table_a;
-- returns 0
show partitions test_table_a;
-- returns nothing

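1 answer:

Answer 0 (score: 0)

When you create an external partitioned table on top of a location that already contains data, the existing partition directories are not registered in the metastore automatically. Run the following command in the Hive shell to register them: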

hive> msck repair table <db.name>.<table_name>;

Then check whether you can see the partition information and the data in the test_table_a table.
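
The question is tagged databricks, where a Hive shell may not be at hand. As far as I know, the same repair can be issued as a Spark SQL statement instead. A minimal sketch, assuming the partitioned table has already been created in the current database with LOCATION '/mnt/path/db/table':

```sql
-- Scan the table's LOCATION for keya=.../keyb=... directories and
-- register any that are missing from the metastore.
MSCK REPAIR TABLE test_table_a;

-- Spark SQL also accepts this alternative form of the same operation:
-- ALTER TABLE test_table_a RECOVER PARTITIONS;

-- Verify that the partitions and their rows are now visible.
SHOW PARTITIONS test_table_a;
SELECT count(*) FROM test_table_a;
```

The repair only discovers directories that follow the `column=value` naming convention under the table root, which is why the table has to point at '/mnt/path/db/table' rather than at a single partition directory or a wildcard path.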

Create table statement:

CREATE EXTERNAL TABLE `test_table_a`(
..irrelevant schema..
)
PARTITIONED BY (
  `keya` string,
  `keyb` string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\u0001'
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  '/mnt/path/db/table';
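
If scanning the whole location is undesirable, partitions can also be registered one at a time. The following is a sketch only, using the three directories listed in the question; the partition values are read off those directory names and are an assumption about the real data:

```sql
-- Register each existing directory as a partition explicitly.
-- IF NOT EXISTS makes the statement safe to re-run.
ALTER TABLE test_table_a ADD IF NOT EXISTS
  PARTITION (keya='01', keyb='123') LOCATION '/mnt/path/db/table/keya=01/keyb=123'
  PARTITION (keya='01', keyb='124') LOCATION '/mnt/path/db/table/keya=01/keyb=124'
  PARTITION (keya='02', keyb='123') LOCATION '/mnt/path/db/table/keya=02/keyb=123';
```

This form is more verbose than MSCK REPAIR TABLE, but it touches only the partitions named and also works when a partition's directory does not follow the `keya=.../keyb=...` layout.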