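I am trying to INSERT OVERWRITE data from an external Hive table into a partitioned internal (managed) table using the code below. The code runs successfully, but when I run 'select * from videotracking_playevent limit 10', it never returns any results.
The external table is generated from a recursive folder directory containing Parquet files, and it can be queried. I have tested the regular expressions on a sample and they work as well. The Hive logs do not show any errors. I have a feeling that the partitioning is somehow messing things up, but I don't understand why. Any ideas?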
set hive.mapred.supports.subdirectories=true;
set hive.input.dir.recursive=true;
set hive.supports.subdirectories=true;
set mapred.input.dir.recursive=true;
set hive.execution.engine=spark;
set hive.exec.dynamic.partition.mode=nonstrict;
INSERT OVERWRITE TABLE videotracking_playevent PARTITION (source, createyear, createmonth, createday)
SELECT
id_gigya,
created,
uid,
category,
action,
video_id,
program,
device,
url,
video_cms,
duration,
position,
version,
slot_type,
slot_position,
ad_position,
ad_duration,
player_type,
is_embed,
ad_max_ads,
ad_max_duration,
brand,
casting,
ip,
platform,
subprofile_id,
channel,
episode_id,
regexp_replace(regexp_extract(INPUT__FILE__NAME, 'source=[a-z]*', 0),'source=','') AS source,
regexp_replace(regexp_extract(INPUT__FILE__NAME, 'createyear=[0-9]*', 0),'createyear=','') AS createyear,
regexp_replace(regexp_extract(INPUT__FILE__NAME, 'createmonth=[0-9]*', 0),'createmonth=','') AS createmonth,
regexp_replace(regexp_extract(INPUT__FILE__NAME, 'createday=[0-9]*', 0),'createday=','') AS createday
FROM
videotracking_playevent_ext;
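
For anyone trying to narrow this down, here is a minimal sketch of checks that might help, assuming the same session settings as above are in effect. It uses only the two tables from the question; the capture-group form of regexp_extract is an alternative to the regexp_replace wrapper, not what the original query uses. If any of the extracted values comes back empty, Hive would typically route those rows to its default partition (named __HIVE_DEFAULT_PARTITION__ unless configured otherwise) rather than to the expected one.

```sql
-- 1. Inspect the partition values that the regular expressions actually
--    produce from the file paths of the external table.
--    (Capture group 1 returns only the part after the '='.)
SELECT DISTINCT
  regexp_extract(INPUT__FILE__NAME, 'source=([a-z]*)', 1)      AS source,
  regexp_extract(INPUT__FILE__NAME, 'createyear=([0-9]*)', 1)  AS createyear,
  regexp_extract(INPUT__FILE__NAME, 'createmonth=([0-9]*)', 1) AS createmonth,
  regexp_extract(INPUT__FILE__NAME, 'createday=([0-9]*)', 1)   AS createday
FROM videotracking_playevent_ext
LIMIT 20;

-- 2. List the partitions that the INSERT OVERWRITE created in the target table.
SHOW PARTITIONS videotracking_playevent;

-- 3. Count the rows that landed in the target table.
SELECT COUNT(*) FROM videotracking_playevent;
```

If step 2 lists partitions but step 3 returns zero, or if step 2 lists nothing at all, that would at least show whether the problem is in the write or in reading the partitioned table back.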