My external table auto1_tracking_events_ext is partitioned on column dt.
First I execute:
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.exec.dynamic.partition=true;
When I run this query:
select count(*)
from auto1_tracking_events_ext
where dt = '2016-12-05';
it picks up the partition, creates maybe 3 mappers, and finishes within a few seconds.
However, if I run this:
select count(*)
from auto1_tracking_events_ext
where dt = from_unixtime(unix_timestamp()-1*60*60*24, 'yyyy-MM-dd');
it does not pick up the partition, launches 413 mappers, and takes quite a long time to compute.
At the time of posting this question:
hive> select from_unixtime(unix_timestamp()-1*60*60*24, 'yyyy-MM-dd');
OK
2016-12-05
Why isn't Hive picking up the partition?
Update:
Passing the date string as a hiveconf parameter (as shown below) doesn't help either.
hive -hiveconf date_yesterday=$(date --date yesterday "+%Y-%m-%d")
hive> select count(*) from auto1_tracking_events_ext where dt = ${hiveconf:date_yesterday};
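Answer 0 (score: 0)
If the first query works, your last query that passes the hiveconf variable should work as well, because the variable is substituted first and only after that is the query executed. The likely bug is that you did not quote the variable. Try this: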
hive -hiveconf date_yesterday=$(date --date yesterday "+%Y-%m-%d")
hive> select count(*) from auto1_tracking_events_ext where dt = '${hiveconf:date_yesterday}'; --single quotes here
Without quotes, the substituted query reads where dt=2020-12-12, which is wrong; the value should be in single quotes.
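To make the "substitute first, execute later" step concrete, here is a minimal bash sketch that imitates it with a plain text replacement. The render function and the hard-coded value 2016-12-05 (the date from the question) are illustrative assumptions; the real hive CLI performs its own substitution internally. Left unquoted, the value is no longer a string literal: Hive reads 2016-12-05 as the arithmetic expression 2016 - 12 - 05.

```shell
#!/usr/bin/env bash
# Hypothetical helper: imitates the hive CLI's plain-text ${hiveconf:date_yesterday} substitution.
# Assumption: the value is normally produced by $(date --date yesterday "+%Y-%m-%d").
date_yesterday="2016-12-05"

render() {
  local query="$1"
  local placeholder='${hiveconf:date_yesterday}'
  # Replace every literal occurrence of the placeholder with the value; nothing else changes.
  printf '%s\n' "${query//"$placeholder"/$date_yesterday}"
}

# Unquoted placeholder: the result is an arithmetic expression, not a string.
render 'select count(*) from auto1_tracking_events_ext where dt = ${hiveconf:date_yesterday};'
# Quoted placeholder: the result is the intended string literal.
render "select count(*) from auto1_tracking_events_ext where dt = '\${hiveconf:date_yesterday}';"
```

The first printed line is what Hive parses when the variable is left unquoted; the second is the string comparison against dt that the question actually wants.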
As for using unix_timestamp(): the function is not deterministic and prevents proper query optimization (in this case, partition pruning). Use current_date or current_timestamp instead.
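A sketch of the current_date variant, assuming Hive 1.2 or later (where current_date is available) and that dt holds yyyy-MM-dd strings. The yesterday_query helper is hypothetical. The cast to string is there because date_sub returns a DATE rather than a string starting with Hive 2.1. Since current_date is fixed for the whole query, Hive should be able to fold the predicate to a constant and prune partitions, as it does with the literal '2016-12-05'.

```shell
#!/usr/bin/env bash
# Hypothetical helper: builds the "yesterday's partition" query with current_date instead of unix_timestamp().
# Assumes Hive 1.2+ and a string partition column dt formatted as yyyy-MM-dd.
yesterday_query() {
  # cast(... as string) keeps the comparison string-to-string on Hive versions where date_sub returns DATE.
  printf '%s\n' "select count(*) from auto1_tracking_events_ext where dt = cast(date_sub(current_date, 1) as string);"
}

# Run the query only when the hive CLI is on PATH; otherwise just print the statement.
if command -v hive >/dev/null 2>&1; then
  hive -e "$(yesterday_query)"
else
  yesterday_query
fi
```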