Hive does not use a computed partition key to pick up the partition

Time: 2016-12-06 12:12:04

Tags: hive hiveql

My external table auto1_tracking_events_ext is partitioned on the column dt.

First I execute:

SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.exec.dynamic.partition=true;

When I run this query:

select count(*)
from auto1_tracking_events_ext
where dt = '2016-12-05';

it picks up the partition, creates about 3 mappers, and finishes in a few seconds.

However, if I run this:

select count(*)
from auto1_tracking_events_ext
where dt = from_unixtime(unix_timestamp()-1*60*60*24, 'yyyy-MM-dd');

it does not pick up the partition, launches 413 mappers, and takes quite a long time to compute.

At the time of posting this question:

hive> select from_unixtime(unix_timestamp()-1*60*60*24, 'yyyy-MM-dd');
OK
2016-12-05

Why doesn't Hive pick up the partition?

Update:

Passing the date string as a hiveconf parameter (as shown below) does not help either.

hive -hiveconf date_yesterday=$(date --date yesterday "+%Y-%m-%d")
hive> select count(*) from auto1_tracking_events_ext where dt = ${hiveconf:date_yesterday};

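1 answer:

Answer 0: (score: 0)

If the first query works, your last query, the one passing the hiveconf variable, should work too, because the variable is substituted first and only then is the query executed. The likely bug is that you did not quote the variable. Try this: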

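hive -hiveconf date_yesterday=$(date --date yesterday "+%Y-%m-%d")
hive> select count(*) from auto1_tracking_events_ext where dt = '${hiveconf:date_yesterday}'; --single quotes here

Without the quotes it is resolved like this: where dt=2020-12-12. That is wrong: Hive reads it as the arithmetic expression 2020 - 12 - 12 rather than as a date string, so the value must be in single quotes.

As for using unix_timestamp(): the function is not deterministic and prevents proper query optimization.

Use current_date or current_timestamp instead.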
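
The two suggestions in this answer can be sketched as a small shell script. This is only an illustration: it builds the query text and prints it rather than calling Hive. The variable names quoted_query and current_date_query are made up for the example, and the date_sub(current_date, 1) form assumes a Hive version where current_date is available. Whether partition pruning actually happens is best confirmed with EXPLAIN on your own cluster.

```shell
#!/usr/bin/env bash
# Sketch only: builds the query text without calling Hive.

# Yesterday's date via GNU date, as in the question.
date_yesterday=$(date --date yesterday "+%Y-%m-%d")

# Single quotes around the value make it a string literal in HiveQL.
# Unquoted, a value such as 2016-12-05 would be read as arithmetic.
quoted_query="select count(*) from auto1_tracking_events_ext where dt = '${date_yesterday}';"

# Alternative that stays inside Hive (assumes current_date and date_sub
# are available in your Hive version).
current_date_query="select count(*) from auto1_tracking_events_ext where dt = date_sub(current_date, 1);"

echo "$quoted_query"
echo "$current_date_query"

# To actually run one of them (requires a Hive installation):
# hive -e "$quoted_query"
```

Computing the date outside Hive means the query only ever contains a constant, which is the same situation as the first query in the question that finished in seconds.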