Hive - to_date vs substr when retrieving the date from a timestamp

Time: 2018-02-11 17:39:49

Tags: hive

I have a timestamp column whose data type is string. The data is in the format "yyyy-mm-dd hh:mm:ss". I have two solutions for retrieving only the date part:

  1. TO_DATE(COL)
  2. SUBSTR(COL,0,10)

In terms of performance, which one is the better solution for a huge volume of data?

1 answer:

Answer 0: (score: 0)

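I think the answer to your question depends on a lot of things, but in general, looking at the explain plan is a good place to start. In my testing, there appeared to be no difference between the two plans.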

Note: this was tested in a Cloudera environment on Hive version 1.1.0-cdh5.12.2.

Using TO_DATE():

+----------------------------------------------------+--+
|                      Explain                       |
+----------------------------------------------------+--+
| STAGE DEPENDENCIES:                                |
|   Stage-1 is a root stage                          |
|   Stage-0 depends on stages: Stage-1               |
|                                                    |
| STAGE PLANS:                                       |
|   Stage: Stage-1                                   |
|     Map Reduce                                     |
|       Map Operator Tree:                           |
|           TableScan                                |
|             alias: a                               |
|             Statistics: Num rows: 163043612 Data size: 178714012511 Basic stats: COMPLETE Column stats: NONE |
|             Select Operator                        |
|               expressions: to_date(some_date) (type: string) |
|               outputColumnNames: _col0             |
|               Statistics: Num rows: 163043612 Data size: 178714012511 Basic stats: COMPLETE Column stats: NONE |
|               File Output Operator                 |
|                 compressed: false                  |
|                 Statistics: Num rows: 163043612 Data size: 178714012511 Basic stats: COMPLETE Column stats: NONE |
|                 table:                             |
|                     input format: org.apache.hadoop.mapred.TextInputFormat |
|                     output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat |
|                     serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe |
|                                                    |
|   Stage: Stage-0                                   |
|     Fetch Operator                                 |
|       limit: -1                                    |
|       Processor Tree:                              |
|         ListSink                                   |
|                                                    |
+----------------------------------------------------+--+

Using SUBSTR():

+----------------------------------------------------+--+
|                      Explain                       |
+----------------------------------------------------+--+
| STAGE DEPENDENCIES:                                |
|   Stage-1 is a root stage                          |
|   Stage-0 depends on stages: Stage-1               |
|                                                    |
| STAGE PLANS:                                       |
|   Stage: Stage-1                                   |
|     Map Reduce                                     |
|       Map Operator Tree:                           |
|           TableScan                                |
|             alias: b                               |
|             Statistics: Num rows: 163043612 Data size: 178714012511 Basic stats: COMPLETE Column stats: NONE |
|             Select Operator                        |
|               expressions: substr(some_date, 1, 10) (type: string) |
|               outputColumnNames: _col0             |
|               Statistics: Num rows: 163043612 Data size: 178714012511 Basic stats: COMPLETE Column stats: NONE |
|               File Output Operator                 |
|                 compressed: false                  |
|                 Statistics: Num rows: 163043612 Data size: 178714012511 Basic stats: COMPLETE Column stats: NONE |
|                 table:                             |
|                     input format: org.apache.hadoop.mapred.TextInputFormat |
|                     output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat |
|                     serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe |
|                                                    |
|   Stage: Stage-0                                   |
|     Fetch Operator                                 |
|       limit: -1                                    |
|       Processor Tree:                              |
|         ListSink                                   |
|                                                    |
+----------------------------------------------------+--+
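
For reference, plans like the two above could be produced with queries along the following lines. This is only a sketch: the table name some_table is hypothetical (the original queries were not posted), while the column some_date and the aliases a and b are taken from the plan output.

```sql
-- some_table is a hypothetical table name; some_date, a and b come from the plans above.

-- Plan for the TO_DATE() variant
EXPLAIN
SELECT TO_DATE(a.some_date)
FROM some_table a;

-- Plan for the SUBSTR() variant
EXPLAIN
SELECT SUBSTR(b.some_date, 1, 10)
FROM some_table b;
```

Identical plans only show that both queries are scheduled the same way (a single map-only stage with the same row estimates). They do not measure the per-row cost of each function, so timing both queries on a representative slice of the data is still the way to confirm which one is faster.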