How to find the average of hh:mm:ss values in Hive

Time: 2019-03-06 10:26:22

Tags: sql unix hadoop hive hiveql

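Consider a Hive table with the columns script_name, start_time, end_time, and duration. The start time, end time, and duration are stored in hh:mm:ss format. I need to find the average of each of these columns over the last 7 days and write the result to a file.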

1 answer:

Answer 0 (score: 0)

Convert each value with unix_timestamp, add the three together, divide by 3, cast to bigint, and convert back to HH:mm:ss.


Result:


See the test here: http://demo.gethue.com/hue/editor?editor=285484&type=hive

For a single column:

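Convert to a Unix timestamp, compute the average in seconds, cast it to bigint (the average is a double, so a fraction of a second of precision is lost), and finally convert it back to a time string.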
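The single-column steps above could be sketched as follows. This is only an illustration under stated assumptions: the CTE supplies two made-up sample rows, and the column name time_str and alias avg_time are placeholders, not names from the original table.

```sql
-- Sample rows in a CTE; replace with your own table.
-- time_str is a hypothetical column name used only for this sketch.
WITH data AS (
  SELECT '12:10:30' AS time_str
  UNION ALL
  SELECT '01:10:00' AS time_str
)
SELECT from_unixtime(
         -- avg() returns a double; the cast drops the fractional second
         CAST(avg(unix_timestamp(time_str, 'HH:mm:ss')) AS bigint),
         'HH:mm:ss'
       ) AS avg_time
FROM data;
```

Because unix_timestamp and from_unixtime both work in the session time zone, the offset applied when parsing is undone when formatting, so the average comes back as a time of day.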


Query for the three-column average (sum of the three values divided by 3):

with data as ( -- sample data; use your own table instead
  select '12:10:30' start_time, '01:10:00' end_time, '02:10:00' duration
)
select from_unixtime(cast((unix_timestamp(start_time, 'HH:mm:ss')
                         + unix_timestamp(end_time, 'HH:mm:ss')
                         + unix_timestamp(duration, 'HH:mm:ss')) / 3 as bigint),
                     'HH:mm:ss')
from data;

See the test here: http://demo.gethue.com/hue/editor?editor=285464&type=hive
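The question also asks for the last 7 days and for output to a file, which the queries above do not cover. A rough sketch of one way to combine them is shown here. Everything specific in it is an assumption: the table name script_runs, a run_date column holding the run date as yyyy-MM-dd, the output directory /tmp/avg_times, and grouping by script_name are all hypothetical and need to be adapted to the real schema.

```sql
-- Hypothetical names: script_runs (table), run_date (date column),
-- /tmp/avg_times (output directory). Adjust to your environment.
INSERT OVERWRITE LOCAL DIRECTORY '/tmp/avg_times'
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
SELECT script_name,
       from_unixtime(CAST(avg(unix_timestamp(start_time, 'HH:mm:ss')) AS bigint), 'HH:mm:ss') AS avg_start_time,
       from_unixtime(CAST(avg(unix_timestamp(end_time,   'HH:mm:ss')) AS bigint), 'HH:mm:ss') AS avg_end_time,
       from_unixtime(CAST(avg(unix_timestamp(duration,   'HH:mm:ss')) AS bigint), 'HH:mm:ss') AS avg_duration
FROM script_runs
-- Keep only rows from the last 7 days
WHERE run_date >= date_sub(current_date, 7)
GROUP BY script_name;
```

INSERT OVERWRITE LOCAL DIRECTORY replaces the contents of the target directory on the machine running the query, so it is safer to point it at a dedicated directory.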