I have a Hive table 'driver_time_stats' with the columns slot_id, number_of_drivers, slot_start_time and slot_end_time.
-----------------------------------------------------------------------
slot_id | number_of_drivers | slot_start_time | slot_end_time
-----------------------------------------------------------------------
1 | 5 | 2018-01-01 09:30:00 | 2018-01-01 10:00:00
2 | 8 | 2018-01-01 10:30:00 | 2018-01-01 11:00:00
-----------------------------------------------------------------------
Desired output: each row should be split into multiple rows at 1-minute intervals between slot_start_time and slot_end_time.
-----------------------------------------------------------------------
slot_id | number_of_drivers | slot_start_time | slot_end_time
-----------------------------------------------------------------------
1 | 5 | 2018-01-01 09:30:00 | 2018-01-01 09:31:00
1 | 5 | 2018-01-01 09:31:00 | 2018-01-01 09:32:00
.
.
.
1 | 5 | 2018-01-01 09:59:00 | 2018-01-01 10:00:00
2 | 8 | 2018-01-01 10:30:00 | 2018-01-01 10:31:00
2 | 8 | 2018-01-01 10:31:00 | 2018-01-01 10:32:00
.
.
.
2 | 8 | 2018-01-01 10:59:00 | 2018-01-01 11:00:00
-----------------------------------------------------------------------
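I tried lateral view, posexplode and similar functions, but could not get it to work. Can someone help me? In other words, I am trying to slice each record into multiple records at one-minute intervals in Hive. I was able to do this in Presto using UNNEST, but I need a Hive-only solution because the ETL is built on Hive.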
-Nash
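To pin down the expected result independently of Hive, here is a minimal Python sketch of the slicing logic described above. It is only a reference for what the output rows should look like, not a Hive solution. The names split_slot and FMT are hypothetical, and it assumes timestamps in the 'YYYY-MM-DD HH:MM:SS' format shown in the tables and that any trailing partial minute is dropped.

```python
from datetime import datetime, timedelta

# Assumed timestamp format, matching the sample rows in the question.
FMT = "%Y-%m-%d %H:%M:%S"


def split_slot(slot_id, number_of_drivers, slot_start_time, slot_end_time,
               step_minutes=1):
    """Yield one row per full step_minutes-long interval inside the slot."""
    start = datetime.strptime(slot_start_time, FMT)
    end = datetime.strptime(slot_end_time, FMT)
    step = timedelta(minutes=step_minutes)
    # Emit only whole intervals; a trailing partial interval is skipped.
    while start + step <= end:
        yield (slot_id, number_of_drivers,
               start.strftime(FMT), (start + step).strftime(FMT))
        start += step


# The two sample rows from the question.
rows = [
    (1, 5, "2018-01-01 09:30:00", "2018-01-01 10:00:00"),
    (2, 8, "2018-01-01 10:30:00", "2018-01-01 11:00:00"),
]

minute_rows = [out for row in rows for out in split_slot(*row)]

for out in minute_rows[:2]:
    print(" | ".join(str(v) for v in out))
```

Each 30-minute slot yields 30 one-minute rows, so a Hive query that solves the problem should return the same 60 rows for the sample data.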
Answer 0 (score: 0)
class FooType(type):
    def __new__(meta, name, bases, attrs):
        # Give each class its own cache so subclasses do not share instances.
        if "_instances" not in attrs:
            attrs["_instances"] = dict()
        return type.__new__(meta, name, bases, attrs)

    def __call__(cls, param):
        # Create one instance per distinct param and reuse it afterwards.
        if param not in cls._instances:
            cls._instances[param] = super(FooType, cls).__call__(param)
        return cls._instances[param]


class Foo(metaclass=FooType):
    def __init__(self, param):
        self._param = param
        print("%s init(%s)" % (self, param))

    def __repr__(self):
        return "{}<{},{}>".format(self.__class__.__name__, self._param, id(self))


class Bar(Foo):
    pass


f1, f2, f3 = [Foo(i) for i in (0, 0, 1)]
print([f1, f2, f3])
b1, b2, b3 = [Bar(i) for i in (0, 0, 1)]
print([b1, b2, b3])