I want to save a Java RDD as sequence files partitioned by hour. Is there a way to do this?
For example, I have records of the form:
time,a1,a2,a3,a4,a5,a6,a7,a8
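I want the key to be a2,a3,a4 and the value to be all the records that share that key, with the output partitioned by the time column.
In HDFS it would be stored as: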
output/time=12345/sequence_file_of_key_and_values
Sample input:
1486203462,1,45,66,77,ansh,72,976,58
1486203461,1,452,66,77,ansh5,456,8754,09865
1486203462,1,45,66,77,ansh9,772,976,5890
1486203461,1,452,66,77,ansh156,742,96,5951
The output would look like:
output/time=1486203462/a sequence file with key as (1,45,66,77) and corresponding values as ((1486203462,1,45,66,77,ansh,72,976,58),(1486203462,1,45,66,77,ansh9,772,976,5890))
output/time=1486203461/a sequence file with key as (1,452,66,77) and corresponding values as ((1486203461,1,452,66,77,ansh5,456,8754,09865),(1486203461,1,452,66,77,ansh156,742,96,5951))
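
To make the intended grouping concrete, here is a minimal plain-Java sketch of the mapping from records to partition directory, key, and values. It uses no Spark or Hadoop and writes no sequence files; it only shows the layout I am after. The names PartitionSketch, partitionDir, key, and group are hypothetical. The key is assumed to be the four columns after time, as in the sample output above, and the directory is named after the raw time value as in the example paths.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java sketch (no Spark or Hadoop) of the desired output layout.
class PartitionSketch {

    // Hypothetical helper: directory for a record, named after its time column.
    static String partitionDir(String[] fields) {
        return "output/time=" + fields[0];
    }

    // Hypothetical helper: key built from the four columns after time,
    // an assumption taken from the (1,45,66,77) key in the sample output.
    static String key(String[] fields) {
        return String.join(",", Arrays.copyOfRange(fields, 1, 5));
    }

    // Groups raw CSV lines as: partition directory -> key -> full records.
    static Map<String, Map<String, List<String>>> group(List<String> lines) {
        Map<String, Map<String, List<String>>> out = new TreeMap<>();
        for (String line : lines) {
            String[] fields = line.split(",");
            out.computeIfAbsent(partitionDir(fields), d -> new TreeMap<>())
               .computeIfAbsent(key(fields), k -> new ArrayList<>())
               .add(line);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList(
            "1486203462,1,45,66,77,ansh,72,976,58",
            "1486203461,1,452,66,77,ansh5,456,8754,09865",
            "1486203462,1,45,66,77,ansh9,772,976,5890",
            "1486203461,1,452,66,77,ansh156,742,96,5951");
        // Print one line per (directory, key) pair with its grouped records.
        group(input).forEach((dir, byKey) ->
            byKey.forEach((k, records) ->
                System.out.println(dir + " key=(" + k + ") values=" + records)));
    }
}
```

In Spark, the same two helpers would presumably decide the output path and the pair key for each record; how to get that into per-partition sequence files is the part I am asking about.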