Question

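I need some help, please.

I used the OIV tool to convert a downloaded fsimage into a delimited CSV file. I also created a Hive table and loaded the CSV file into it.

I am not very familiar with SQL, so I am having trouble querying the data.

For example, each record in the file looks like this: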

/tmp/hive/ltonakanyan/9c01cc22-55ef-4410-9f55-614726869f6d/hive_2017-05-08_08-44-39_680_3710282255695385702-113/-mr-10000/.hive-staging_hive_2017-05-08_08-44-39_680_3710282255695385702-113/-ext-10001/000044_0.deflate|3|2017-05-08 08:45|2017-05-08 08:45|134217728|1|176|0|0|-rw-r-----|ltonakanyan|HDFS

/data/lz/cpi/ofz/zd/cbt_ca_verint/new_data/2017-09-27/253018001769667.xml|3|2017-09-27 23:41|2017-09-28 17:09|134217728|1|14549|0|0|-rw-r-----|bc55_ah_appid|HDFS

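The table description is:

| hdfspath | string |
| replication | int |
| modificationtime | string |
| accesstime | string |
| preferredblocksize | int |
| blockscount | int |
| filesize | bigint |
| nsquota | bigint |
| dsquota | bigint |
| permissionx | string |
| userx | string |
| groupx | string |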

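I need to know how to query the total filesize for just the top-level directories (/tmp, /data), then move down to the second level (/tmp/hive, /data/lz), and so on for deeper levels, each with its filesize.

I came up with something like this:

select substr(hdfspath, 2, instr(substr(hdfspath, 2), '/') - 1) zone, sum(filesize) from example group by substr(hdfspath, 2, instr(substr(hdfspath, 2), '/') - 1);

But it does not return any data. The file sizes are all in bytes.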

Answer 1

-- Total size in GB per top-level directory (/tmp, /data, ...)
select joinedpath, round(sum(filesize)/1024/1024/1024, 2) as sumsize
from
(
  -- element [0] is the empty string before the leading '/',
  -- so element [1] is the first path component
  select concat('/', split(hdfspath, '/')[1]) as joinedpath, filesize
  from default.hdfs_meta_d
) t
where joinedpath is not null
group by joinedpath;

Please try the query above; it should help.
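The question also asks about the second level (/tmp/hive, /data/lz). A minimal sketch of how the same approach might be extended, assuming the same table name default.hdfs_meta_d and the column names from the question; it has not been run against the asker's data:

```sql
-- Sketch: total size in GB per second-level path (e.g. /tmp/hive, /data/lz).
-- Assumes the table default.hdfs_meta_d used in the query above.
select joinedpath,
       round(sum(filesize)/1024/1024/1024, 2) as sumsize
from
(
  -- split(hdfspath, '/') yields ['', 'tmp', 'hive', ...];
  -- elements [1] and [2] are the first two path components
  select concat('/', split(hdfspath, '/')[1],
                '/', split(hdfspath, '/')[2]) as joinedpath,
         filesize
  from default.hdfs_meta_d
) t
-- concat returns NULL when a path has fewer than two components,
-- so entries such as /tmp itself are dropped here
where joinedpath is not null
group by joinedpath
order by sumsize desc;
```

Note that a file sitting directly under a top-level directory (for example a hypothetical /tmp/foo.txt) shows up as its own second-level row. For deeper levels, the same pattern should work by appending split(hdfspath, '/')[3] and so on to the concat call.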

Answer 2

This job failed because of a heap memory error. Try increasing the heap size before running the hdfs oiv command.

export HADOOP_OPTS="-Xmx4096m"

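If the command still fails, you may need to move the fsimage to another machine/server with more memory and increase the heap size there using the environment variable above.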

How to analyze fsimage content with Hive queries
