I am writing data into a Hive table from another Hive table that is 6.29 GB in size (block size: 128 MB), using YARN as the resource manager and Tez as the execution engine. The YARN settings are:
yarn.nodemanager.resource.memory-mb=8192
yarn.scheduler.minimum-allocation-mb=1024
yarn.scheduler.maximum-allocation-mb=8192
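As a quick sanity check (not from the original post), these three settings bound how many containers a single NodeManager can host: the node offers 8192 MB, and each container is rounded up to at least the minimum allocation.

```python
# Container limits implied by the YARN settings above.
node_mem = 8192   # yarn.nodemanager.resource.memory-mb
min_alloc = 1024  # yarn.scheduler.minimum-allocation-mb
max_alloc = 8192  # yarn.scheduler.maximum-allocation-mb

# At the minimum allocation, at most this many containers fit on one node.
max_containers = node_mem // min_alloc
print(max_containers)  # → 8

# A single container can be as large as the whole node.
print(max_alloc == node_mem)  # → True
```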
After the insert query completes, the log shows the following counters for a single YARN container:
{
counterGroupName:org.apache.tez.common.counters.FileSystemCounter,
counterGroupDisplayName:File System Counters,
counters:[{ counterName:HDFS_BYTES_READ, counterValue:536887296},
{ counterName:HDFS_BYTES_WRITTEN, counterValue:107265498 },
{ counterName:HDFS_READ_OPS, counterValue:7 },
{ counterName:HDFS_WRITE_OPS, counterValue:3 } ]
}
{
counterGroupName:org.apache.tez.common.counters.TaskCounter,
counters:[{counterName:GC_TIME_MILLIS,counterValue:5450},
{counterName:CPU_MILLISECONDS,counterValue:97670},
{counterName:PHYSICAL_MEMORY_BYTES,counterValue:166723584},
{counterName:VIRTUAL_MEMORY_BYTES,counterValue:1968496640},
{counterName:COMMITTED_HEAP_BYTES,counterValue:166723584},
{counterName:INPUT_RECORDS_PROCESSED,counterValue:1736321},
{counterName:INPUT_SPLIT_LENGTH_BYTES,counterValue:536870912}]
}
I am not sure how Hive decides how much data to read from HDFS:
1) How did it decide that a single container should read 537 MB?
2) The source table's block size is 128 MB, so I would expect 4 × 128 MB = 512 MB and therefore HDFS_READ_OPS = 4, but HDFS_READ_OPS is 7. Why?
Physical memory used = 167 MB
Virtual memory used = 1969 MB
HDFS_BYTES_READ = 537 MB
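The counter values themselves can be checked with a little arithmetic: INPUT_SPLIT_LENGTH_BYTES is exactly four 128 MiB blocks, and HDFS_BYTES_READ exceeds it by only a few kilobytes (plausibly file metadata or footer reads, which would also explain the extra HDFS_READ_OPS, though the logs alone do not confirm that). A minimal sketch of the math:

```python
# Verify the counter arithmetic from the log above.
BLOCK = 128 * 1024 * 1024       # 128 MiB block size of the source table
split_len = 536_870_912         # INPUT_SPLIT_LENGTH_BYTES
bytes_read = 536_887_296        # HDFS_BYTES_READ

# The split length is exactly four full blocks (4 * 128 MiB = 512 MiB).
blocks = split_len // BLOCK
print(blocks)                   # → 4
print(split_len % BLOCK)        # → 0

# HDFS_BYTES_READ is only slightly larger than the split length.
extra = bytes_read - split_len
print(extra)                    # → 16384 bytes (~16 KiB) beyond the split
```

So the "537 MB" in the summary is the decimal rendering of 536,887,296 bytes; the container really consumed a grouped split of exactly 4 blocks.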