I have started using Apache Kylin (version 1.5.3). When building a cube, I get an error at step 5, "Save Cuboid Statistics". The log says:
java.lang.IllegalArgumentException: KeyValue size too large
at org.apache.hadoop.hbase.client.HTable.validatePut(HTable.java:1521)
at org.apache.hadoop.hbase.client.BufferedMutatorImpl.validatePut(BufferedMutatorImpl.java:147)
at org.apache.hadoop.hbase.client.BufferedMutatorImpl.doMutate(BufferedMutatorImpl.java:134)
at org.apache.hadoop.hbase.client.BufferedMutatorImpl.mutate(BufferedMutatorImpl.java:98)
at org.apache.hadoop.hbase.client.HTable.put(HTable.java:1038)
at org.apache.kylin.storage.hbase.HBaseResourceStore.putResourceImpl(HBaseResourceStore.java:242)
at org.apache.kylin.common.persistence.ResourceStore.putResource(ResourceStore.java:208)
at org.apache.kylin.engine.mr.steps.SaveStatisticsStep.doWork(SaveStatisticsStep.java:113)
at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:112)
at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:57)
at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:112)
at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:127)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
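First, I tried building the same cube with fewer dimensions, and it worked. Building another cube with the remaining dimensions also worked. But when I try to build a cube that contains all 13 of these dimensions, it fails. I also tried setting hbase.client.keyvalue.maxsize to 0 to disable the check. Still the same error.
Does anyone know what the problem is and how to solve it?
By the way, I am running Kylin on the HDP 2.4 Sandbox.
Thanks in advance for your help
Soren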
Answer 0 (score: 0)
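Make sure that kylin.hbase.client.keyvalue.maxsize (in the Kylin configuration file, conf/kylin.properties) and hbase.client.keyvalue.maxsize (in the HBase configuration file) have the same value. Typically, we get the "KeyValue size too large" error when the value of kylin.hbase.client.keyvalue.maxsize is larger than that of hbase.client.keyvalue.maxsize.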
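For context, the exception in the stack trace is raised by a client-side size check (HTable.validatePut). The sketch below is a Python paraphrase of that kind of check, not HBase code: the function name `validate_put` and the idea of passing plain cell sizes are assumptions made for illustration. It shows that a limit of 0 or less skips the check, and that any cell above a positive limit is rejected before it is sent.

```python
def validate_put(cell_sizes, max_keyvalue_size):
    """Reject a put if any cell is larger than the client-side limit.

    cell_sizes: sizes in bytes of the cells in one put (illustrative input).
    max_keyvalue_size: the configured limit; 0 or less disables the check.
    """
    if max_keyvalue_size > 0:
        for size in cell_sizes:
            if size > max_keyvalue_size:
                raise ValueError("KeyValue size too large")


# A small put passes under a 1 MB limit.
validate_put([512, 2048], 1048576)

# With the check disabled, even a very large cell is accepted by the client.
validate_put([50 * 1024 * 1024], 0)
```

If setting the HBase limit to 0 appears to have no effect, one possibility (an assumption, not something confirmed in this thread) is that the Kylin process reads a different hbase-site.xml than the one that was edited.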
Please find a sample kylin.properties below:
# kylin server's mode
kylin.server.mode=all
# optional information for the owner of kylin platform, it can be your team's email
# currently it will be attached to each kylin's htable attribute
kylin.owner=whoami@kylin.apache.org
# List of web servers in use, this enables one web server instance to sync up with other servers.
kylin.rest.servers=localhost:7070
# The metadata store in hbase
kylin.metadata.url=kylin_metadata@hbase
# The storage for final cube file in hbase
kylin.storage.url=hbase
# Temp folder in hdfs, make sure user has the right access to the hdfs directory
kylin.hdfs.working.dir=/kylin
# HBase Cluster FileSystem, which serving hbase, format as hdfs://hbase-cluster:8020
# leave empty if hbase running on same cluster with hive and mapreduce
kylin.hbase.cluster.fs=
kylin.job.mapreduce.default.reduce.input.mb=500
# max job retry on error, default 0: no retry
kylin.job.retry=0
# If true, job engine will not assume that hadoop CLI reside on the same server as it self
# you will have to specify kylin.job.remote.cli.hostname, kylin.job.remote.cli.username and kylin.job.remote.cli.password
# It should not be set to "true" unless you're NOT running Kylin.sh on a hadoop client machine
# (Thus kylin instance has to ssh to another real hadoop client machine to execute hbase,hive,hadoop commands)
kylin.job.run.as.remote.cmd=false
# Only necessary when kylin.job.run.as.remote.cmd=true
kylin.job.remote.cli.hostname=
# Only necessary when kylin.job.run.as.remote.cmd=true
kylin.job.remote.cli.username=
# Only necessary when kylin.job.run.as.remote.cmd=true
kylin.job.remote.cli.password=
# Used by test cases to prepare synthetic data for sample cube
kylin.job.remote.cli.working.dir=/tmp/kylin
# Max count of concurrent jobs running
kylin.job.concurrent.max.limit=10
# Time interval to check hadoop job status
kylin.job.yarn.app.rest.check.interval.seconds=10
# Hive database name for putting the intermediate flat tables
kylin.job.hive.database.for.intermediatetable=default
#default compression codec for htable,snappy,lzo,gzip,lz4
kylin.hbase.default.compression.codec=snappy
#the percentage of the sampling, default 100%
kylin.job.cubing.inmem.sampling.percent=100
# The cut size for hbase region, in GB.
kylin.hbase.region.cut=5
# The hfile size of GB, smaller hfile leading to the converting hfile MR has more reducers and be faster
# set 0 to disable this optimization
kylin.hbase.hfile.size.gb=2
# Enable/disable ACL check for cube query
kylin.query.security.enabled=true
# whether get job status from resource manager with kerberos authentication
kylin.job.status.with.kerberos=false
## kylin security configurations
# spring security profile, options: testing, ldap, saml
# with "testing" profile, user can use pre-defined name/pwd like KYLIN/ADMIN to login
kylin.security.profile=testing
# default roles and admin roles in LDAP, for ldap and saml
acl.defaultRole=ROLE_ANALYST,ROLE_MODELER
acl.adminRole=ROLE_ADMIN
#LDAP authentication configuration
ldap.server=ldap://ldap_server:389
ldap.username=
ldap.password=
#LDAP user account directory;
ldap.user.searchBase=
ldap.user.searchPattern=
ldap.user.groupSearchBase=
#LDAP service account directory
ldap.service.searchBase=
ldap.service.searchPattern=
ldap.service.groupSearchBase=
#SAML configurations for SSO
# SAML IDP metadata file location
saml.metadata.file=classpath:sso_metadata.xml
saml.metadata.entityBaseURL=https://hostname/kylin
saml.context.scheme=https
saml.context.serverName=hostname
saml.context.serverPort=443
saml.context.contextPath=/kylin
ganglia.group=
ganglia.port=8664
## Config for mail service
# If true, will send email notification;
mail.enabled=false
mail.host=
mail.username=
mail.password=
mail.sender=
###########################config info for web#######################
#help info ,format{name|displayName|link} ,optional
kylin.web.help.length=4
kylin.web.help.0=start|Getting Started|
kylin.web.help.1=odbc|ODBC Driver|
kylin.web.help.2=tableau|Tableau Guide|
kylin.web.help.3=onboard|Cube Design Tutorial|
#guide user how to build streaming cube
kylin.web.streaming.guide=http://kylin.apache.org/
#hadoop url link ,optional
kylin.web.hadoop=
#job diagnostic url link ,optional
kylin.web.diagnostic=
#contact mail on web page ,optional
kylin.web.contact_mail=
###########################config info for front#######################
#env DEV|QA|PROD
deploy.env=QA
###########################deprecated configs#######################
kylin.sandbox=true
kylin.web.hive.limit=20
# The cut size for hbase region, in GB.
# E.g, for cube whose capacity be marked as "SMALL", split region per 5GB by default
kylin.hbase.region.cut.small=5
kylin.hbase.region.cut.medium=10
kylin.hbase.region.cut.large=50
kylin.hbase.client.keyvalue.maxsize=1048576
Inside it, the property is set as kylin.hbase.client.keyvalue.maxsize=1048576
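To check that the two settings agree, something like the following could be used. It is only a sketch: the helper names are made up for this example, it works on the text of the two files rather than on fixed paths (file locations differ between installations), and it compares the values as strings.

```python
import xml.etree.ElementTree as ET


def read_properties_value(text, key):
    """Return the value of `key` from Java-style properties text, or None."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments and malformed lines
        name, _, value = line.partition("=")
        if name.strip() == key:
            return value.strip()
    return None


def read_hbase_site_value(xml_text, key):
    """Return the value of `key` from hbase-site.xml text, or None."""
    root = ET.fromstring(xml_text)
    for prop in root.findall("property"):
        if (prop.findtext("name") or "").strip() == key:
            value = prop.findtext("value")
            return value.strip() if value is not None else None
    return None


def limits_match(kylin_props_text, hbase_site_text):
    """True only if both settings are present and have the same value."""
    kylin = read_properties_value(
        kylin_props_text, "kylin.hbase.client.keyvalue.maxsize")
    hbase = read_hbase_site_value(
        hbase_site_text, "hbase.client.keyvalue.maxsize")
    return kylin is not None and kylin == hbase


# Illustrative inputs; in practice, read the contents of the real files.
KYLIN_SAMPLE = "kylin.hbase.client.keyvalue.maxsize=1048576\n"
HBASE_SAMPLE = """<configuration>
  <property>
    <name>hbase.client.keyvalue.maxsize</name>
    <value>1048576</value>
  </property>
</configuration>"""

print(limits_match(KYLIN_SAMPLE, HBASE_SAMPLE))
```

Note that `limits_match` also returns False when the Kylin property is missing from the file altogether.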
Answer 1 (score: 0)
@Nithin K Anil
I cannot find kylin.hbase.client.keyvalue.maxsize in kylin.properties. My kylin.properties looks like this:
> [root@sandbox conf]# cat kylin.properties
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# kylin server's mode
kylin.server.mode=all
# optional information for the owner of kylin platform, it can be your team's email
# currently it will be attached to each kylin's htable attribute
kylin.owner=whoami@kylin.apache.org
# List of web servers in use, this enables one web server instance to sync up with other servers.
kylin.rest.servers=localhost:7070
# The metadata store in hbase
kylin.metadata.url=kylin_metadata@hbase
# The storage for final cube file in hbase
kylin.storage.url=hbase
# Temp folder in hdfs, make sure user has the right access to the hdfs directory
kylin.hdfs.working.dir=/kylin
# HBase Cluster FileSystem, which serving hbase, format as hdfs://hbase-cluster:8020
# leave empty if hbase running on same cluster with hive and mapreduce
kylin.hbase.cluster.fs=
kylin.job.mapreduce.default.reduce.input.mb=500
# max job retry on error, default 0: no retry
kylin.job.retry=0
# If true, job engine will not assume that hadoop CLI reside on the same server as it self
# you will have to specify kylin.job.remote.cli.hostname, kylin.job.remote.cli.username and kylin.job.remote.cli.password
# It should not be set to "true" unless you're NOT running Kylin.sh on a hadoop client machine
# (Thus kylin instance has to ssh to another real hadoop client machine to execute hbase,hive,hadoop commands)
kylin.job.run.as.remote.cmd=false
# Only necessary when kylin.job.run.as.remote.cmd=true
kylin.job.remote.cli.hostname=
# Only necessary when kylin.job.run.as.remote.cmd=true
kylin.job.remote.cli.username=
# Only necessary when kylin.job.run.as.remote.cmd=true
kylin.job.remote.cli.password=
# Used by test cases to prepare synthetic data for sample cube
kylin.job.remote.cli.working.dir=/tmp/kylin
# Max count of concurrent jobs running
kylin.job.concurrent.max.limit=10
# Time interval to check hadoop job status
kylin.job.yarn.app.rest.check.interval.seconds=10
# Hive database name for putting the intermediate flat tables
kylin.job.hive.database.for.intermediatetable=default
#default compression codec for htable,snappy,lzo,gzip,lz4
kylin.hbase.default.compression.codec=snappy
#the percentage of the sampling, default 100%
kylin.job.cubing.inmem.sampling.percent=100
# The cut size for hbase region, in GB.
kylin.hbase.region.cut=5
# The hfile size of GB, smaller hfile leading to the converting hfile MR has more reducers and be faster
# set 0 to disable this optimization
kylin.hbase.hfile.size.gb=2
# Enable/disable ACL check for cube query
kylin.query.security.enabled=true
# whether get job status from resource manager with kerberos authentication
kylin.job.status.with.kerberos=false
## kylin security configurations
# spring security profile, options: testing, ldap, saml
# with "testing" profile, user can use pre-defined name/pwd like KYLIN/ADMIN to login
kylin.security.profile=testing
# default roles and admin roles in LDAP, for ldap and saml
acl.defaultRole=ROLE_ANALYST,ROLE_MODELER
acl.adminRole=ROLE_ADMIN
#LDAP authentication configuration
ldap.server=ldap://ldap_server:389
ldap.username=
ldap.password=
#LDAP user account directory;
ldap.user.searchBase=
ldap.user.searchPattern=
ldap.user.groupSearchBase=
#LDAP service account directory
ldap.service.searchBase=
ldap.service.searchPattern=
ldap.service.groupSearchBase=
#SAML configurations for SSO
# SAML IDP metadata file location
saml.metadata.file=classpath:sso_metadata.xml
saml.metadata.entityBaseURL=https://hostname/kylin
saml.context.scheme=https
saml.context.serverName=hostname
saml.context.serverPort=443
saml.context.contextPath=/kylin
ganglia.group=
ganglia.port=8664
## Config for mail service
# If true, will send email notification;
mail.enabled=false
mail.host=
mail.username=
mail.password=
mail.sender=
###########################config info for web#######################
#help info ,format{name|displayName|link} ,optional
kylin.web.help.length=4
kylin.web.help.0=start|Getting Started|
kylin.web.help.1=odbc|ODBC Driver|
kylin.web.help.2=tableau|Tableau Guide|
kylin.web.help.3=onboard|Cube Design Tutorial|
#guide user how to build streaming cube
kylin.web.streaming.guide=http://kylin.apache.org/
#hadoop url link ,optional
kylin.web.hadoop=
#job diagnostic url link ,optional
kylin.web.diagnostic=
#contact mail on web page ,optional
kylin.web.contact_mail=
###########################config info for front#######################
#env DEV|QA|PROD
deploy.env=QA
###########################deprecated configs#######################
kylin.sandbox=true
kylin.web.hive.limit=20
# The cut size for hbase region, in GB.
# E.g, for cube whose capacity be marked as "SMALL", split region per 5GB by default
kylin.hbase.region.cut.small=5
kylin.hbase.region.cut.medium=10
kylin.hbase.region.cut.large=50
Answer 2 (score: 0)
We have hit the key limits on Splice Machine as well...
Also keep in mind that, per the KeyValue specification, the key needs to fit in a short. See KeyValue#getRowOffset()
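To illustrate the short limit: the serialized KeyValue stores the row length in a two-byte field, so a row key longer than 32767 bytes cannot be represented. The sketch below models only that length prefix. The function name and the constant are illustrative and are not part of the HBase API.

```python
import struct

MAX_ROW_LENGTH = 32767  # largest value of a signed 16-bit short


def encode_row_length(row_key):
    """Pack the length of `row_key` as a big-endian signed 16-bit integer."""
    if len(row_key) > MAX_ROW_LENGTH:
        raise ValueError("row key does not fit in a short")
    return struct.pack(">h", len(row_key))


# A 3-byte row key gets a 2-byte length prefix.
prefix = encode_row_length(b"abc")
print(len(prefix), prefix.hex())
```

This limit applies to the row key only; the size of the value is governed by the separate keyvalue.maxsize settings discussed above.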