I am using the "sparklyr" package to work with Spark from R.
I get the following error when loading a file in sparklyr.
My code is:
library(sparklyr)
# Connect to a local Spark 2.0.1 instance
sc <- spark_connect(master = "local", version = "2.0.1")
# Copy the built-in iris data frame into Spark
iris_tbl <- copy_to(sc, iris)
The error shown is:
Error: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
at java.lang.reflect.Constructor.newInstance(Unknown Source)
at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:258)
at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:359)
at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:263)
at org.apache.spark.sql.hive.HiveSharedState.metadataHive$lzycompute(HiveSharedState.scala:39)
at org.apache.spark.sql.hive.HiveSharedState.metadataHive(HiveSharedState.scala:38)
at org.apache.spark.sql.hive.HiveSharedState.externalCatalog$lzycompute(HiveSharedState.scala:46)
I have tried to research a solution but have not found a confirmed fix on any other site.
Answer 0 (score: 0)
I ran into the same error on an edge node of a Cloudera cluster. It appears to be some kind of capacity problem, though the exact cause is still unclear to me. The following modifications made your code work:
config <- spark_config()
# Kerberos credentials for the cluster (placeholders)
config$spark.yarn.keytab <- "<user.keytab>"
config$spark.yarn.principal <- "<user@host>"
# Executor and driver sizing
config$spark.executor.cores <- 4
config$spark.executor.memory <- "20g"
config$spark.driver.memory <- "40g"
# Off-heap overhead; note the Spark property names use the singular
# "driver"/"executor" (the original post had "executors", which Spark ignores)
config$spark.yarn.driver.memoryOverhead <- "8g"
config$spark.yarn.executor.memoryOverhead <- "8g"
config$spark.kryoserializer.buffer.max <- "256m"
# Pin a fixed number of executors instead of relying on dynamic allocation
config$spark.dynamicAllocation.enabled <- "false"
config$spark.executor.instances <- 24
config$sparklyr.cores.local <- 4
sc <- spark_connect(master = "yarn-client", version = "1.6.0", config = config)
iris_tbl <- copy_to(sc, iris)
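If the connection succeeds, one way to confirm the settings were actually applied is to read back the runtime configuration from the live Spark context. A minimal sketch, assuming the sc connection created above (spark_context_config() is sparklyr's accessor for the context configuration):

# Inspect the configuration the running Spark context picked up
ctx_conf <- spark_context_config(sc)
ctx_conf[["spark.executor.memory"]]    # expected: "20g"
ctx_conf[["spark.executor.instances"]] # expected: "24"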
Hope this helps. However, I still run into problems loading large Hive tables; for example, the following query fails:
dbGetQuery(sc, 'select * from large_table limit 10')
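One workaround worth trying for the large-table case is to reference the table lazily through dplyr instead of issuing raw SQL, so the LIMIT is pushed into the Spark-side query and only the final rows are collected into R. A sketch under that assumption, reusing the sc connection above and the hypothetical table name large_table:

library(dplyr)

# Lazily reference the Hive table; no data is moved yet
large_tbl <- tbl(sc, "large_table")

# head() translates to a LIMIT clause executed in Spark;
# collect() then pulls only those 10 rows into the R session
large_tbl %>% head(10) %>% collect()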