I am trying to use Spark from RStudio on macOS via the sparklyr
library. I installed it with the following commands:
# Install the sparklyr package
install.packages("sparklyr")
# Now load the library
library(sparklyr)
# Install Spark to your local machine
spark_install(version = "2.1.0")
install.packages("devtools")
# Install latest version of sparklyr
devtools::install_github("rstudio/sparklyr")
# Connect to Spark
options(sparklyr.java9 = TRUE)
sc <- spark_connect(master = "local")
iris_tbl <- copy_to(sc, iris) # Throws hive error !!!
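For reference, before connecting I also checked which Java the R session would hand to Spark, since Spark 2.1.0 expects Java 8 (this is only a rough sketch; the JDK path below is an example and must be adjusted to whatever is installed on your machine):

```r
# Inspect the Java configuration visible to this R session.
# sparklyr launches Spark through the JDK that JAVA_HOME points at,
# so a Java 9+ JDK here is a likely source of Hive startup errors.
cat("JAVA_HOME:", Sys.getenv("JAVA_HOME"), "\n")

# Pin the session to a Java 8 install before calling spark_connect()
# (example path -- substitute the JDK actually installed on your machine):
Sys.setenv(JAVA_HOME = "/Library/Java/JavaVirtualMachines/jdk1.8.0_66.jdk/Contents/Home")
```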
Here is the error I am getting:
> iris_tbl <- copy_to(sc, iris)
Error: java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveSessionState':
    at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$reflect(SparkSession.scala:981)
    at org.apache.spark.sql.SparkSession.sessionState$lzycompute(SparkSession.scala:110)
    at org.apache.spark.sql.SparkSession.sessionState(SparkSession.scala:109)
    at org.apache.spark.sql.SparkSession$Builder$$anonfun$getOrCreate$5.apply(SparkSession.scala:878)
    at org.apache.spark.sql.SparkSession$Builder$$anonfun$getOrCreate$5.apply(SparkSession.scala:878)
    at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
    at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
    at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
    at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
    at scala.collection.mutable.HashMap.foreach(HashMap.scala:99)
    at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:878)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:564)
    at sparklyr.Invoke$.invoke(invoke.scala:102)
    at sparklyr.StreamHandler$.handleMethodCall(stream.scala:97)
    at sparklyr.StreamHandler$.read(stream.scala:62)
    at sparklyr.BackendHandler.channelRead0(handler.scala:52)
    at sparklyr.BackendHandler.channelRead0(handler.scala:14)
    at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:367)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:353)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:346)
    at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:367)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:353)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:346)
    at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:293)
    at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:267)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:367)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:353)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:346)
    at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1294)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:367)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:353)
    at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:911)
    at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:652)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:575)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:489)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:451)
    at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:140)
    at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144)
    at java.base/java.lang.Thread.run(Thread.java:844)
Caused by: java.lang.reflect.InvocationTargetException
    at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:488)
    at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$reflect(SparkSession.scala:978)
    ... 44 more
Caused by: java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveExternalCatalog':
    at org.apache.spark.sql.internal.SharedState$.org$apache$spark$sql$internal$SharedState$$reflect(SharedState.scala:169)
    at org.apache.spark.sql.internal.SharedState.<init>(SharedState.scala:86)
    at org.apache.spark.sql.SparkSession$$anonfun$sharedState$1.apply(SparkSession.scala:101)
    at org.apache.spark.sql.SparkSession$$anonfun$sharedState$1.apply(SparkSession.scala:101)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.sql.SparkSession.sharedState$lzycompute(SparkSession.scala:101)
    at org.apache.spark.sql.SparkSession.sharedState(SparkSession.scala:100)
    at org.apache.spark.sql.internal.SessionState.<init>(SessionState.scala:157)
    at org.apache.spark.sql.hive.HiveSessionState.<init>(HiveSessionState.scala:32)
    ... 49 more
Caused by: java.lang.reflect.InvocationTargetException
    at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:488)
    at org.apache.spark.sql.internal.SharedState$.org$apache$spark$sql$internal$SharedState$$reflect(SharedState.scala:166)
    ... 57 more
Caused by: java.lang.ClassNotFoundException: java.lang.NoClassDefFoundError: org/apache/hadoop/hive/conf/HiveConf when creating Hive client using classpath: file:/Library/Frameworks/R.framework/Versions/3.4/Resources/library/sparklyr/java/sparklyr-2.1-2.11.jar
Please make sure that jars for your version of hive and hadoop are included in the paths passed to spark.sql.hive.metastore.jars.
    at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:270)
    at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:366)
    at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:270)
    at org.apache.spark.sql.hive.HiveExternalCatalog.<init>(HiveExternalCatalog.scala:65)
    ... 62 more
Caused by: java.lang.reflect.InvocationTargetException
    at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:488)
    at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:264)
    ... 65 more
Caused by: java.lang.NoClassDefFoundError: org/apache/hadoop/hive/conf/HiveConf
    at org.apache.spark.sql.hive.client.HiveClientImpl.<init>(HiveClientImpl.scala:97)
    ... 70 more
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hive.conf.HiveConf
    at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:466)
    at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:563)
    at org.apache.spark.sql.hive.client.IsolatedClientLoader$$anon$1.doLoadClass(IsolatedC
Answer 0 (score: 0)
Newer macOS releases have a documented problem with the Java path getting mixed up in R/RStudio (see here). I have a feeling (though I am not 100% sure) that this is what you are running into.
If you look through the issue I mentioned above, you should hopefully find a solution that resets the path; the one I found works best for me on High Sierra is:
dyn.load('/Library/Java/JavaVirtualMachines/jdk1.8.0_66.jdk/Contents/Home/jre/lib/server/libjvm.dylib')
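Since the JDK folder name varies by machine and build, a defensive variant of the same fix checks that the library actually exists before loading it (the `jdk1.8.0_66` version string below is only an example; substitute your own JDK):

```r
# Load the Java 8 JVM explicitly so JVM-backed R packages do not fall
# back to a newer system Java. The path is an example -- list your
# installed JDKs with `/usr/libexec/java_home -V` and substitute yours.
jvm <- "/Library/Java/JavaVirtualMachines/jdk1.8.0_66.jdk/Contents/Home/jre/lib/server/libjvm.dylib"
if (file.exists(jvm)) {
  dyn.load(jvm)
} else {
  message("libjvm.dylib not found at: ", jvm)
}
```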