Connecting Spark with RStudio on Mac OS throws a Hive error

Date: 2018-01-22 17:17:24

Tags: r sparklyr

I am trying to use Spark in RStudio on macOS via the sparklyr library. I installed it with the following commands:

# Install the sparklyr package
install.packages("sparklyr")

# Now load the library
library(sparklyr)

# Install Spark to your local machine
spark_install(version = "2.1.0")

install.packages("devtools")

# Install latest version of sparklyr
devtools::install_github("rstudio/sparklyr")

# Enable sparklyr's Java 9 workaround
options(sparklyr.java9 = TRUE)

# Connect to Spark
sc <- spark_connect(master = "local")

iris_tbl <- copy_to(sc, iris) # Throws hive error !!!

Here is the error I am facing:


iris_tbl <- copy_to(sc, iris)
Error: java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveSessionState':
    at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$reflect(SparkSession.scala:981)
    at org.apache.spark.sql.SparkSession.sessionState$lzycompute(SparkSession.scala:110)
    at org.apache.spark.sql.SparkSession.sessionState(SparkSession.scala:109)
    at org.apache.spark.sql.SparkSession$Builder$$anonfun$getOrCreate$5.apply(SparkSession.scala:878)
    at org.apache.spark.sql.SparkSession$Builder$$anonfun$getOrCreate$5.apply(SparkSession.scala:878)
    at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
    at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
    at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
    at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
    at scala.collection.mutable.HashMap.foreach(HashMap.scala:99)
    at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:878)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:564)
    at sparklyr.Invoke$.invoke(invoke.scala:102)
    at sparklyr.StreamHandler$.handleMethodCall(stream.scala:97)
    at sparklyr.StreamHandler$.read(stream.scala:62)
    at sparklyr.BackendHandler.channelRead0(handler.scala:52)
    at sparklyr.BackendHandler.channelRead0(handler.scala:14)
    at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:367)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:353)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:346)
    at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:367)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:353)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:346)
    at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:293)
    at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:267)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:367)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:353)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:346)
    at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1294)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:367)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:353)
    at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:911)
    at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:652)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:575)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:489)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:451)
    at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:140)
    at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144)
    at java.base/java.lang.Thread.run(Thread.java:844)
Caused by: java.lang.reflect.InvocationTargetException
    at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:488)
    at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$reflect(SparkSession.scala:978)
    ... 44 more
Caused by: java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveExternalCatalog':
    at org.apache.spark.sql.internal.SharedState$.org$apache$spark$sql$internal$SharedState$$reflect(SharedState.scala:169)
    at org.apache.spark.sql.internal.SharedState.<init>(SharedState.scala:86)
    at org.apache.spark.sql.SparkSession$$anonfun$sharedState$1.apply(SparkSession.scala:101)
    at org.apache.spark.sql.SparkSession$$anonfun$sharedState$1.apply(SparkSession.scala:101)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.sql.SparkSession.sharedState$lzycompute(SparkSession.scala:101)
    at org.apache.spark.sql.SparkSession.sharedState(SparkSession.scala:100)
    at org.apache.spark.sql.internal.SessionState.<init>(SessionState.scala:157)
    at org.apache.spark.sql.hive.HiveSessionState.<init>(HiveSessionState.scala:32)
    ... 49 more
Caused by: java.lang.reflect.InvocationTargetException
    at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:488)
    at org.apache.spark.sql.internal.SharedState$.org$apache$spark$sql$internal$SharedState$$reflect(SharedState.scala:166)
    ... 57 more
Caused by: java.lang.ClassNotFoundException: java.lang.NoClassDefFoundError: org/apache/hadoop/hive/conf/HiveConf when creating Hive client using classpath: file:/Library/Frameworks/R.framework/Versions/3.4/Resources/library/sparklyr/java/sparklyr-2.1-2.11.jar
Please make sure that jars for your version of hive and hadoop are included in the paths passed to spark.sql.hive.metastore.jars.
    at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:270)
    at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:366)
    at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:270)
    at org.apache.spark.sql.hive.HiveExternalCatalog.<init>(HiveExternalCatalog.scala:65)
    ... 62 more
Caused by: java.lang.reflect.InvocationTargetException
    at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:488)
    at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:264)
    ... 65 more
Caused by: java.lang.NoClassDefFoundError: org/apache/hadoop/hive/conf/HiveConf
    at org.apache.spark.sql.hive.client.HiveClientImpl.<init>(HiveClientImpl.scala:97)
    ... 70 more
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hive.conf.HiveConf
    at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:466)
    at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:563)
    at org.apache.spark.sql.hive.client.IsolatedClientLoader$$anon$1.doLoadClass(IsolatedC

1 Answer:

Answer 0 (score: 0):

Newer Mac OSX releases have a documented issue with Java paths getting messed up in R/RStudio (see here). I have a feeling (although I'm not 100% sure) that this is what you're running into here.
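To check whether that's what is happening, you can inspect which Java installation R actually picks up before connecting. This is a minimal diagnostic sketch, not from the linked issue; /usr/libexec/java_home is the standard macOS helper for locating installed JDKs:

# Diagnostic sketch: see which Java R will use (assumes macOS)
Sys.getenv("JAVA_HOME")              # JDK home R is configured with (may be empty)
system("java -version")              # Java version on the shell PATH
system("/usr/libexec/java_home -V")  # list all JDKs installed on the machine

If these point at Java 9 while your Spark 2.1.0 install expects Java 8, that mismatch could explain the HiveConf class-loading failure above.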

If you look through the issue I linked above, hopefully you'll find a solution that resets the path; the one I found works best for me on High Sierra is:

dyn.load('/Library/Java/JavaVirtualMachines/jdk1.8.0_66.jdk/Contents/Home/jre/lib/server/libjvm.dylib')
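Note that jdk1.8.0_66 in the path above is just the version installed on my machine; adjust it to whatever java_home reports for your JDK 8 install. Once the right JVM is loaded, retrying the connection (reusing the code from the question) should work:

# Retry after pointing R at a Java 8 JVM (path above is machine-specific)
library(sparklyr)
sc <- spark_connect(master = "local")
iris_tbl <- copy_to(sc, iris)  # should no longer throw the Hive error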