Question

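I'm having a problem on a single-node Hadoop POC environment (Ubuntu 14.04) when I run R and connect to Spark through SparkR 1.5. I have run this test several times before and never hit a problem until today.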

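My goal is to use SparkR to connect to Hive and pull in a table (and eventually write the resulting DataFrame back to Hive). I'm working from the R console in RStudio. I'm quite stumped and would appreciate any suggestions.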

library(SparkR, lib.loc="/usr/hdp/2.3.6.0-3796/spark/R/lib/")
sc <- sparkR.init(sparkHome = "/usr/hdp/2.3.6.0-3796/spark/")

Launching java with spark-submit command /usr/hdp/2.3.6.0-3796/spark//bin/spark-submit   sparkr-shell /tmp/RtmpdGojW1/backend_portb8b949c8f0e2 
17/08/15 15:50:18 WARN SparkConf: The configuration key 'spark.yarn.applicationMaster.waitTries' has been deprecated as of Spark 1.3 and and may be removed in the future. Please use the new key 'spark.yarn.am.waitTime' instead.
17/08/15 15:50:19 WARN SparkConf: The configuration key 'spark.yarn.applicationMaster.waitTries' has been deprecated as of Spark 1.3 and and may be removed in the future. Please use the new key 'spark.yarn.am.waitTime' instead.
17/08/15 15:50:19 INFO SparkContext: Running Spark version 1.5.2
17/08/15 15:50:20 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/08/15 15:50:20 WARN SparkConf: The configuration key 'spark.yarn.applicationMaster.waitTries' has been deprecated as of Spark 1.3 and and may be removed in the future. Please use the new key 'spark.yarn.am.waitTime' instead.
17/08/15 15:50:20 WARN Utils: Your hostname, localhost resolves to a loopback address: 127.0.0.1; using 10.100.0.11 instead (on interface eth0)
17/08/15 15:50:20 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
17/08/15 15:50:20 INFO SecurityManager: Changing view acls to: rstudio
17/08/15 15:50:20 INFO SecurityManager: Changing modify acls to: rstudio
17/08/15 15:50:20 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(rstudio); users with modify permissions: Set(rstudio)
17/08/15 15:50:22 INFO Slf4jLogger: Slf4jLogger started
17/08/15 15:50:22 INFO Remoting: Starting remoting
17/08/15 15:50:23 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@10.100.0.11:43827]
17/08/15 15:50:23 INFO Utils: Successfully started service 'sparkDriver' on port 43827.
17/08/15 15:50:23 INFO SparkEnv: Registering MapOutputTracker
17/08/15 15:50:23 WARN SparkConf: The configuration key 'spark.yarn.applicationMaster.waitTries' has been deprecated as of Spark 1.3 and and may be removed in the future. Please use the new key 'spark.yarn.am.waitTime' instead.
17/08/15 15:50:23 WARN SparkConf: The configuration key 'spark.yarn.applicationMaster.waitTries' has been deprecated as of Spark 1.3 and and may be removed in the future. Please use the new key 'spark.yarn.am.waitTime' instead.
17/08/15 15:50:23 INFO SparkEnv: Registering BlockManagerMaster
17/08/15 15:50:23 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-bea658dc-145f-48a6-bb28-6f05af529547
17/08/15 15:50:23 INFO MemoryStore: MemoryStore started with capacity 530.0 MB
17/08/15 15:50:23 WARN SparkConf: The configuration key 'spark.yarn.applicationMaster.waitTries' has been deprecated as of Spark 1.3 and and may be removed in the future. Please use the new key 'spark.yarn.am.waitTime' instead.
17/08/15 15:50:23 INFO HttpFileServer: HTTP File server directory is /tmp/spark-6b719b9d-3d54-48bc-8894-cd2ddf9b0755/httpd-e7371ee1-5574-476d-9d53-679a9781af2d
17/08/15 15:50:23 INFO HttpServer: Starting HTTP Server
17/08/15 15:50:23 INFO Server: jetty-8.y.z-SNAPSHOT
17/08/15 15:50:23 INFO AbstractConnector: Started SocketConnector@0.0.0.0:39275
17/08/15 15:50:23 INFO Utils: Successfully started service 'HTTP file server' on port 39275.
17/08/15 15:50:23 INFO SparkEnv: Registering OutputCommitCoordinator
17/08/15 15:50:23 INFO Server: jetty-8.y.z-SNAPSHOT
17/08/15 15:50:24 INFO AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
17/08/15 15:50:24 INFO Utils: Successfully started service 'SparkUI' on port 4040.
17/08/15 15:50:24 INFO SparkUI: Started SparkUI at http://10.100.0.11:4040
17/08/15 15:50:24 WARN SparkConf: The configuration key 'spark.yarn.applicationMaster.waitTries' has been deprecated as of Spark 1.3 and and may be removed in the future. Please use the new key 'spark.yarn.am.waitTime' instead.
17/08/15 15:50:24 WARN SparkConf: The configuration key 'spark.yarn.applicationMaster.waitTries' has been deprecated as of Spark 1.3 and and may be removed in the future. Please use the new key 'spark.yarn.am.waitTime' instead.
17/08/15 15:50:24 WARN SparkConf: The configuration key 'spark.yarn.applicationMaster.waitTries' has been deprecated as of Spark 1.3 and and may be removed in the future. Please use the new key 'spark.yarn.am.waitTime' instead.
17/08/15 15:50:24 WARN MetricsSystem: Using default name DAGScheduler for source because spark.app.id is not set.
17/08/15 15:50:24 INFO Executor: Starting executor ID driver on host localhost
17/08/15 15:50:24 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 43075.
17/08/15 15:50:24 INFO NettyBlockTransferService: Server created on 43075
17/08/15 15:50:24 INFO BlockManagerMaster: Trying to register BlockManager
17/08/15 15:50:24 INFO BlockManagerMasterEndpoint: Registering block manager localhost:43075 with 530.0 MB RAM, BlockManagerId(driver, localhost, 43075)
17/08/15 15:50:24 INFO BlockManagerMaster: Registered BlockManager

hiveContext <- sparkRHive.init(sc)

17/08/15 15:51:17 WARN SparkConf: The configuration key 'spark.yarn.applicationMaster.waitTries' has been deprecated as of Spark 1.3 and and may be removed in the future. Please use the new key 'spark.yarn.am.waitTime' instead.
17/08/15 15:51:19 INFO HiveContext: Initializing execution hive, version 1.2.1
17/08/15 15:51:19 INFO ClientWrapper: Inspected Hadoop version: 2.7.1.2.3.6.0-3796
17/08/15 15:51:19 INFO ClientWrapper: Loaded org.apache.hadoop.hive.shims.Hadoop23Shims for Hadoop version 2.7.1.2.3.6.0-3796
17/08/15 15:51:19 WARN SparkConf: The configuration key 'spark.yarn.applicationMaster.waitTries' has been deprecated as of Spark 1.3 and and may be removed in the future. Please use the new key 'spark.yarn.am.waitTime' instead.
17/08/15 15:51:20 INFO metastore: Trying to connect to metastore with URI thrift://localhost.localdomain:9083
17/08/15 15:51:20 INFO metastore: Connected to metastore.
17/08/15 15:51:21 WARN DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
17/08/15 15:51:22 INFO SessionState: Created local directory: /tmp/a4f76c27-cf73-45bf-b873-a0e97ca43309_resources
17/08/15 15:51:22 INFO SessionState: Created HDFS directory: /tmp/hive/rstudio/a4f76c27-cf73-45bf-b873-a0e97ca43309
17/08/15 15:51:22 INFO SessionState: Created local directory: /tmp/rstudio/a4f76c27-cf73-45bf-b873-a0e97ca43309
17/08/15 15:51:22 INFO SessionState: Created HDFS directory: /tmp/hive/rstudio/a4f76c27-cf73-45bf-b873-a0e97ca43309/_tmp_space.db
17/08/15 15:51:22 INFO HiveContext: default warehouse location is /user/hive/warehouse
17/08/15 15:51:22 INFO HiveContext: Initializing HiveMetastoreConnection version 1.2.1 using Spark classes.
17/08/15 15:51:22 INFO ClientWrapper: Inspected Hadoop version: 2.7.1.2.3.6.0-3796
17/08/15 15:51:22 INFO ClientWrapper: Loaded org.apache.hadoop.hive.shims.Hadoop23Shims for Hadoop version 2.7.1.2.3.6.0-3796
17/08/15 15:51:22 WARN SparkConf: The configuration key 'spark.yarn.applicationMaster.waitTries' has been deprecated as of Spark 1.3 and and may be removed in the future. Please use the new key 'spark.yarn.am.waitTime' instead.
17/08/15 15:51:25 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/08/15 15:51:25 INFO metastore: Trying to connect to metastore with URI thrift://localhost.localdomain:9083
17/08/15 15:51:25 INFO metastore: Connected to metastore.
17/08/15 15:51:27 WARN DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
17/08/15 15:51:27 INFO SessionState: Created local directory: /tmp/16b5f51f-f570-4fc0-b3a6-eda3edd19b59_resources
17/08/15 15:51:27 INFO SessionState: Created HDFS directory: /tmp/hive/rstudio/16b5f51f-f570-4fc0-b3a6-eda3edd19b59
17/08/15 15:51:27 INFO SessionState: Created local directory: /tmp/rstudio/16b5f51f-f570-4fc0-b3a6-eda3edd19b59
17/08/15 15:51:27 INFO SessionState: Created HDFS directory: /tmp/hive/rstudio/16b5f51f-f570-4fc0-b3a6-eda3edd19b59/_tmp_space.db

showDF(sql(hiveContext, "USE MyHiveDB"))

Error: is.character(x) is not TRUE

showDF(sql(hiveContext, "SELECT *  FROM table"))

Error: is.character(x) is not TRUE

Answer 1

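Solved. The problem was exactly what cricket_007 suggested with the Databricks link: some packages loaded in the R session conflicted with the SparkR instance.

Detaching them from the current R session fixed the problem and the code now works.

The packages to detach were: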

plyr
dplyr
dbplyr
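
The detach step can be sketched in base R as follows. This is a minimal sketch, not the exact commands the answer used: detach_if_attached is a hypothetical helper name, and the order (dbplyr, then dplyr, then plyr) is only a cautious choice. A likely mechanism, stated here as an assumption rather than something the answer confirms, is that dplyr and dbplyr export their own sql function, which masks SparkR's sql when they are attached after SparkR; the is.character(x) check would then come from that function.

```r
# Hypothetical helper: detach the listed packages from the search path,
# skipping any that are not currently attached.
detach_if_attached <- function(pkgs) {
  detached <- character(0)
  for (pkg in pkgs) {
    search_name <- paste0("package:", pkg)
    if (search_name %in% search()) {
      # character.only = TRUE lets us pass the name as a string
      detach(search_name, character.only = TRUE)
      detached <- c(detached, pkg)
    }
  }
  invisible(detached)
}

# Packages named in the answer
detached <- detach_if_attached(c("dbplyr", "dplyr", "plyr"))

# List the attached packages that still define a function called `sql`;
# after the detach, only SparkR should remain if it is loaded.
utils::find("sql")
```

An alternative that should avoid detaching anything is to qualify the call with its namespace, for example SparkR::sql(hiveContext, "USE MyHiveDB"), so that R does not pick up a masking sql from another package.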
