Apache Spark (v1.6.1) is started as a service on an Ubuntu machine (10.10.0.102) using ./start-all.sh.
Now I need to submit jobs to this server remotely through the Java API.
The following is the Java client code, run from a different machine (10.10.0.95).
import java.util.HashMap;
import java.util.Map;

import org.apache.commons.csv.QuoteMode;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;

String mySqlConnectionUrl = "jdbc:mysql://localhost:3306/demo?user=sec&password=sec";

// Jars shipped to the executors so they can load the CSV writer and the JDBC driver.
String[] jars = new String[] {
        "/home/.m2/repository/com/databricks/spark-csv_2.10/1.4.0/spark-csv_2.10-1.4.0.jar",
        "/home/.m2/repository/org/apache/commons/commons-csv/1.1/commons-csv-1.1.jar",
        "/home/.m2/repository/mysql/mysql-connector-java/6.0.2/mysql-connector-java-6.0.2.jar"};

SparkConf sparkConf = new SparkConf()
        .setAppName("sparkCSVWriter")
        .setMaster("spark://10.10.0.102:7077")
        .setJars(jars);
JavaSparkContext javaSparkContext = new JavaSparkContext(sparkConf);
SQLContext sqlContext = new SQLContext(javaSparkContext);

// Read one column from MySQL through the JDBC data source.
Map<String, String> options = new HashMap<>();
options.put("driver", "com.mysql.jdbc.Driver");
options.put("url", mySqlConnectionUrl);
options.put("dbtable", "(select p.FIRST_NAME from person p) as firstName");
DataFrame dataFrame = sqlContext.read().format("jdbc").options(options).load();

// Write the result as pipe-delimited CSV using the spark-csv package.
dataFrame.write()
        .format("com.databricks.spark.csv")
        .option("header", "true")
        .option("delimiter", "|")
        .option("quote", "\"")
        .option("quoteMode", QuoteMode.NON_NUMERIC.toString())
        .option("escape", "\\")
        .save("persons.csv");

// Merge the per-partition part files into a single output file.
Configuration hadoopConfiguration = javaSparkContext.hadoopConfiguration();
FileSystem hdfs = FileSystem.get(hadoopConfiguration);
FileUtil.copyMerge(hdfs, new Path("persons.csv"), hdfs, new Path("/home/persons1.csv"), true, hadoopConfiguration, "");
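For the JSON half of the CSV/JSON requirement mentioned below, a minimal variant (my own sketch, not part of the original client) could use the JSON writer built into Spark 1.6, which needs no extra package:

// Write the same DataFrame as line-delimited JSON using the built-in data source.
dataFrame.write().json("persons.json");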
As the code shows, I need Spark to convert RDBMS data to CSV/JSON. However, when I run this client application it is able to connect to the remote Spark server, but I keep getting the following WARN message in the console:
WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
On the server side, in the Spark UI of the running application > Executor Summary > stderr log, I get the following error:
Exception in thread "main" java.io.IOException: Failed to connect to /192.168.56.1:53112
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:216)
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:167)
at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:200)
at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:187)
at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:183)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.ConnectException: Connection refused: /192.168.56.1:53112
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:224)
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:289)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:528)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
... 1 more
But no IP address is configured as 192.168.56.1 anywhere. So is there some configuration missing?
Answer 0 (score: -1)
Actually, my client machine (10.10.0.95) is a Windows machine. When I tried submitting the Spark job from another Ubuntu machine (10.10.0.155), the same Java client code ran successfully.
While debugging in the Windows client environment, I found that when I submit the Spark job, the following logs are shown:
INFO Remoting: Starting remoting
INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@192.168.56.1:61552]
INFO Utils: Successfully started service 'sparkDriverActorSystem' on port 61552.
INFO MemoryStore: MemoryStore started with capacity 2.4 GB
INFO SparkEnv: Registering OutputCommitCoordinator
INFO Utils: Successfully started service 'SparkUI' on port 4044.
INFO SparkUI: Started SparkUI at http://192.168.56.1:4044
According to log line 2, it registers the client as 192.168.56.1.
On the other hand, on the Ubuntu client:
INFO Remoting: Starting remoting
INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@10.10.0.155:42786]
INFO Utils: Successfully started service 'sparkDriverActorSystem' on port 42786.
INFO MemoryStore: MemoryStore started with capacity 511.1 MB
INFO SparkEnv: Registering OutputCommitCoordinator
INFO Utils: Successfully started service 'SparkUI' on port 4040.
INFO SparkUI: Started SparkUI at http://10.10.0.155:4040
According to log line 2, it registers the client as 10.10.0.155, which matches the actual IP address.
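A quick way to see this difference (a diagnostic sketch of my own, not from the original code) is to print the address the JVM resolves for the local host; when spark.driver.host is not set, the driver advertises an address resolved in roughly this way:

import java.net.InetAddress;

public class LocalAddressCheck {
    public static void main(String[] args) throws Exception {
        // Presumably prints 192.168.56.1 on the Windows client (the VirtualBox
        // host-only adapter) and 10.10.0.155 on the Ubuntu client.
        System.out.println(InetAddress.getLocalHost().getHostAddress());
    }
}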
If anyone finds out what is wrong on the Windows client, please let the community know.
[UPDATE]
I run the whole environment in VirtualBox: the Windows machine is my host, Ubuntu is a guest, and Spark is installed on the Ubuntu machine. VirtualBox installs an "Ethernet adapter VirtualBox Host-Only Network" with the IPv4 address 192.168.56.1, and Spark registers this IP as the client IP instead of the actual IP address 10.10.0.95.
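A possible workaround (my assumption, not something I have verified on this setup) is to tell the driver explicitly which address to advertise via the spark.driver.host property, so the executors connect back to the routable IP instead of the host-only one:

SparkConf sparkConf = new SparkConf()
        .setAppName("sparkCSVWriter")
        .setMaster("spark://10.10.0.102:7077")
        .setJars(jars)
        // Advertise the real client IP (10.10.0.95) instead of the 192.168.56.1
        // host-only address that gets resolved by default.
        .set("spark.driver.host", "10.10.0.95");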