MongoDB Spark Connector: mongo-spark cannot find the collection

Date: 2017-04-10 13:13:17

Tags: mongodb apache-spark

I am getting an error when trying to read data from a collection.

My MongoDB instance is hosted at 192.168.1.2, and my Spark master at 192.168.1.1. The code is:

package org.sparkexample;

import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.SparkSession;
import org.bson.Document;

import com.mongodb.spark.MongoSpark;
import com.mongodb.spark.rdd.api.java.JavaMongoRDD;

public class WordCountTask {
    public static void main(String[] args) {
        System.out.println("arg : " + args[0]);
        //checkArgument(args.length > 1, "Please provide the path of input file as first parameter.");
        new WordCountTask().run(args[0]);
    }

    public void run(String inputFilePath) {

        SparkSession spark = SparkSession.builder()
            .master("spark://192.168.1.1:7077")
            .appName("MongoSparkConnectorIntro")
            .config("spark.mongodb.input.uri", "mongodb://192.168.1.2/local.Test")
            .config("spark.mongodb.output.uri", "mongodb://192.168.1.2/local.Test")
            .getOrCreate();

        JavaSparkContext jsc = new JavaSparkContext(spark.sparkContext());
        JavaMongoRDD<Document> rdd = MongoSpark.load(jsc);

        System.out.println("******************************************");
        System.out.println("The count is : ");
        System.out.println(rdd.count());
        System.out.println(rdd.first().toJson());
        System.out.println("******************************************");

        jsc.close();
    }
}

The error (or rather, the log message) I get is:

INFO MongoSamplePartitioner: Could not find collection (Test),
 using a single partition

Because of this, the .first() call fails. However, the collection does exist and I can access it. Could anyone tell me what is wrong?
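For context, the connector is supposed to derive the database and collection from the last path segment of the input URI, so "mongodb://192.168.1.2/local.Test" should resolve to database "local" and collection "Test". Here is a minimal sketch of that split (a hypothetical helper written for illustration, not the connector's actual parsing code):

```java
import java.net.URI;

public class MongoUriParts {
    // Hypothetical helper: splits the path segment of a URI like
    // mongodb://host/db.coll into { database, collection }.
    // The real connector relies on the MongoDB driver's URI parser.
    static String[] databaseAndCollection(String uri) {
        // Path component after the host, without the leading '/'
        String path = URI.create(uri).getPath().substring(1);
        int dot = path.indexOf('.');
        String database = dot < 0 ? path : path.substring(0, dot);
        String collection = dot < 0 ? null : path.substring(dot + 1);
        return new String[] { database, collection };
    }

    public static void main(String[] args) {
        String[] parts = databaseAndCollection("mongodb://192.168.1.2/local.Test");
        System.out.println(parts[0] + " / " + parts[1]); // local / Test
    }
}
```

If the URI were somehow being misread, an alternative would be to pass the "database" and "collection" keys explicitly via the connector's ReadConfig options when calling MongoSpark.load, rather than relying on URI inference.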

The full log is:

; ui acls disabled; users  with view permissions: Set(mklrjv); groups with view
 permissions: Set(); users  with modify permissions: Set(mklrjv); groups with m
odify permissions: Set()
17/04/10 18:17:09 INFO Utils: Successfully started service 'sparkDriver' on port
 34048.
17/04/10 18:17:09 INFO SparkEnv: Registering MapOutputTracker
17/04/10 18:17:09 INFO SparkEnv: Registering BlockManagerMaster
17/04/10 18:17:09 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storag
e.DefaultTopologyMapper for getting topology information
17/04/10 18:17:09 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up

17/04/10 18:17:09 INFO DiskBlockManager: Created local directory at C:\Users\mra
jeev\AppData\Local\Temp\blockmgr-17cba028-2757-4f48-88ea-f8c7b33ccba9
17/04/10 18:17:09 INFO MemoryStore: MemoryStore started with capacity 366.3 MB
17/04/10 18:17:09 INFO SparkEnv: Registering OutputCommitCoordinator
17/04/10 18:17:09 INFO Utils: Successfully started service 'SparkUI' on port 404
0.
17/04/10 18:17:09 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://
192.168.1.1:4040
17/04/10 18:17:09 INFO SparkContext: Added JAR file:/C:/Projects/SparkJava/targe
t/uber-first-example-1.0-SNAPSHOT.jar at spark://192.168.1.1:34048/jars/uber-f
irst-example-1.0-SNAPSHOT.jar with timestamp 1491828429769
17/04/10 18:17:09 INFO StandaloneAppClient$ClientEndpoint: Connecting to master
spark://192.168.1.1:7077...
17/04/10 18:17:10 INFO TransportClientFactory: Successfully created connection t
o /192.168.1.1:7077 after 55 ms (0 ms spent in bootstraps)
17/04/10 18:17:10 INFO StandaloneSchedulerBackend: Connected to Spark cluster wi
th app ID app-20170410181710-0013
17/04/10 18:17:10 INFO StandaloneAppClient$ClientEndpoint: Executor added: app-2
0170410181710-0013/0 on worker-20170410150028-192.168.1.1-33151 (192.168.1.1
:33151) with 4 cores
17/04/10 18:17:10 INFO StandaloneSchedulerBackend: Granted executor ID app-20170
410181710-0013/0 on hostPort 192.168.1.1:33151 with 4 cores, 1024.0 MB RAM
17/04/10 18:17:10 INFO Utils: Successfully started service 'org.apache.spark.net
work.netty.NettyBlockTransferService' on port 34070.
17/04/10 18:17:10 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app
-20170410181710-0013/0 is now RUNNING
17/04/10 18:17:10 INFO NettyBlockTransferService: Server created on 10.78.130.13
4:34070
17/04/10 18:17:10 INFO BlockManager: Using org.apache.spark.storage.RandomBlockR
eplicationPolicy for block replication policy
17/04/10 18:17:10 INFO BlockManagerMaster: Registering BlockManager BlockManager
Id(driver, 192.168.1.1, 34070, None)
17/04/10 18:17:10 INFO BlockManagerMasterEndpoint: Registering block manager 10.
78.130.134:34070 with 366.3 MB RAM, BlockManagerId(driver, 192.168.1.1, 34070,
 None)
17/04/10 18:17:10 INFO BlockManagerMaster: Registered BlockManager BlockManagerI
d(driver, 192.168.1.1, 34070, None)
17/04/10 18:17:10 INFO BlockManager: Initialized BlockManager: BlockManagerId(dr
iver, 192.168.1.1, 34070, None)
17/04/10 18:17:11 INFO EventLoggingListener: Logging events to file:/C:/tmp/spar
k-events/app-20170410181710-0013
17/04/10 18:17:11 INFO StandaloneSchedulerBackend: SchedulerBackend is ready for
 scheduling beginning after reached minRegisteredResourcesRatio: 0.0
17/04/10 18:17:11 INFO SharedState: Warehouse path is 'file:/C:/Projects/SparkJa
va/spark-warehouse/'.
17/04/10 18:17:12 WARN SparkSession$Builder: Using an existing SparkSession; som
e configuration may not take effect.
17/04/10 18:17:12 INFO MemoryStore: Block broadcast_0 stored as values in memory
 (estimated size 216.0 B, free 366.3 MB)
17/04/10 18:17:12 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in
memory (estimated size 402.0 B, free 366.3 MB)
17/04/10 18:17:12 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 1
0.78.130.134:34070 (size: 402.0 B, free: 366.3 MB)
17/04/10 18:17:12 INFO SparkContext: Created broadcast 0 from broadcast at Mongo
Spark.scala:499
******************************************
The count is :
17/04/10 18:17:13 INFO cluster: Cluster created with settings {hosts=[10.78.130.
149:27017], mode=SINGLE, requiredClusterType=UNKNOWN, serverSelectionTimeout='30
000 ms', maxWaitQueueSize=500}
17/04/10 18:17:13 INFO cluster: Cluster description not yet available. Waiting f
or 30000 ms before timing out
17/04/10 18:17:13 INFO connection: Opened connection [connectionId{localValue:1,
 serverValue:107}] to 192.168.1.2:27017
17/04/10 18:17:13 INFO cluster: Monitor thread successfully connected to server
with description ServerDescription{address=192.168.1.2:27017, type=STANDALONE,
 state=CONNECTED, ok=true, version=ServerVersion{versionList=[3, 4, 2]}, minWire
Version=0, maxWireVersion=5, maxDocumentSize=16777216, roundTripTimeNanos=117621
8}
17/04/10 18:17:13 INFO MongoClientCache: Creating MongoClient: [192.168.1.2:27
017]
17/04/10 18:17:13 INFO connection: Opened connection [connectionId{localValue:2,
 serverValue:108}] to 192.168.1.2:27017
17/04/10 18:17:13 INFO MongoSamplePartitioner: Could not find collection (Test),
 using a single partition
17/04/10 18:17:13 INFO SparkContext: Starting job: count at WordCountTask.java:3
1
17/04/10 18:17:13 INFO DAGScheduler: Got job 0 (count at WordCountTask.java:31)
with 1 output partitions
17/04/10 18:17:13 INFO DAGScheduler: Final stage: ResultStage 0 (count at WordCo
untTask.java:31)
17/04/10 18:17:13 INFO DAGScheduler: Parents of final stage: List()
17/04/10 18:17:13 INFO DAGScheduler: Missing parents: List()
17/04/10 18:17:13 INFO DAGScheduler: Submitting ResultStage 0 (MongoRDD[0] at RD
D at MongoRDD.scala:52), which has no missing parents
17/04/10 18:17:13 INFO MemoryStore: Block broadcast_1 stored as values in memory
 (estimated size 3.0 KB, free 366.3 MB)
17/04/10 18:17:13 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in
memory (estimated size 1855.0 B, free 366.3 MB)
17/04/10 18:17:13 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on 1
0.78.130.134:34070 (size: 1855.0 B, free: 366.3 MB)
17/04/10 18:17:13 INFO SparkContext: Created broadcast 1 from broadcast at DAGSc
heduler.scala:996
17/04/10 18:17:13 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage
 0 (MongoRDD[0] at RDD at MongoRDD.scala:52)
17/04/10 18:17:13 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
17/04/10 18:17:15 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Registered
executor NettyRpcEndpointRef(null) (192.168.1.1:34090) with ID 0
17/04/10 18:17:15 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, 10
.78.130.134, executor 0, partition 0, ANY, 6112 bytes)
17/04/10 18:17:15 INFO BlockManagerMasterEndpoint: Registering block manager 10.
78.130.134:34108 with 366.3 MB RAM, BlockManagerId(0, 192.168.1.1, 34108, None
)
17/04/10 18:17:18 INFO MongoClientCache: Closing MongoClient: [192.168.1.2:270
17]
17/04/10 18:17:18 INFO connection: Closed connection [connectionId{localValue:2,
 serverValue:108}] to 192.168.1.2:27017 because the pool has been closed.
17/04/10 18:17:47 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on 1
0.78.130.134:34108 (size: 1855.0 B, free: 366.3 MB)
17/04/10 18:17:48 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 1
0.78.130.134:34108 (size: 402.0 B, free: 366.3 MB)
17/04/10 18:17:49 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in
 34054 ms on 192.168.1.1 (executor 0) (1/1)
17/04/10 18:17:49 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have
all completed, from pool
17/04/10 18:17:49 INFO DAGScheduler: ResultStage 0 (count at WordCountTask.java:
31) finished in 35.409 s
17/04/10 18:17:49 INFO DAGScheduler: Job 0 finished: count at WordCountTask.java
:31, took 35.653876 s
0
17/04/10 18:17:49 INFO SparkContext: Starting job: first at WordCountTask.java:3
2
17/04/10 18:17:49 INFO DAGScheduler: Got job 1 (first at WordCountTask.java:32)
with 1 output partitions
17/04/10 18:17:49 INFO DAGScheduler: Final stage: ResultStage 1 (first at WordCo
untTask.java:32)
17/04/10 18:17:49 INFO DAGScheduler: Parents of final stage: List()
17/04/10 18:17:49 INFO DAGScheduler: Missing parents: List()
17/04/10 18:17:49 INFO DAGScheduler: Submitting ResultStage 1 (MongoRDD[0] at RD
D at MongoRDD.scala:52), which has no missing parents
17/04/10 18:17:49 INFO MemoryStore: Block broadcast_2 stored as values in memory
 (estimated size 3.2 KB, free 366.3 MB)
17/04/10 18:17:49 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in
memory (estimated size 1926.0 B, free 366.3 MB)
17/04/10 18:17:49 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on 1
0.78.130.134:34070 (size: 1926.0 B, free: 366.3 MB)
17/04/10 18:17:49 INFO SparkContext: Created broadcast 2 from broadcast at DAGSc
heduler.scala:996
17/04/10 18:17:49 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage
 1 (MongoRDD[0] at RDD at MongoRDD.scala:52)
17/04/10 18:17:49 INFO TaskSchedulerImpl: Adding task set 1.0 with 1 tasks
17/04/10 18:17:49 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 1, 10
.78.130.134, executor 0, partition 0, ANY, 6194 bytes)
17/04/10 18:17:49 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on 1
0.78.130.134:34108 (size: 1926.0 B, free: 366.3 MB)
17/04/10 18:17:49 INFO TaskSetManager: Finished task 0.0 in stage 1.0 (TID 1) in
 56 ms on 192.168.1.1 (executor 0) (1/1)
17/04/10 18:17:49 INFO DAGScheduler: ResultStage 1 (first at WordCountTask.java:
32) finished in 0.057 s
17/04/10 18:17:49 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have
all completed, from pool
17/04/10 18:17:49 INFO DAGScheduler: Job 1 finished: first at WordCountTask.java
:32, took 0.076634 s
Exception in thread "main" java.lang.UnsupportedOperationException: empty collec
tion
        at org.apache.spark.rdd.RDD$$anonfun$first$1.apply(RDD.scala:1369)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.s
cala:151)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.s
cala:112)
        at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
        at org.apache.spark.rdd.RDD.first(RDD.scala:1366)
        at org.apache.spark.api.java.JavaRDDLike$class.first(JavaRDDLike.scala:5
38)
        at org.apache.spark.api.java.AbstractJavaRDDLike.first(JavaRDDLike.scala
:45)
        at org.sparkexample.WordCountTask.run(WordCountTask.java:32)
        at org.sparkexample.WordCountTask.main(WordCountTask.java:14)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.
java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAcces
sorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSub
mit$$runMain(SparkSubmit.scala:738)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:18
7)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
17/04/10 18:17:49 INFO SparkContext: Invoking stop() from shutdown hook
17/04/10 18:17:49 INFO SparkUI: Stopped Spark web UI at http://192.168.1.1:404
0
17/04/10 18:17:49 INFO StandaloneSchedulerBackend: Shutting down all executors
17/04/10 18:17:49 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Asking each
 executor to shut down
17/04/10 18:17:49 WARN TransportChannelHandler: Exception in connection from /10
.78.130.134:34132
java.io.IOException: An existing connection was forcibly closed by the remote ho
st
        at sun.nio.ch.SocketDispatcher.read0(Native Method)
        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:43)
        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
        at sun.nio.ch.IOUtil.read(IOUtil.java:192)
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
        at io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirect
ByteBuf.java:221)
        at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:899)
        at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketCha
nnel.java:275)
        at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(Abstra
ctNioByteChannel.java:119)
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.jav
a:652)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEve
ntLoop.java:575)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.ja
va:489)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:451)
        at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThread
EventExecutor.java:140)
        at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorato
r.run(DefaultThreadFactory.java:144)
        at java.lang.Thread.run(Thread.java:745)
17/04/10 18:17:49 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEnd
point stopped!
17/04/10 18:17:49 WARN TransportChannelHandler: Exception in connection from /10
.78.130.134:34113
java.io.IOException: An existing connection was forcibly closed by the remote ho
st
        at sun.nio.ch.SocketDispatcher.read0(Native Method)
        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:43)
        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
        at sun.nio.ch.IOUtil.read(IOUtil.java:192)
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
        at io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirect
ByteBuf.java:221)
        at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:899)
        at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketCha
nnel.java:275)
        at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(Abstra
ctNioByteChannel.java:119)
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.jav
a:652)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEve
ntLoop.java:575)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.ja
va:489)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:451)
        at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThread
EventExecutor.java:140)
        at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorato
r.run(DefaultThreadFactory.java:144)
        at java.lang.Thread.run(Thread.java:745)
17/04/10 18:17:49 WARN TransportChannelHandler: Exception in connection from /10
.78.130.134:34090
java.io.IOException: An existing connection was forcibly closed by the remote ho
st
        at sun.nio.ch.SocketDispatcher.read0(Native Method)
        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:43)
        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
        at sun.nio.ch.IOUtil.read(IOUtil.java:192)
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
        at io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirect
ByteBuf.java:221)
        at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:899)
        at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketCha
nnel.java:275)
        at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(Abstra
ctNioByteChannel.java:119)
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.jav
a:652)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEve
ntLoop.java:575)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.ja
va:489)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:451)
        at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThread
EventExecutor.java:140)
        at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorato
r.run(DefaultThreadFactory.java:144)
        at java.lang.Thread.run(Thread.java:745)
17/04/10 18:17:49 INFO MemoryStore: MemoryStore cleared
17/04/10 18:17:49 INFO BlockManager: BlockManager stopped
17/04/10 18:17:49 INFO BlockManagerMaster: BlockManagerMaster stopped
17/04/10 18:17:49 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint:
OutputCommitCoordinator stopped!
17/04/10 18:17:49 INFO SparkContext: Successfully stopped SparkContext
17/04/10 18:17:49 INFO ShutdownHookManager: Shutdown hook called
17/04/10 18:17:49 INFO ShutdownHookManager: Deleting directory C:\Users\mklrjv\
AppData\Local\Temp\spark-3213c1b3-9a85-42b0-ba04-6e0e46a90d98

0 Answers

No answers yet.