How to add the uri and database to the MongoDB ReadConfig in Spark?

Asked: 2017-02-22 17:52:47

Tags: mongodb apache-spark

In the code below, I try to pass the Mongo uri and database to the ReadConfig through an options map, but it fails with an error saying the uri or database cannot be found.

    public JavaMongoRDD<Document> getRDDFromDS(DataSourceInfo ds, String collectionName) {
        // jsc, mongoDBInputPartitioner, mongoDBPartitionKey and
        // mongoDBInputPartitionSize are fields of the enclosing class.
        String mongoDBURI = "mongodb://"
                + PropertiesFileEncryptorUtil.decryptData(ds.getDbUsername()) + ":"
                + PropertiesFileEncryptorUtil.decryptData(ds.getDbPassword()) + "@"
                + ds.getHostName() + ":" + ds.getPort();
        Map<String, String> readOverrides = new HashMap<String, String>();
        readOverrides.put("uri", mongoDBURI);
        readOverrides.put("database", ds.getDbName());
        readOverrides.put("collection", collectionName);
        readOverrides.put("partitioner", mongoDBInputPartitioner);
        readOverrides.put("partitionKey", mongoDBPartitionKey);
        readOverrides.put("partitionSizeMB", mongoDBInputPartitionSize);

        ReadConfig readConf = ReadConfig.create(jsc).withOptions(readOverrides);
        JavaMongoRDD<Document> readRdd = MongoSpark.load(jsc, readConf);
        return readRdd;
    }

What is the correct way to pass the uri and database? Thanks in advance.

1 Answer:

Answer 0 (score: 0)

You can pass the configuration parameters to Spark through the SparkConf object:

 val conf = new SparkConf().setAppName("YourAppName").setMaster("local[2]").set("spark.executor.memory","1g")
      .set("spark.app.id","YourSparkId")
      .set("spark.mongodb.input.uri","mongodb://127.0.0.1/yourdatabase.yourInputcollection?readPreference=primaryPreferred")
      .set("spark.mongodb.output.uri","mongodb://127.0.0.1/yourdatabase.yourOutputcollection")

After that, pass the configuration object to the Spark context:

val sc = new SparkContext(conf)

val readConf = ReadConfig( sc )

Then you can read the values from Mongo like this (note that `loadFromMongoDB` comes from the implicits in `import com.mongodb.spark._`, and the variable passed must be the `readConf` defined above):

 val rdd = sc.loadFromMongoDB( readConfig = readConf )

and save it back like this:

rdd.map( someMapFunction ).saveToMongoDB()
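Since the question uses the Java API, the same approach can be sketched in Java. This is a minimal sketch, assuming the mongo-spark-connector 2.x Java API and a reachable MongoDB instance; the app name, host, database, and collection names are placeholders:

```java
import com.mongodb.spark.MongoSpark;
import com.mongodb.spark.config.ReadConfig;
import com.mongodb.spark.rdd.api.java.JavaMongoRDD;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.bson.Document;

public class MongoSparkSketch {
    public static void main(String[] args) {
        // Put the connector settings on the SparkConf so that
        // ReadConfig.create(jsc) can find the input uri and database.
        SparkConf conf = new SparkConf()
                .setAppName("YourAppName")
                .setMaster("local[2]")
                .set("spark.mongodb.input.uri",
                        "mongodb://127.0.0.1/yourdatabase.yourInputcollection")
                .set("spark.mongodb.output.uri",
                        "mongodb://127.0.0.1/yourdatabase.yourOutputcollection");
        JavaSparkContext jsc = new JavaSparkContext(conf);

        // ReadConfig is built from the SparkConf above; withOptions can still
        // override collection-level settings such as the partitioner.
        ReadConfig readConfig = ReadConfig.create(jsc);
        JavaMongoRDD<Document> rdd = MongoSpark.load(jsc, readConfig);

        // Write back through the output uri configured above.
        MongoSpark.save(rdd);
        jsc.stop();
    }
}
```

The key point is the same as in the Scala answer: the uri and database go on the SparkConf under the `spark.mongodb.input.uri` / `spark.mongodb.output.uri` keys before the context is created, rather than only into the `withOptions` map.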

I hope this answer helps.