How do I configure the right parallelism for a persistence bolt?

Time: 2019-04-25 15:35:20

Tags: java mongodb apache-storm topology

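I am using Apache Storm to build a topology that reads a "stream" of tuples from a file, splits each tuple, and stores the resulting tuples in MongoDB.

I have a cluster on Atlas with a shared replica set. I have developed the topology, and the solution works correctly when I use a single thread.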

    public static StormTopology build() {
        return buildWithSpout();
    }

    public static StormTopology buildWithSpout() {
        Config config = new Config();
        TopologyBuilder builder = new TopologyBuilder();

        CsvSpout datasetSpout = new CsvSpout("file.txt");
        SplitterBolt splitterBolt = new SplitterBolt(",");
        PartitionMongoInsertBolt insertPartitionBolt = new PartitionMongoInsertBolt();

        builder.setSpout(DATA_SPOUT_ID, datasetSpout, 1);
        builder.setBolt(DEPENDENCY_SPLITTER_ID, splitterBolt, 1).shuffleGrouping(DATA_SPOUT_ID);
        builder.setBolt(UPDATER_COUNTER_ID, insertPartitionBolt, 1).shuffleGrouping(DEPENDENCY_SPLITTER_ID);
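
        // Build and return the topology; the method is declared to return a StormTopology.
        return builder.createTopology();
    }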

However, when I raise the parallelism, my persistence bolt does not save all the tuples in MongoDB, even though the previous bolt emits them correctly.

        builder.setSpout(DATA_SPOUT_ID, datasetSpout, 1);
        builder.setBolt(DEPENDENCY_SPLITTER_ID, splitterBolt, 3).shuffleGrouping(DATA_SPOUT_ID);
        builder.setBolt(UPDATER_COUNTER_ID, insertPartitionBolt, 3).shuffleGrouping(DEPENDENCY_SPLITTER_ID);
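
For context on what the parallelism hint changes: with `shuffleGrouping`, tuples that share the same `binaryattr` value can land on any of the three insert executors, so several threads may work on the same MongoDB `_id` at once. Storm also offers `fieldsGrouping(componentId, new Fields(...))`, which routes equal field values to the same task. The sketch below is only a simplified stand-in for that routing idea, using a hash modulo the task count; the class `GroupingSketch`, the method `taskFor`, and the sample keys are hypothetical, and Storm's actual partitioning code may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class GroupingSketch {
    // Simplified model of a fields grouping: equal keys always map to the same task index.
    public static int taskFor(String key, int numTasks) {
        return Math.floorMod(key.hashCode(), numTasks);
    }

    public static void main(String[] args) {
        int numTasks = 3; // same parallelism hint as the insert bolt above
        // Hypothetical "binaryattr" values; repeated keys stand for tuples that hit the same _id.
        String[] keys = {"10000", "01000", "00100", "10000", "01000", "10000"};

        Map<Integer, List<String>> assignment = new TreeMap<>();
        for (String key : keys) {
            assignment.computeIfAbsent(taskFor(key, numTasks), t -> new ArrayList<>()).add(key);
        }
        // Every occurrence of a given key is listed under a single task.
        assignment.forEach((task, assigned) -> System.out.println("task " + task + " <- " + assigned));
    }
}
```

Under this kind of routing, all writes for one `_id` would be serialized through a single executor, which is one way to avoid concurrent read-then-write sequences on the same document.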

This is my first bolt:

public class SplitterBolt extends BaseBasicBolt {
    private String del;
    private MongoConnector db = null;

    public SplitterBolt(String del) {
        this.del = del;
    }

    public void prepare(Map stormConf, TopologyContext context) {
        db = MongoConnector.getInstance();
    }

    public void execute(Tuple input, BasicOutputCollector collector) {
        String tuple = input.getStringByField("tuple");
        int idTuple = Integer.parseInt(input.getStringByField("id"));

        String opString = "";
        String[] data = tuple.split(this.del);
        for(int i=0; i < data.length; i++) {
            OpenBitSet attrs = new OpenBitSet();
            attrs.fastSet(i);
            opString = Utility.toStringOpenBitSet(attrs, 5);
            collector.emit(new Values(idTuple, opString, data[i]));
        }
        db.incrementCount();
    }

    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("idtuple","binaryattr","value"));
    }
}

This is my persistence bolt, which stores all the tuples in MongoDB:

public class PartitionMongoInsertBolt extends BaseBasicBolt {
    private MongoConnector mongodb = null;

    public void prepare(Map stormConf, TopologyContext context) {
        //Singleton Instance
        mongodb = MongoConnector.getInstance();
    }

    public void execute(Tuple input, BasicOutputCollector collector) {
        mongodb.insertUpdateTuple(input);
    }

    public void declareOutputFields(OutputFieldsDeclarer declarer) {}
}

My only doubt is that I use the singleton pattern for my MongoDB connection class. Could that be the problem?

Update

This is my MongoConnector class:

public class MongoConnector {
    private MongoClient mongoClient = null;
    private MongoDatabase database = null;
    private MongoCollection<Document> partitionCollection = null;

    private static MongoConnector mongoInstance = null;

    public MongoConnector() {
        MongoClientURI uri = new MongoClientURI("connection string");
        this.mongoClient = new MongoClient(uri);
        this.database = mongoClient.getDatabase("db.database");
        this.partitionCollection = database.getCollection("db.collection");
    }

    public static MongoConnector getInstance() {
        if (mongoInstance == null)
            mongoInstance = new MongoConnector();
        return mongoInstance;
    }

    public void insertUpdateTuple2(Tuple tuple) {
        int idTuple = (Integer) tuple.getValue(0);
        String attrs = (String) tuple.getValue(1);
        String value = (String) tuple.getValue(2);
        value = value.replace('.', ',');

        Bson query = Filters.eq("_id", attrs);
        Document docIterator = this.partitionCollection.find(query).first();

        if (docIterator != null) { 
            Bson newValue = new Document(value, idTuple);
            Bson updateDocument = new Document("$push", newValue);
            this.partitionCollection.updateOne(docIterator, updateDocument);
        } else { 
            Document document = new Document();
            document.put("_id", attrs);
            ArrayList<Integer> partition = new ArrayList<Integer>();
            partition.add(idTuple);
            document.put(value, partition);
            this.partitionCollection.insertOne(document);
        }
    }
}   
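
On the singleton question: executors that run in the same worker are threads in one JVM, so several bolts can call `getInstance()` at the same moment, and the unsynchronized null check above could then create more than one connector. That alone would probably not lose writes, since `MongoClient` is meant to be shared across threads, but it is easy to rule out. A minimal sketch of the initialization-on-demand holder idiom follows; the class name `ConnectorHolderSketch`, the counter, and the helper `collectInstances` are hypothetical and only there to make the behaviour observable.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

public class ConnectorHolderSketch {
    // Counts constructor calls so the demo can show that only one instance is created.
    public static final AtomicInteger CONSTRUCTED = new AtomicInteger();

    private ConnectorHolderSketch() {
        CONSTRUCTED.incrementAndGet();
        // The real class would create its MongoClient here.
    }

    // The JVM initializes Holder lazily and exactly once, on first access.
    private static final class Holder {
        static final ConnectorHolderSketch INSTANCE = new ConnectorHolderSketch();
    }

    public static ConnectorHolderSketch getInstance() {
        return Holder.INSTANCE;
    }

    // Calls getInstance() from several threads at once and returns the distinct instances seen.
    public static Set<ConnectorHolderSketch> collectInstances(int threads) {
        Set<ConnectorHolderSketch> seen = ConcurrentHashMap.newKeySet();
        CountDownLatch start = new CountDownLatch(1);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                try {
                    start.await(); // line all threads up before they race for the instance
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                seen.add(getInstance());
            });
            workers[i].start();
        }
        start.countDown();
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return seen;
    }

    public static void main(String[] args) {
        Set<ConnectorHolderSketch> seen = collectInstances(8);
        System.out.println("distinct instances: " + seen.size()
                + ", constructor calls: " + CONSTRUCTED.get());
    }
}
```

Note that each worker process is a separate JVM, so a topology spread over several workers would still hold one connector per worker regardless of the idiom used.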

Solution update

I solved the problem by changing this call:

this.partitionCollection.updateOne(docIterator, updateDocument);

to this one:

this.partitionCollection.findOneAndUpdate(query, updateDocument);
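
A likely reason the switch from `updateOne(docIterator, ...)` to `findOneAndUpdate(query, ...)` helps: the first call uses the whole previously read document as its filter. If another executor pushed to the same document in between, the stored document no longer equals that snapshot, so the filter matches nothing and the update is dropped without an error. Filtering on `_id` alone does not have this problem. The sketch below replays one fixed interleaving of two writers against an in-memory map; `StaleFilterSketch` and its methods are hypothetical stand-ins, not MongoDB driver calls.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class StaleFilterSketch {
    // In-memory stand-in for the collection: _id -> (field name -> list of tuple ids).
    private final Map<String, Map<String, List<Integer>>> docs = new HashMap<>();

    // Returns a deep copy of the stored document, like reading it with find(query).first().
    Map<String, List<Integer>> find(String id) {
        Map<String, List<Integer>> doc = docs.get(id);
        if (doc == null) {
            return null;
        }
        Map<String, List<Integer>> copy = new HashMap<>();
        doc.forEach((field, ids) -> copy.put(field, new ArrayList<>(ids)));
        return copy;
    }

    void insert(String id, String field, int tupleId) {
        Map<String, List<Integer>> doc = new HashMap<>();
        doc.put(field, new ArrayList<>(List.of(tupleId)));
        docs.put(id, doc);
    }

    // Models an update filtered by a whole-document snapshot: a stale snapshot matches nothing.
    boolean pushIfUnchanged(String id, Map<String, List<Integer>> snapshot, String field, int tupleId) {
        Map<String, List<Integer>> current = docs.get(id);
        if (current == null || !current.equals(snapshot)) {
            return false; // zero documents matched, nothing is written
        }
        current.computeIfAbsent(field, f -> new ArrayList<>()).add(tupleId);
        return true;
    }

    // Models an update filtered by _id only: it matches whatever the current content is.
    boolean pushById(String id, String field, int tupleId) {
        Map<String, List<Integer>> current = docs.get(id);
        if (current == null) {
            return false;
        }
        current.computeIfAbsent(field, f -> new ArrayList<>()).add(tupleId);
        return true;
    }

    // Replays one fixed interleaving of two writers and returns how many ids end up stored.
    public static int storedIds(boolean filterByWholeDocument) {
        StaleFilterSketch collection = new StaleFilterSketch();
        collection.insert("10000", "a", 1);
        // Both writers read the document before either of them updates it.
        Map<String, List<Integer>> snapshotA = collection.find("10000");
        Map<String, List<Integer>> snapshotB = collection.find("10000");
        if (filterByWholeDocument) {
            collection.pushIfUnchanged("10000", snapshotA, "a", 2); // snapshot still current
            collection.pushIfUnchanged("10000", snapshotB, "a", 3); // snapshot is now stale
        } else {
            collection.pushById("10000", "a", 2);
            collection.pushById("10000", "a", 3);
        }
        return collection.find("10000").get("a").size();
    }

    public static void main(String[] args) {
        System.out.println("whole-document filter keeps " + storedIds(true) + " ids");
        System.out.println("_id filter keeps " + storedIds(false) + " ids");
    }
}
```

The insert branch may still be exposed to a similar race: two executors can both see no document for an `_id` and both call `insertOne`, and the second insert would then fail on the duplicate key. A single `updateOne(query, updateDocument, new UpdateOptions().upsert(true))` might cover both branches in one server-side operation, though that has not been tested against this topology.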
