我正在使用Apache Storm创建一个拓扑,该拓扑最初会在文件中读取元组的“流”,然后将其拆分并将元组存储在mongodb中。
我在Atlas上有一个共享副本集的集群。我已经开发了拓扑,并且如果我使用单个线程,则该解决方案可以正常工作。
public static StormTopology build() {
return buildWithSpout();
}
public static StormTopology buildWithSpout() {
Config config = new Config();
TopologyBuilder builder = new TopologyBuilder();
CsvSpout datasetSpout = new CsvSpout("file.txt");
SplitterBolt splitterBolt = new SplitterBolt(",");
PartitionMongoInsertBolt insertPartitionBolt = new PartitionMongoInsertBolt();
builder.setSpout(DATA_SPOUT_ID, datasetSpout, 1);
builder.setBolt(DEPENDENCY_SPLITTER_ID, splitterBolt, 1).shuffleGrouping(DATA_SPOUT_ID);
builder.setBolt(UPDATER_COUNTER_ID, insertPartitionBolt, 1).shuffleGrouping(DEPENDENCY_SPLITTER_ID);
}
但是,当我使用并行进程时,尽管元组已由前一个螺栓正确发出,但我的持久性螺栓并没有将所有元组保存在mongodb中。
builder.setSpout(DATA_SPOUT_ID, datasetSpout, 1);
builder.setBolt(DEPENDENCY_SPLITTER_ID, splitterBolt, 3).shuffleGrouping(DATA_SPOUT_ID);
builder.setBolt(UPDATER_COUNTER_ID, insertPartitionBolt, 3).shuffleGrouping(DEPENDENCY_SPLITTER_ID);
这是我的第一个螺栓:
public class SplitterBolt extends BaseBasicBolt {
private String del;
private MongoConnector db = null;
public SplitterBolt(String del) {
this.del = del;
}
public void prepare(Map stormConf, TopologyContext context) {
db = MongoConnector.getInstance();
}
public void execute(Tuple input, BasicOutputCollector collector) {
String tuple = input.getStringByField("tuple");
int idTuple = Integer.parseInt(input.getStringByField("id"));
String opString = "";
String[] data = tuple.split(this.del);
for(int i=0; i < data.length; i++) {
OpenBitSet attrs = new OpenBitSet();
attrs.fastSet(i);
opString = Utility.toStringOpenBitSet(attrs, 5);
collector.emit(new Values(idTuple, opString, data[i]));
}
db.incrementCount();
}
public void declareOutputFields(OutputFieldsDeclarer declarer) {
declarer.declare(new Fields("idtuple","binaryattr","value"));
}
}
这是我的持久性螺栓,它存储在mongo中的所有元组:
public class PartitionMongoInsertBolt extends BaseBasicBolt {
private MongoConnector mongodb = null;
public void prepare(Map stormConf, TopologyContext context) {
//Singleton Instance
mongodb = MongoConnector.getInstance();
}
public void execute(Tuple input, BasicOutputCollector collector) {
mongodb.insertUpdateTuple(input);
}
public void declareOutputFields(OutputFieldsDeclarer declarer) {}
}
我唯一的疑问是,我对mongo的连接类使用了单例模式。这可能是个问题吗?
更新
这是我的MongoConnector类:
public class MongoConnector {
private MongoClient mongoClient = null;
private MongoDatabase database = null;
private MongoCollection<Document> partitionCollection = null;
private static MongoConnector mongoInstance = null;
public MongoConnector() {
MongoClientURI uri = new MongoClientURI("connection string");
this.mongoClient = new MongoClient(uri);
this.database = mongoClient.getDatabase("db.database");
this.partitionCollection = database.getCollection("db.collection");
}
public static MongoConnector getInstance() {
if (mongoInstance == null)
mongoInstance = new MongoConnector();
return mongoInstance;
}
public void insertUpdateTuple2(Tuple tuple) {
int idTuple = (Integer) tuple.getValue(0);
String attrs = (String) tuple.getValue(1);
String value = (String) tuple.getValue(2);
value = value.replace('.', ',');
Bson query = Filters.eq("_id", attrs);
Document docIterator = this.partitionCollection.find(query).first();
if (docIterator != null) {
Bson newValue = new Document(value, idTuple);
Bson updateDocument = new Document("$push", newValue);
this.partitionCollection.updateOne(docIterator, updateDocument);
} else {
Document document = new Document();
document.put("_id", attrs);
ArrayList<Integer> partition = new ArrayList<Integer>();
partition.add(idTuple);
document.put(value, partition);
this.partitionCollection.insertOne(document);
}
}
}
解决方案更新
我已经解决了以下问题:
this.partitionCollection.updateOne(docIterator, updateDocument);
在
this.partitionCollection.findOneAndUpdate(query, updateDocument);