Question

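I have to process a large XML file (about 25 MB) and organize the data into documents to import into MongoDB.

The problem is that the XML document contains about 5-6 types of elements, with about 10k rows of each type.

After fetching one XML node of type a, I have to fetch its corresponding elements of types b, c, d, etc.

What I am trying to do in Node:

1. Get all the rows of type a.
2. For each row, use XPath to find its related rows and build the document.
3. Insert the document into MongoDB.

If there are 10k rows of type a, the second step runs 10k times. I am trying to make this run in parallel so that it doesn't take forever, so async.forEach seemed like the perfect solution.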

async.forEach(rowsA,fetchA);

My fetchA function looks something like this:

var fetchA = function(rowA) {
    // convert the xml row into an object
    var obj = {};
    for(i in rowA.attributes) {
        attribute = rowA.attributes[i];
        if(attribute.value === undefined)
            continue;
        obj[attribute.name] = attribute.value;
    }
    console.log(obj.someattribute);
    // find other related rows,
    // callback inserts the modified object with the subdocuments
    findRelations(obj,function(obj){
        insertA(obj,postInsert);
    });
};

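When I run it, the console.log in the code fires once every 1.5 seconds, instead of in parallel for every row as I expected. I have been scratching my head for the past two hours trying to figure this out, but I'm not sure what I'm doing wrong.

I'm new to Node, so please be patient.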

Answer 1

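It looks to me like you are not declaring and calling the callback function that async passes to your iterator function (fetchA). See the forEach documentation for an example.

Your code probably needs to look more like this: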

var fetchA = function(rowA, cb) {
    // convert the xml row into an object
    var obj = {};
    for (var i in rowA.attributes) {
        var attribute = rowA.attributes[i];
        // skip attributes without a value; do not call cb() here,
        // because cb must be called exactly once per row
        if (attribute.value === undefined)
            continue;
        obj[attribute.name] = attribute.value;
    }
    console.log(obj.someattribute);
    // find other related rows,
    // callback inserts the modified object with the subdocuments
    findRelations(obj, function(obj) {
        insertA(obj, postInsert);
        cb();  // You may even need to call this within insertA or postInsert if those are asynchronous functions.
    });
};
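
To make the callback contract concrete, here is a minimal sketch that does not use the async library at all. forEachParallel is a hypothetical stand-in written only for illustration (it is not the library's implementation), and processRow with its setTimeout is an assumed placeholder for the real lookup and insert. The point it shows is that the final callback can only fire once every iterator call has invoked its own cb.

```javascript
// Hypothetical stand-in for async.forEach, written out only to show the
// callback contract; it is not the library's actual implementation.
function forEachParallel(items, iterator, done) {
    var pending = items.length;
    var finished = false;

    if (pending === 0) {
        return done(null);
    }

    items.forEach(function (item) {
        // Each iterator call receives its own completion callback.
        iterator(item, function (err) {
            if (finished) {
                return; // ignore callbacks after the final callback has fired
            }
            if (err) {
                finished = true;
                return done(err);
            }
            pending -= 1;
            if (pending === 0) {
                finished = true;
                done(null);
            }
        });
    });
}

// Hypothetical iterator: setTimeout simulates an asynchronous lookup + insert.
function processRow(row, cb) {
    setTimeout(function () {
        console.log('processed row', row.id);
        cb(null); // signal completion only after the async work is done
    }, 10);
}

forEachParallel([{ id: 1 }, { id: 2 }, { id: 3 }], processRow, function (err) {
    if (err) {
        console.error('a row failed:', err);
        return;
    }
    console.log('all rows processed');
});
```

With the async library itself, the equivalent call would pass a final callback as the third argument, along the lines of async.forEach(rowsA, fetchA, function(err) { ... }). If fetchA never calls cb, that final callback never runs.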

Processing a large XML file (with relations) using Node.js async