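Problem bulk loading into MongoDB using the upsert method

Time: 2015-12-18 18:22:37

Tags: python mongodb dictionary bulk upsert

I am trying to load an 8m [rows] * 1k [columns] Python DataFrame into Mongo. To improve performance, I plan to use Mongo bulk operations. I also have to update the collection every day, so I use the upsert method of the bulk operations. Below is the code I have prepared: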

def BulkLoad(self,Dataset):

    counter = 0;        
    empty = '000'

    columns = []
    records = []
    DataDict = {}
    for col in Dataset.columns:
        columns.append(col)

    try:
        db = self.getDatabase()
        bulk = db.collection.initialize_ordered_bulk_op()

        for j in range(len(Dataset)):
            records.clear()
            DataDict.clear()
            DataDict.update(
                    {'CreatedBy': empty, 'ModifiedBy': empty, 'Value': Score})

            for column in columns:
                colValue = str(Dataset[column][j])
                if (colValue == 'nan'):
                    colValue = colValue.replace('nan', '')


                DataDict.update({column: colValue})

            records.append(DataDict)
            print("list is ",records)

            Id = DataDict['Id']
            Number = DataDict['Number']

            print(DataDict)               

            bulk.find(
                    {'Id': Id, 'Number': Number}).upsert().update(
                    {
                         '$set': {'Id': Id, 'Number': Number,'Events':records}
                    })

            counter += 1

            if counter % 1000 == 0:
                result = bulk.execute()
                logging.info(pprint(result))
                bulk = db.collection.initialize_ordered_bulk_op()

        if counter % 1000 != 0:
            result = bulk.execute()
            logging.info(pprint(result))

    except BulkWriteError as bre:
        # Must come before the generic handler: BulkWriteError is a subclass of Exception
        logging.error(pprint(bre.details))
    except Exception as e:
        logging.exception(e)

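If I load 10 sample rows into the Mongo collection, every document ends up with the values of the 10th row. I know this is caused by a Python dictionary reference problem.

Can anyone give me a suggestion for this?

1 Answer:

Answer 0: (score: 0)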
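
The effect can be reproduced without MongoDB. The sketch below assumes that the bulk builder keeps references to the update documents until execute() is called, which is what the observed behavior suggests. The `pending` list is a hypothetical stand-in for the queued bulk operations, and the three row ids are made-up sample data.

```python
# One list and one dict are reused for every row, as in BulkLoad above.
records = []
data = {}
pending = []  # hypothetical stand-in for the operations queued on `bulk`

for row_id in (1, 2, 3):
    records.clear()
    data.clear()
    data.update({'Id': row_id})
    records.append(data)
    # Every queued update points at the same `records` list object.
    pending.append({'$set': {'Id': row_id, 'Events': records}})

# The top-level ids differ, but every 'Events' entry shows the last row.
print([op['$set']['Id'] for op in pending])                # [1, 2, 3]
print([op['$set']['Events'][0]['Id'] for op in pending])   # [3, 3, 3]
```

Clearing and refilling `records` and `data` rewrites what every earlier operation sees, because all of them hold the same two objects.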

def BulkLoad(self,Dataset):
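
One way to avoid the shared objects described in the question is to create a new dictionary and a new list for every row. The following is only a sketch under that assumption, and not necessarily the approach this answer takes. The helper `build_upsert_specs` and the sample rows are hypothetical, and the code runs without a MongoDB connection.

```python
def build_upsert_specs(rows):
    """Return one (selector, update) pair per row, sharing no objects."""
    specs = []
    for row in rows:
        # A fresh dict per row, so earlier pairs are never overwritten.
        event = {'CreatedBy': '000', 'ModifiedBy': '000'}
        for column, value in row.items():
            text = str(value)
            # Same rule as the question: store NaN as an empty string.
            event[column] = '' if text == 'nan' else text
        selector = {'Id': event['Id'], 'Number': event['Number']}
        # A fresh list per row for 'Events' as well.
        update = {'$set': {**selector, 'Events': [event]}}
        specs.append((selector, update))
    return specs


# Made-up sample rows standing in for rows of the DataFrame.
rows = [
    {'Id': 1, 'Number': 10, 'Score': float('nan')},
    {'Id': 2, 'Number': 20, 'Score': 0.5},
]
specs = build_upsert_specs(rows)
for selector, update in specs:
    print(selector, update['$set']['Events'][0]['Score'])
```

Each pair could then be queued as in the question's code, with bulk.find(selector).upsert().update(update), and every document would keep its own row values.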
