How to quickly upload a 20GB file into MongoDB with pymongo

Posted: 2017-01-20 18:07:41

Tags: python mongodb pymongo

I tried pool() in Python:

import csv
import codecs
import re
from pymongo import MongoClient
from multiprocessing.dummy import Pool as ThreadPool

# read the csv file and insert its rows into the database using pymongo
mongo_client = MongoClient('111.111.11.111',maxPoolSize=200) 
db = mongo_client.mydb

def upload(reader):
    for each in reader:
        row = {}
        # each CSV row arrives as a list; split its string form on spaces
        txt = re.split(" ", str(each))
        row["time"] = re.split("'", txt[0])[1]
        row["ticker"] = txt[1]
        # pull the value out of the parentheses, e.g. "(12.34)" -> "12.34"
        row["price"] = re.split(r"\((.*?)\)", txt[2])[1]
        row["open"] = re.split(r"\((.*?)\)", txt[3])[1]
        db.price.insert_one(row)

pool = ThreadPool(10) 
results = pool.map(upload, csv.reader(codecs.open('C:\\log.txt', 'rU', 'utf-16')))  

The idea is to split the large log.txt file into chunks of 10 and run the uploads in parallel through the pool to speed things up. But nothing shows up in the database, which means my code isn't working. What is wrong here? (I'm sure the problem is not in the upload function, because it works fine when I run it without pool().)
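For reference, here is a minimal sketch of the chunked parallel upload the question describes, under one observation about the code above: pool.map(upload, csv.reader(...)) hands each individual CSV row to upload, so inside upload the loop runs over the fields of a single row rather than over rows of the file. One way to get actual chunking is to group the reader into batches first and map the worker over those batches. The batch size of 10000, the upload_chunk and chunks helpers, and the switch to insert_many are illustrative choices, not part of the original code:

import csv
import codecs
import re
from itertools import islice
from pymongo import MongoClient
from multiprocessing.dummy import Pool as ThreadPool

mongo_client = MongoClient('111.111.11.111', maxPoolSize=200)
db = mongo_client.mydb

def parse(each):
    # same field extraction as the original upload(), for one row
    txt = re.split(" ", str(each))
    return {
        "time": re.split("'", txt[0])[1],
        "ticker": txt[1],
        "price": re.split(r"\((.*?)\)", txt[2])[1],
        "open": re.split(r"\((.*?)\)", txt[3])[1],
    }

def upload_chunk(rows):
    # one insert_many round trip per chunk instead of one per row
    if rows:
        db.price.insert_many([parse(r) for r in rows])

def chunks(iterable, size):
    # yield successive lists of `size` rows from the reader
    it = iter(iterable)
    while True:
        block = list(islice(it, size))
        if not block:
            return
        yield block

reader = csv.reader(codecs.open('C:\\log.txt', 'rU', 'utf-16'))
pool = ThreadPool(10)
pool.map(upload_chunk, chunks(reader, 10000))
pool.close()
pool.join()

Batching through insert_many also cuts the per-document round trips to the server, which for a 20GB load usually matters more than the number of threads.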

0 Answers:

No answers yet