How to implement a multiprocessing pool for pandas in Python

Time: 2018-07-10 05:59:02

Tags: python-3.x pandas multiprocessing

I am a Python beginner. I am trying to dump a 6 GB CSV file into MongoDB to store its values. I need to dump all the data into MongoDB for storage. I tried inserting it, but it takes a very long time to process.


I get errors and warnings, and it takes a lot of time:

import asyncio

import pandas as pd
import multiprocessing as mp
from multiprocessing import Pool

import motor.motor_asyncio

client = motor.motor_asyncio.AsyncIOMotorClient('localhost', 27017)
db = client.datafeed

data = pd.read_csv('amz.csv', low_memory=False, nrows=100000)
nprocs = mp.cpu_count() - 1
pool = Pool(nprocs)  # note: this pool is created but never actually used below
df = pd.DataFrame(data[["productId", "title", "imageUrlStr",
                        "productUrl", "mrp", "sellingPrice"]])
records = df.to_dict(orient='records')

async def do_insert():
    # Motor collection methods are coroutines and must be awaited;
    # insert() is deprecated, insert_many() inserts the whole batch.
    await db.segment.drop()
    result = await db.segment.insert_many(records)
    print('inserted %d documents' % len(result.inserted_ids))

loop = asyncio.get_event_loop()
loop.run_until_complete(do_insert())
print("process done")

I am unable to dump all the rows into MongoDB.
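One common pattern for this kind of workload (a sketch, not from the original post) is to read the CSV lazily with `pd.read_csv(chunksize=...)` so the 6 GB file never has to fit in memory, and hand each chunk to a worker in a `multiprocessing.Pool`. The column list and the `amz.csv` filename come from the code above; the actual `insert_many` call is left commented out so the sketch runs without a MongoDB server, with a record count standing in for the insert. Each worker would need to create its own `MongoClient`, since clients are not fork-safe.

```python
import pandas as pd
from multiprocessing import Pool

# Column list taken from the question's code.
COLUMNS = ["productId", "title", "imageUrlStr",
           "productUrl", "mrp", "sellingPrice"]

def insert_chunk(df):
    """Convert one CSV chunk to records and (in real use) insert them."""
    records = df[COLUMNS].to_dict(orient="records")
    # In real use, each worker opens its own connection and inserts:
    # from pymongo import MongoClient
    # client = MongoClient("localhost", 27017)
    # client.datafeed.segment.insert_many(records)
    return len(records)

def load_csv_parallel(path, chunksize=50_000, nprocs=4):
    # chunksize=... makes read_csv yield DataFrames lazily, one per chunk,
    # so memory use stays bounded regardless of the file size.
    chunks = pd.read_csv(path, chunksize=chunksize, low_memory=False)
    with Pool(nprocs) as pool:
        counts = pool.map(insert_chunk, chunks)
    return sum(counts)

# Example usage (hypothetical numbers):
# total = load_csv_parallel("amz.csv", chunksize=100_000, nprocs=3)
```

Using the synchronous `pymongo` driver inside the workers avoids mixing Motor's asyncio event loop with forked processes; the parallelism here comes from the pool itself, not from async I/O.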

0 answers:

There are no answers.