How do I load a CSV file that is too large in IPython?

Asked: 2015-07-16 12:57:52

Tags: ipython bigdata ipython-notebook

How can I load a CSV file that is too large in IPython? It does not seem to fit into memory all at once.

1 Answer:

Answer 0 (score: 2)

You can use the code below to read the file in chunks, and also distribute the per-chunk work across multiple processes.

import pandas as pd
import multiprocessing as mp

LARGE_FILE = "yourfile.csv"
CHUNKSIZE = 100000  # process 100,000 rows at a time

def process_frame(df):
    # process one chunk; here we just return its row count
    return len(df)

if __name__ == '__main__':
    reader = pd.read_csv(LARGE_FILE, chunksize=CHUNKSIZE)
    pool = mp.Pool(4)  # use 4 worker processes

    funclist = []
    for df in reader:
        # submit each chunk to the pool asynchronously
        f = pool.apply_async(process_frame, [df])
        funclist.append(f)

    result = 0
    for f in funclist:
        result += f.get(timeout=10)  # wait up to 10 seconds per chunk

    pool.close()
    pool.join()
    print("Processed {} rows".format(result))
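If you do not need multiprocessing, a plain single-process loop over the same chunked reader already keeps memory bounded. A minimal sketch (the sample file, column name, and chunk size here are illustrative, not from the question):

```python
import os
import tempfile
import pandas as pd

# Write a small sample CSV so the sketch is self-contained.
path = os.path.join(tempfile.gettempdir(), "sample.csv")
pd.DataFrame({"a": range(10)}).to_csv(path, index=False)

total_rows = 0
for chunk in pd.read_csv(path, chunksize=3):  # each chunk is a small DataFrame
    total_rows += len(chunk)  # replace with your real per-chunk processing

print(total_rows)  # 10
```

Because `read_csv` with `chunksize` returns an iterator, only one chunk is in memory at a time, so the full file never needs to fit in RAM.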