Efficiently reading multiple CSV files into a Pandas DataFrame

Date: 2017-08-14 18:32:47

Tags: python-3.x pandas dataframe glob

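I am trying to read 3 years of data files (one per date). The portion I am interested in is usually quite small (about 1.4 million rows in total) compared with the parent files (each about 90 MB and 1.5 million rows). The code below has worked well for me in the past with a smaller number of files, but with 1095 files to process it is crawling (it takes about 3-4 seconds to read a single file). Any suggestions for improving the efficiency/speed?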

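import pandas as pd
from glob import glob

# det_list (the values to keep in column 0) is defined elsewhere and not shown here
file_list = glob(r'C:\Temp2\dl*.csv')
for file in file_list:
    print(file)
    df = pd.read_csv(file, header=None)  # parses every column of the ~90 MB file
    df = df[[0, 1, 3, 4, 5]]             # keep only the columns of interest
    df2 = df[df[0].isin(det_list)]       # keep only rows whose column 0 is in det_list
    if file_list[0] == file:
        rawdf = df2
    else:
        rawdf = pd.concat([rawdf, df2])  # DataFrame.append was removed in pandas 2.0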

1 answer:

Answer 0 (score: 3)

IIUC, try this:
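
A minimal sketch of one common way to speed up a loop like the one in the question (not necessarily the answerer's exact code). It assumes the files have no header row, that only columns 0, 1, 3, 4 and 5 are needed, and that rows are selected by the value in column 0. The function name load_filtered and its parameters are hypothetical, introduced here for illustration. Two things change: usecols is passed to read_csv so the unwanted columns are never materialised, and the filtered pieces are collected in a list and concatenated once at the end instead of growing a DataFrame inside the loop.

```python
import pandas as pd


def load_filtered(paths, keep_ids, usecols=(0, 1, 3, 4, 5)):
    """Read headerless CSV files, keeping only `usecols` and the rows whose
    column 0 value is in `keep_ids`; return one combined DataFrame.

    Hypothetical helper: the name and signature are illustrative only.
    """
    frames = []
    for path in paths:
        # usecols tells the parser which columns to keep, so the others
        # are dropped during parsing rather than afterwards
        df = pd.read_csv(path, header=None, usecols=list(usecols))
        # filter immediately so only the small subset stays in memory
        frames.append(df[df[0].isin(keep_ids)])
    if not frames:
        # pd.concat raises on an empty list, so return an empty frame instead
        return pd.DataFrame(columns=list(usecols))
    # one concat at the end avoids re-copying the accumulated rows per file
    return pd.concat(frames, ignore_index=True)
```

With the names from the question, the call would be rawdf = load_filtered(glob(r'C:\Temp2\dl*.csv'), det_list). How much this helps depends on where the 3-4 seconds per file are actually spent; if most of it is disk reading rather than parsing, the gain from usecols will be smaller.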
