I'm trying to read 3 years' worth of data files (one per date), and the part I'm interested in is usually small (about 1.4 million rows in total) compared to the parent files (each about 90 MB and 1.5 million rows). The code below has worked fine for me in the past with a smaller number of files, but with 1095 files to process it is crawling (roughly 3-4 seconds just to read one file). Any suggestions on how to make this more efficient/faster?
import pandas as pd
from glob import glob

file_list = glob(r'C:\Temp2\dl*.csv')

for file in file_list:
    print(file)
    df = pd.read_csv(file, header=None)
    df = df[[0, 1, 3, 4, 5]]            # keep only the columns of interest
    df2 = df[df[0].isin(det_list)]      # det_list is defined earlier in my script
    if file_list[0] == file:
        rawdf = df2
    else:
        rawdf = rawdf.append(df2)
Answer 0: (score: 3)
IIUC,试试这个:
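A sketch of one common fix, assuming `det_list` is the list of IDs you want to keep (it is defined elsewhere in your script). Calling `append` on a DataFrame inside the loop re-copies all previously accumulated rows on every iteration, which is what makes 1095 files crawl; instead, collect the filtered pieces in a plain Python list and call `pd.concat` once at the end. Passing `usecols` to `read_csv` also lets the parser skip the unwanted columns entirely rather than dropping them afterwards.

```python
import pandas as pd
from glob import glob

def load_filtered(pattern, det_list, cols=(0, 1, 3, 4, 5)):
    """Read every file matching pattern, keeping only cols and only the
    rows whose first column is in det_list."""
    pieces = []
    for file in glob(pattern):
        # usecols makes the parser skip the unneeded columns entirely.
        df = pd.read_csv(file, header=None, usecols=list(cols))
        pieces.append(df[df[0].isin(det_list)])
    # A single concat at the end is linear in the total row count,
    # versus the quadratic cost of growing a DataFrame in the loop.
    return pd.concat(pieces, ignore_index=True)

# rawdf = load_filtered(r'C:\Temp2\dl*.csv', det_list)
```

If even this is too slow, the remaining cost is mostly CSV parsing itself, so reading the files in parallel (e.g. with `concurrent.futures`) or converting them once to a binary format such as Parquet are the usual next steps.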