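Context:
I have a large CSV file (30 MB now, but it may later exceed a gigabyte; 185 rows) in which I need to search for certain values (each element of times). Every row is searched chunk by chunk (chunks of 6 CSV values), and when a value is found the match is written to another file. That is, I take one element from the sorted deque times, search for it in the 6-element chunks of another deque rdr (i.e. rdr = deque(reder)), write to the output file if it is found, and then continue with the next element of the times deque.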
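To make the chunk layout concrete, here is a minimal sketch of the search step only, not my actual code. The helper names iter_chunks and find_timestamp are hypothetical, and the sketch assumes the row has already been split on ";", that the first 8 columns are per-car metadata, and that the timestamp is the last value of each 6-value chunk (as in the sample row further down).

```python
def iter_chunks(row, start=8, size=6):
    """Yield (offset, chunk) for every complete `size`-value chunk in a row.

    The first `start` columns are treated as per-car metadata (assumption
    based on the sample CSV header) and a trailing partial chunk is skipped.
    """
    for offset in range(start, len(row) - size + 1, size):
        yield offset, [value.strip() for value in row[offset:offset + size]]


def find_timestamp(row, timestamp):
    """Return the chunks of `row` whose last value equals `timestamp`."""
    return [chunk for _, chunk in iter_chunks(row) if chunk[-1] == timestamp]


# A shortened version of the sample row: 8 metadata columns, two complete
# chunks, and one dangling value that does not form a full chunk.
row = ("24; Bus; 25; 4300.00; 26; 48520.00; 118.47; 2.678999; "
       "509552.78; 5039855.59; 10.0740; 0.4290; 0.2012; 0.0000; "
       "509552.97; 5039855.57; 10.0821; 0.3853; 0.2183; 20.0000; "
       "509553.17").split(";")

matches = find_timestamp(row, "20.0000")
```

In my real code this comparison runs for every element of times against every row of rdr, which is where the time goes.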
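Problem:
I have written code that works perfectly, but it is far too slow (8 hours). I want better performance. I thought of multiprocessing, but I could not get it to work, hence this request for help. I use a function ddd that takes every argument it needs from the calling scope, except times1, which I pass explicitly.
Code I tried: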
import ast
import csv
from collections import deque
from multiprocessing import Pool

dim = [0,76,'1.040000',1,1,'1.000000']+min_max_ret(X,Y)
times = deque(sorted(list(timestep),key=lambda x:ast.literal_eval(x)))

def ddd(times1):# ddd(outfl, rdr, acc_ret, FR_XY, width, length): all these arguments are taken from the calling scope.
    for tim in times1:
        time = ['{0:.6f}'.format(ast.literal_eval(tim)/1000.000000)]
        outfl.writelines([u'2 ********* TIMESTEP']+['\n']+time+['\n'])
        for index,line in enumerate(rdr):
            if index!=0:  # skip the header row
                cnt = 8  # offset of the current 6-value chunk within the row
                for counter in [qq for qq in [line[jj:jj+6] for jj in range(8,len(line),6)] if len(qq)==6]:
                    counter = map(unicode.strip,counter)
                    if counter[5]==tim:  # the timestamp is the last value of the chunk
                        cr_id = line[0]
                        acc = '{0:.6f}'.format(acc_ret(counter[3], counter[4]))
                        car_ltlng = map(unicode.strip,[line[cnt],line[cnt+1],line[cnt+6],line[cnt+7]])
                        xy = FR_XY(*car_ltlng)
                        data = [3]+[cr_id]+[1,1]+xy+[length,width]+[counter[2]]+[acc]
                        outfl.writelines([unicode(ww).strip()+'\n' for ww in data])
                    cnt+=6
        print "Time is %s is completed"%tim

with open(r"C:\my_output_ascii_14Dec.trj",'w') as outfl:
    with open(fl,'r') as inf:
        reder = csv.reader(inf,delimiter=';')
        rdr = deque(reder)
        outfl.writelines([str(w)+'\n' for w in dim])
        p = Pool(5)
        p.map(ddd,times)#[[xx for xx in islice(times,ii,ii+10)] for ii in range(0,len(times))])
Sample CSV content:
car_id; car_type; entry_gate; entry_time(ms); exit_gate; exit_time(ms); traveled_dist(m); avg_speed(m/s); trajectory(x[m];y[m];speed[m/s];a_tangential[ms-2];a_lateral[ms-2];timestamp[ms];)
24; Bus; 25; 4300.00; 26; 48520.00; 118.47; 2.678999; 509552.78; 5039855.59; 10.0740; 0.4290; 0.2012; 0.0000; 509552.97; 5039855.57; 10.0821; 0.3853; 0.2183; 20.0000; 509553.17; 5039855.55; 10.0886; 0.2636; 0.2356; 40.0000; 509553.37; 5039855.53; 10.0927; 0.1420; 0.2532; 60.0000; 509553.57; 5039855.51; 10.0943; 0.0203; 0.2710; 80.0000; 509553.76; 5039855.48; 10.0935; -0.1014; 0.2890; 100.0000; 509553.96; 5039855.46; 10.0902; -0.2231; 0.3073; 120.0000; 509554.16; 5039855.44; 10.0846; -0.3448; 0.3257; 140.0000; 509554.36; 5039855.42; 10.0765; -0.4665; 0.3444; 160.0000; 509554.56; 5039855.40; 10.0659; -0.5881; 0.3633; 180.0000; 509554.76; 5039855.37; 10.0529; -0.7098; 0.3823; 200.0000; 509554.96; 5039855.35; 10.0375; -0.8315; 0.4016; 220.0000; 509555.17; 5039855.33; 10.0197; -0.9532; 0.4211; 240.0000; 509555.37;
A part of the CSV file is available here.