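I am trying to process some data by running it through the functions defined below. The program appears to run without errors, but the loops do not repeat as many times as I expect.
It does not seem to matter where I put the return statement, as long as it is inside the function and not under the if statement.
I have tried writing rows under each for loop independently, and in each case the expected number of rows is written.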
import csv

def _ManhattanDistance(x, y):
    # Sum of absolute differences between corresponding fields.
    a = 0
    for i in range(0, len(x)):
        a += abs(float(x[i]) - float(y[i]))
    return a

def _CabFare(x, y, z):
    with open(x, 'r') as f:
        with open(y, 'r') as g:
            with open(z, 'wb') as h:
                reader_1 = csv.reader(f)
                reader_2 = csv.reader(g)
                writer = csv.writer(h)
                for row_b in reader_2:
                    for row_a in reader_1:
                        if _ManhattanDistance(row_a, row_b) > 0:
                            writer.writerow(row_a)
                            writer.writerow(row_b)
    return
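For reference, given my input, reader_1 should have 200 rows and reader_2 should have 17145 rows. With our inclusion threshold set to zero, I expected the output file to contain 17145 * 200 = 3429000 rows; what I get is an output of 400 rows.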
Answer 0: (score: 1)
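A reader is a stateful iterator. Once it is exhausted, it is done, and you need to reopen (or rewind) the underlying file to iterate over it again: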
def _CabFare(x, y, z):
    with open(x, 'r') as f:
        with open(y, 'r') as g:
            with open(z, 'wb') as h:
                reader_2 = csv.reader(g)
                writer = csv.writer(h)
                for row_b in reader_2:
                    # Rewind f and build a fresh reader_1 for each iteration;
                    # a new csv.reader alone would still start at end of file.
                    f.seek(0)
                    reader_1 = csv.reader(f)
                    for row_a in reader_1:
                        if _ManhattanDistance(row_a, row_b) > 0:
                            writer.writerow(row_a)
                            writer.writerow(row_b)
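The exhaustion behaviour described in this answer can be seen in a small sketch. It assumes Python 3 and uses an in-memory `io.StringIO` buffer in place of a real file; the sample data and the names `buf`, `first_pass`, `second_pass`, `third_pass` and `fourth_pass` are made up for illustration.

```python
import csv
import io

# Hypothetical two-row CSV held in memory instead of a file on disk.
buf = io.StringIO("1,2\n3,4\n")

reader = csv.reader(buf)
first_pass = list(reader)    # consumes every row
second_pass = list(reader)   # the same reader is now exhausted

# A new reader on the same handle is also empty, because the
# underlying position is still at the end of the data.
third_pass = list(csv.reader(buf))

# Rewinding the handle makes the rows available again.
buf.seek(0)
fourth_pass = list(csv.reader(buf))

print(first_pass)   # [['1', '2'], ['3', '4']]
print(second_pass)  # []
print(third_pass)   # []
print(fourth_pass)  # [['1', '2'], ['3', '4']]
```

This matches the symptom in the question: the inner loop only produces rows during the first pass of the outer loop, so 200 pairs (400 rows) are written.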
Answer 1: (score: 1)
This seems to work:
import csv
from itertools import product

def _CabFare(x, y, z):
    with open(x, 'r') as f, open(y, 'r') as g, open(z, 'wb') as h:
        writer = csv.writer(h)
        # product() reads both readers fully into memory before yielding pairs.
        for row_a, row_b in product(csv.reader(f), csv.reader(g)):
            if _ManhattanDistance(row_a, row_b) > 0:
                writer.writerow(row_a)
                writer.writerow(row_b)
This version is slower but uses less memory:
def _CabFare(x, y, z):
    with open(x, 'r') as f, open(z, 'wb') as h:
        writer = csv.writer(h)
        for row_a in csv.reader(f):
            # Reopen the second file for every row of the first.
            with open(y, 'r') as g:
                for row_b in csv.reader(g):
                    if _ManhattanDistance(row_a, row_b) > 0:
                        writer.writerow(row_a)
                        writer.writerow(row_b)
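Another option, not taken from the answers above, is to read the smaller file into a list once and stream the larger one, which avoids reopening or rewinding a file on every outer iteration. The sketch below assumes Python 3 (hence `'w'` with `newline=''` rather than `'wb'`) and that the first file is small enough to hold in memory, as the 200-row file in the question would be. The names `manhattan_distance` and `cab_fare_cached` are hypothetical.

```python
import csv


def manhattan_distance(x, y):
    """Sum of absolute differences between paired fields of two rows."""
    # zip() stops at the shorter row instead of raising IndexError.
    return sum(abs(float(a) - float(b)) for a, b in zip(x, y))


def cab_fare_cached(small_path, large_path, out_path):
    """Write every (row_a, row_b) pair whose distance is above zero."""
    # Assumption: the first file fits comfortably in memory.
    with open(small_path, newline='') as f:
        rows_a = list(csv.reader(f))

    with open(large_path, newline='') as g, \
         open(out_path, 'w', newline='') as h:
        writer = csv.writer(h)
        for row_b in csv.reader(g):   # streamed exactly once
            for row_a in rows_a:      # a list can be iterated repeatedly
                if manhattan_distance(row_a, row_b) > 0:
                    writer.writerow(row_a)
                    writer.writerow(row_b)
```

Unlike a reader, the list `rows_a` is not consumed by iteration, so the inner loop runs in full for every row of the larger file.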