Why does this nested for loop repeat twice in the second loop and then not again?

Posted: 2019-04-16 06:46:09

Tags: python csv

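I'm trying to process some data by running it through the functions defined below. The program seems to run without errors, but the loops don't repeat as many times as I expect.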

It doesn't seem to matter where I put the return statement, as long as it is inside the function and not under the if statement.

I've tried writing rows separately under each for loop on its own, and in each case the expected number of rows is written.

import csv

def _ManhattanDistance(x,y):
    a = 0
    for i in range(0,len(x)):
        a += abs(float(x[i])-float(y[i]))
    return a

def _CabFare(x,y,z):
    with open(x, 'r') as f:
        with open(y, 'r') as g:
            with open(z, 'wb') as h:
                reader_1 = csv.reader(f)
                reader_2 = csv.reader(g)
                writer = csv.writer(h)
                for row_b in reader_2:
                    for row_a in reader_1:
                        if _ManhattanDistance(row_a,row_b) > 0:
                            writer.writerow(row_a)
                            writer.writerow(row_b)
                return

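For reference, given my inputs, reader_1 should have 200 rows and reader_2 should have 17145 rows. Since the threshold is zero (any pair with a nonzero distance passes), I expect 17145 * 200 = 3,429,000 pairs in the output file (6,858,000 rows, because each pair is written as two rows). What I actually get is 400 rows.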
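A minimal reproduction of the symptom, assuming two tiny in-memory files built with io.StringIO in place of the real CSV files: the inner reader is consumed during the first pass of the outer loop, so only the pairs for the first outer row are ever produced. Scaled up to the question's inputs, that is 200 pairs written as two rows each, which is consistent with the 400 rows observed.

```python
import csv
import io

# Hypothetical in-memory stand-ins for the two CSV files (not the asker's data).
file_a = io.StringIO("1,2\n3,4\n5,6\n")  # plays the role of x / reader_1
file_b = io.StringIO("0,0\n9,9\n")       # plays the role of y / reader_2

reader_1 = csv.reader(file_a)
reader_2 = csv.reader(file_b)

pairs = []
for row_b in reader_2:
    for row_a in reader_1:  # this reader is exhausted after the first row_b
        pairs.append((row_a, row_b))

print(len(pairs))  # 3, not 3 * 2 = 6
```

Every collected pair uses the first row of file_b; the second outer iteration finds reader_1 already empty.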

2 answers:

Answer 0 (score: 1)

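A reader is a stateful iterator. Once it is exhausted, it stays exhausted: to iterate over the file again you have to rewind the underlying file (or reopen it) and create a new reader. Just creating a new csv.reader on a file object that is already at its end is not enough, because the read position belongs to the file, not to the reader: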

import csv

def _CabFare(x,y,z):
    with open(x, 'r') as f:
        with open(y, 'r') as g:
            # 'wb' is the Python 2 csv convention; on Python 3 use open(z, 'w', newline='')
            with open(z, 'wb') as h:
                reader_2 = csv.reader(g)
                writer = csv.writer(h)
                for row_b in reader_2:
                    f.seek(0)                 # rewind file x to its first line
                    reader_1 = csv.reader(f)  # new reader over the rewound file
                    for row_a in reader_1:
                        if _ManhattanDistance(row_a,row_b) > 0:
                            writer.writerow(row_a)
                            writer.writerow(row_b)
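A small check of why the file position matters, assuming an in-memory io.StringIO in place of the real file x: a second csv.reader over an already-consumed file object yields no rows, while calling seek(0) on the file first makes the rows readable again.

```python
import csv
import io

f = io.StringIO("a,1\nb,2\n")  # hypothetical stand-in for the file opened as f

first_pass = list(csv.reader(f))   # reads every row, leaving f at its end
second_pass = list(csv.reader(f))  # new reader, same exhausted file: no rows

f.seek(0)                          # move the file position back to the start
third_pass = list(csv.reader(f))   # the rows are available again

print(len(first_pass), len(second_pass), len(third_pass))  # 2 0 2
```

Real files returned by open() behave the same way; seek(0) is always allowed on a text file, even though arbitrary offsets are not.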

Answer 1 (score: 1)

This seems to work:

import csv
from itertools import product

def _CabFare(x,y,z):
    with open(x, 'r') as f, open(y, 'r') as g, open(z, 'wb') as h:
        writer = csv.writer(h)
        for row_a, row_b in product(csv.reader(f), csv.reader(g)):
            if _ManhattanDistance(row_a, row_b) > 0:
                writer.writerow(row_a)
                writer.writerow(row_b)
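Why the product() version sidesteps the exhaustion problem: itertools.product reads each input iterable completely into memory before it yields anything, so one-shot csv.reader objects still produce every pairing. A quick check, assuming hypothetical in-memory inputs:

```python
import csv
import io
from itertools import product

# Hypothetical in-memory stand-ins for the two input files.
file_a = io.StringIO("1,2\n3,4\n5,6\n")
file_b = io.StringIO("0,0\n9,9\n")

# product() consumes both readers into internal tuples up front,
# so every row of file_a is paired with every row of file_b.
pairs = list(product(csv.reader(file_a), csv.reader(file_b)))

print(len(pairs))  # 6 == 3 * 2
print(pairs[0])    # (['1', '2'], ['0', '0'])
```

Note that the pairs come out grouped by row_a (the first argument varies slowest), the reverse of the question's loop order, so the output rows appear in a different order even though the same pairs are written.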

Slower (it reopens and re-parses file y once for every row of x), but uses less memory:

import csv

def _CabFare(x,y,z):
    with open(x, 'r') as f, open(z, 'wb') as h:
        writer = csv.writer(h)
        for row_a in csv.reader(f):
            with open(y, 'r') as g:  # reopened for each row_a, so reading starts at the top
                for row_b in csv.reader(g):
                    if _ManhattanDistance(row_a, row_b) > 0:
                        writer.writerow(row_a)
                        writer.writerow(row_b)
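A possible middle ground, not taken from either answer: load only the smaller file (200 rows in the question) into a list once and stream the larger one. A list, unlike a csv.reader, can be iterated any number of times, so nothing has to be rewound or reopened, and memory use stays bounded by the small file. This is a sketch for Python 3; the names cab_fare_cached and manhattan_distance are hypothetical, and the distance helper uses zip, which (unlike the question's index-based loop) silently stops at the shorter row.

```python
import csv


def manhattan_distance(x, y):
    """Sum of absolute coordinate differences (zip stops at the shorter row)."""
    return sum(abs(float(a) - float(b)) for a, b in zip(x, y))


def cab_fare_cached(small_path, large_path, out_path):
    """Hypothetical variant: cache the small file, stream the large one once."""
    # Read the smaller file into memory a single time; the resulting list
    # can be looped over again for every row of the larger file.
    with open(small_path, newline='') as f:
        small_rows = list(csv.reader(f))

    with open(large_path, newline='') as g, open(out_path, 'w', newline='') as h:
        writer = csv.writer(h)
        for row_b in csv.reader(g):     # stream the large file exactly once
            for row_a in small_rows:    # re-iterate the cached rows
                if manhattan_distance(row_a, row_b) > 0:
                    writer.writerow(row_a)
                    writer.writerow(row_b)
```

Usage would look like cab_fare_cached('small.csv', 'large.csv', 'out.csv'), with the file names here being placeholders. The trade-off is that only the smaller input is held in memory, while the larger one is parsed a single time instead of once per outer row.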