Question

I am processing a CSV file to analyze lecture feedback data. The format looks like this:

"5631","18650","10",,,"2015-09-18 09:35:11"
"18650","null","10",,,"2015-09-18 09:37:12"
"18650","5631","10",,,"2015-09-18 09:37:19"
"58649","null","6",,,"2015-09-18 09:38:13"
"45379","31541","10","its friday","nothing yet keep it up","2015-09-18 09:39:46"

I am trying to get rid of bad data. An entry with "id1","id2" is considered valid only if there is another corresponding entry with "id2","id1".

I am using nested loops to try to find a matching entry for each row. However, the outer loop seems to stop halfway for no reason. Here is my code:

import csv

class Filter:
    file1 = open('EncodedPeerInteractions.FA2015.csv')
    peerinter = csv.reader(file1,delimiter=',') 
    def __init__(self):
        super()

    def filter(self):
        file2 = open('FilteredInteractions.csv','a')
        for row in self.peerinter:
            print(row)
            if row[0] == 'null' or row[1] == 'null':
                continue
            id1 = int(row[0])
            id2 = int(row[1])
            for test in self.peerinter:
                if test[0] == 'null' or test[1] == 'null':
                    continue
                if int(test[0]) == id2 and int(test[1]) == id1:
                    file2.write("\n")
                    file2.write(str(row))
                    break
        file2.close()

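I tried stepping through the code with pdb. Everything is fine for the first few iterations, and then it suddenly jumps to file2.close() and returns. The program does print some valid entries, but not nearly enough.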

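I checked the CSV file and it loads into memory correctly, with more than 18000 entries. I also tested with print instead of writing to the file, and it gave the same result, so the problem is not with appending to the file.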

Edit

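Now I understand the problem. As this question points out, I break out of the inner loop when there is a match, but when there is no match the inner loop consumes the rest of the file without resetting it. When control returns to the outer loop, there is nothing left to read, so it ends. I should either read the rows into a list or reset the reader.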
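The effect described above can be reproduced in isolation. The following is a minimal sketch, not the original program: it reads the five sample rows from an in-memory string (an assumption made so the example is self-contained) and counts how many rows the outer loop sees. The function names are hypothetical.

```python
import csv
import io

# Sample rows from the question, kept in memory so no file is needed.
SAMPLE = '''"5631","18650","10",,,"2015-09-18 09:35:11"
"18650","null","10",,,"2015-09-18 09:37:12"
"18650","5631","10",,,"2015-09-18 09:37:19"
"58649","null","6",,,"2015-09-18 09:38:13"
"45379","31541","10","its friday","nothing yet keep it up","2015-09-18 09:39:46"
'''

def outer_rows_seen_with_shared_reader(text):
    """Nest two loops over ONE csv.reader, as in the question."""
    reader = csv.reader(io.StringIO(text))
    seen = 0
    for row in reader:
        seen += 1
        for test in reader:
            # No match, so no break: this loop drains the reader.
            pass
    return seen

def outer_rows_seen_with_list(text):
    """Materialize the rows first; a list can be iterated repeatedly."""
    rows = list(csv.reader(io.StringIO(text)))
    seen = 0
    for row in rows:
        seen += 1
        for test in rows:
            pass
    return seen

print(outer_rows_seen_with_shared_reader(SAMPLE))  # 1
print(outer_rows_seen_with_list(SAMPLE))           # 5
```

A csv.reader is a one-shot iterator, so both loops advance the same position in the file. Copying the rows into a list gives each loop its own independent pass, at the cost of holding all rows in memory, which should be acceptable for roughly 18000 entries.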

Answer 1

You are making this far more complicated than it needs to be.

Given:

$ cat /tmp/so.csv
"5631","18650","10",,,"2015-09-18 09:35:11"
"18650","null","10",,,"2015-09-18 09:37:12"
"18650","5631","10",,,"2015-09-18 09:37:19"
"58649","null","6",,,"2015-09-18 09:38:13"
"45379","31541","10","its friday","nothing yet keep it up","2015-09-18 09:39:46"

You can use csv and filter to get what you need:

>>> import csv
>>> with open('/tmp/so.csv') as f:
...    list(filter(lambda row: 'null' not in row[0:2], csv.reader(f)))
... 
[['5631', '18650', '10', '', '', '2015-09-18 09:35:11'], 
 ['18650', '5631', '10', '', '', '2015-09-18 09:37:19'], 
 ['45379', '31541', '10', 'its friday', 'nothing yet keep it up', '2015-09-18 09:39:46']]

Answer 2

Try the following:

import csv

def filter(file1, file2):
    with open(file1, 'r') as f1:
        # Pass the open file object (f1), not the filename, to csv.reader
        peerinter = csv.reader(f1, delimiter=',')
        with open(file2, 'a') as f2:
            for row in peerinter:
                ...

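Using the with open() syntax wraps the file in a context manager, which ensures it is closed properly at the end. I am guessing your problem stems from opening one file as a class variable and the other inside the method.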
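For completeness, here is a minimal sketch of how the whole filter might look with context managers, assuming the goal is to keep only rows whose ("id1", "id2") pair also appears reversed somewhere in the file. The function name filter_reciprocal and its parameters are hypothetical. Unlike the question's code, it compares the IDs as strings rather than ints, overwrites the output file instead of appending, and writes CSV rows with csv.writer instead of str(row); all three are assumptions to adjust as needed.

```python
import csv

def filter_reciprocal(src_path, dst_path):
    """Keep rows whose (id1, id2) pair also occurs as (id2, id1)."""
    with open(src_path, newline='') as src:
        # Read everything once, dropping rows with a 'null' ID.
        rows = [r for r in csv.reader(src)
                if len(r) >= 2 and 'null' not in r[:2]]

    # A set makes each reverse-pair lookup O(1), replacing the inner loop.
    # IDs are compared as strings here (assumption: no leading zeros).
    pairs = {(r[0], r[1]) for r in rows}
    kept = [r for r in rows if (r[1], r[0]) in pairs]

    with open(dst_path, 'w', newline='') as dst:
        csv.writer(dst).writerows(kept)
    return kept
```

Called as filter_reciprocal('EncodedPeerInteractions.FA2015.csv', 'FilteredInteractions.csv'), this makes a single pass to build the set and a second pass to select rows, so the work grows linearly with the number of rows instead of quadratically.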

Python for loop inexplicably stops halfway when iterating over CSV rows
