Question

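I have a bunch of folders and subfolders containing CSVs whose fields are wrapped in quotation marks that I need to get rid of. So I'm trying to build a script that iterates over the tree and performs the operation on every CSV.

Below is my code.

It correctly identifies what is a CSV and what isn't. It rewrites all of them, but it writes blank data instead of the row data without the quotation marks.

I know this is happening around lines 14-19 of the code, but I don't know what to do about it.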

import csv
import os


rootDir = '.'

for dirName, subDirList, fileList in os.walk(rootDir):
    print('Found directory: %s' % dirName)
    for fname in fileList:

        # Check if it's a .csv first
        if fname.endswith('.csv'):

            input = csv.reader(open(fname, 'r'))
            output = open(fname, 'w')

            with output:
                writer = csv.writer(output)
                for row in input:
                    writer.writerow(row)

        # Skip if not a .csv
        else:
            print 'Not a .csv!!'

Answer 1

The problem is here:

input = csv.reader(open(fname, 'r'))
output = open(fname, 'w')

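As soon as you do the second open, in 'w' mode, it erases the file. csv.reader reads lazily, so nothing has been read at that point, and your input ends up iterating over an empty file.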
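The truncation is easy to reproduce in isolation. Here is a minimal sketch, assuming Python 3 and a throwaway file (the name demo.csv and the helper rows_after_reopen are hypothetical, not part of the question's code). It opens a file for reading, reopens the same path with 'w', and only then asks the reader for rows:

```python
import csv
import os
import tempfile


def rows_after_reopen(path):
    """Mimic the question's bug: open for reading, then reopen the same path with 'w'."""
    infile = open(path, 'r', newline='')
    reader = csv.reader(infile)             # lazy: nothing has been read yet
    outfile = open(path, 'w', newline='')   # truncates the file to zero bytes
    rows = list(reader)                     # the reader now sees an empty file
    infile.close()
    outfile.close()
    return rows


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'demo.csv')    # hypothetical file name
    with open(path, 'w', newline='') as f:
        f.write('"a","b"\n"1","2"\n')
    size_before = os.path.getsize(path)
    rows = rows_after_reopen(path)
    size_after = os.path.getsize(path)
```

After this runs, rows is an empty list and size_after is 0, even though the file held two rows beforehand. That is the same thing that happens to every CSV in the question's loop.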

One way to fix this is to read the whole file into memory first, and only then erase the file and rewrite it:

input = csv.reader(open(fname, 'r'))
contents = list(input)
output = open(fname, 'w')
with output:
    writer = csv.writer(output)
    for row in contents:
        writer.writerow(row)

You can simplify that a bit:

with open(fname, 'r') as infile:
    contents = list(csv.reader(infile))
with open(fname, 'w') as outfile:
    csv.writer(outfile).writerows(contents)

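Alternatively, you can write to a temporary file as you go, and then move the temporary file over the top of the original. This is a bit more complicated, but it has a major advantage: if there's an error (or someone switches off the computer) in the middle of writing, you still have the old file and can start over, instead of having 43% of the new file and all of your data lost: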

import tempfile

dname = os.path.dirname(fname)
with open(fname, 'r') as infile, tempfile.NamedTemporaryFile('w', dir=dname, delete=False) as outfile:
    writer = csv.writer(outfile)
    for row in csv.reader(infile):
        writer.writerow(row)
os.replace(outfile.name, fname)

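If you're not using Python 3.3+, you don't have os.replace. On Unix, you can just use os.rename instead, but on Windows it's a pain to get this right, and you probably want to look for a third-party library on PyPI. (I haven't used any of them, but if you're on Windows XP/2003 or later and Python 2.6/3.2 or later, pyosreplace looks promising.)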
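Putting the pieces together, here is a sketch of the whole walk using the temporary-file approach, assuming Python 3.3+. The function names strip_quotes_in_place and rewrite_all_csvs are hypothetical. Two details are assumptions that go beyond the snippets in this answer: os.walk yields bare file names, so each one is joined with its directory before being opened, and files are opened with newline='' as the csv module documentation recommends.

```python
import csv
import os
import tempfile


def strip_quotes_in_place(path):
    """Rewrite one CSV via a temp file in the same directory, then swap it in."""
    dname = os.path.dirname(path) or '.'
    with open(path, 'r', newline='') as infile, \
         tempfile.NamedTemporaryFile('w', dir=dname, delete=False,
                                     newline='') as outfile:
        # csv.writer defaults to QUOTE_MINIMAL, so quotes that are not
        # needed to protect a field are dropped on the way out.
        csv.writer(outfile).writerows(csv.reader(infile))
    # Overwrite the original only after the new file is completely written.
    os.replace(outfile.name, path)


def rewrite_all_csvs(root_dir):
    """Walk root_dir and rewrite every .csv file found; return the paths touched."""
    touched = []
    for dir_name, _subdirs, file_list in os.walk(root_dir):
        for fname in file_list:
            if fname.endswith('.csv'):
                # fname alone is relative to dir_name, not to the current directory.
                full_path = os.path.join(dir_name, fname)
                strip_quotes_in_place(full_path)
                touched.append(full_path)
    return touched
```

Calling rewrite_all_csvs('.') would process the current directory tree, as in the question. Note that this sketch does not clean up the temporary file if an exception is raised partway through; the original CSV is left untouched in that case.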
