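Python newbie here. I am trying to clean up a set of very rough CSV files I was sent so that I can load them into a nice Postgres table for querying and analysis. To do that, I first run them through csv.writer to remove the blank lines and the double quotes wrapping each entry. Here is what my code looks like: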
import os
import csv
import glob
from itertools import islice

files = glob.glob('/Users/foo/bar/*.csv')

# Loop through all of the csv's
for file in files:
    # Get the filename from the path
    outfile = os.path.basename(file)
    with open(file, 'rb') as inp, open('/Users/foo/baz/' + outfile, 'wb') as out:
        reader = csv.reader(inp)
        writer = csv.writer(out)
        for row in reader:
            if row:
                writer.writerow(row)
    out.close()
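It works flawlessly and does exactly what I want; the output CSVs look great. Next, I try to chop a fixed number of rows of completely unnecessary junk off the beginning and end of each newly cleaned CSV file (omitting the first 8 rows and the last 2). For reasons I cannot determine, the CSV output from this part of the code (indented the same as the previous 'with' block) is completely empty: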
with open('/Users/foo/baz/' + outfile, 'rb') as inp2, open('/Users/foo/qux/' + outfile, 'wb') as out2:
    writer2 = csv.writer(out2)
    reader2 = csv.reader(inp2)
    row_count = sum(1 for row in reader2)
    last_line_index = row_count - 3
    for row in islice(reader2, 7, last_line_index):
        writer2.writerow(row)
out2.close()
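I know the close() at the end of each block is redundant because I am using 'with'; I added it as an experiment after seeing the approach suggested elsewhere. I also tried putting the second 'with' block into a separate file and running it after the first 'with' block had finished, but still to no avail. Any help is much appreciated!
Also, here is the whole file: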
import os
import csv
import glob
from itertools import islice

files = glob.glob('/Users/foo/bar/*.csv')

# Loop through all of the csv's
for file in files:
    # Get the filename from the path
    outfile = os.path.basename(file)
    with open(file, 'rb') as inp, open('/Users/foo/baz/' + outfile, 'wb') as out:
        reader = csv.reader(inp)
        writer = csv.writer(out)
        for row in reader:
            if row:
                writer.writerow(row)
    out.close()
    with open('/Users/foo/baz/' + outfile, 'rb') as inp2, open('/Users/foo/qux/' + outfile, 'wb') as out2:
        writer2 = csv.writer(out2)
        reader2 = csv.reader(inp2)
        row_count = sum(1 for row in reader2)
        last_line_index = row_count - 3
        for row in islice(reader2, 7, last_line_index):
            writer2.writerow(row)
    out2.close()
Thanks!
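Answer 0 (score: 2)
The culprit is
    row_count = sum(1 for row in reader2)
which reads all of the data from reader2. A csv.reader is a one-pass iterator, so by the time you reach for row in islice(reader2, 7, last_line_index) it is already exhausted and there is no data left to write.
Also, you may be reading a lot of blank rows because you open the file in binary mode. In Python 3, open it in text mode with newline='' instead:
    with open('file.csv', newline='') as inf:
        rd = csv.reader(inf)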
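To make the exhaustion concrete, here is a minimal sketch, assuming Python 3 and using an in-memory io.StringIO with made-up sample rows in place of one of the real files. It shows that counting rows leaves nothing for a second pass, and that reading the rows into a list once and slicing it avoids the problem. The slice bounds follow the stated goal of dropping the first 8 and last 2 rows; adjust them if your files differ.

```python
import csv
import io

# Hypothetical sample data standing in for one cleaned CSV file (12 rows).
sample = "\n".join("row{},{}".format(i, i) for i in range(12)) + "\n"

reader = csv.reader(io.StringIO(sample, newline=""))

# Counting the rows walks the reader to the end of the data...
row_count = sum(1 for _ in reader)

# ...so a second pass over the same reader yields nothing.
leftover = list(reader)

# One possible fix: read everything once into a list, then slice it.
rows = list(csv.reader(io.StringIO(sample, newline="")))
trimmed = rows[8:-2]  # drop the first 8 and the last 2 rows

print(row_count, leftover, trimmed)
```

Holding all rows in a list is fine for files that fit comfortably in memory; for very large files a streaming approach may be preferable.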
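Answer 1 (score: 1)
You can quickly fix the code like this (I marked the problem with a comment; as @Hugh Bothwell said, you had already read all of the data through the variable reader2):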
import os
import csv
import glob
from itertools import islice

files = glob.glob('/Users/foo/bar/*.csv')

# Loop through all of the csv's
for file in files:
    # Get the filename from the path
    outfile = os.path.basename(file)
    with open(file, 'rb') as inp, open('/Users/foo/baz/' + outfile, 'wb') as out:
        reader = csv.reader(inp)
        writer = csv.writer(out)
        for row in reader:
            if row:
                writer.writerow(row)
    out.close()
    with open('/Users/foo/baz/' + outfile, 'rb') as inp2, open('/Users/foo/qux/' + outfile, 'wb') as out2:
        writer2 = csv.writer(out2)
        # Count the rows with a separate, throwaway reader
        row_count = sum(1 for row in csv.reader(inp2))
        # Counting consumed the file, so rewind before reading it again
        inp2.seek(0)
        reader2 = csv.reader(inp2)
        last_line_index = row_count - 3
        for row in islice(reader2, 7, last_line_index):
            writer2.writerow(row)
    out2.close()
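As an alternative to counting the rows first, the trimming can be done in a single pass. The following is only a sketch, assuming Python 3; trim_rows and trim_csv are hypothetical helper names, not part of the csv module. It skips the leading rows with islice and holds back the trailing rows in a small deque, so the file is read once and never has to be rewound.

```python
import csv
from collections import deque
from itertools import islice


def trim_rows(rows, skip_head=8, skip_tail=2):
    """Yield rows, omitting the first skip_head and the last skip_tail."""
    held_back = deque()
    for row in islice(rows, skip_head, None):
        held_back.append(row)
        # Release a row only once skip_tail newer rows are queued behind it.
        if len(held_back) > skip_tail:
            yield held_back.popleft()


def trim_csv(src_path, dst_path, skip_head=8, skip_tail=2):
    """Copy src_path to dst_path without the leading and trailing rows."""
    with open(src_path, newline="") as src, \
            open(dst_path, "w", newline="") as dst:
        writer = csv.writer(dst)
        for row in trim_rows(csv.reader(src), skip_head, skip_tail):
            writer.writerow(row)
```

For example, trim_csv('/Users/foo/baz/' + outfile, '/Users/foo/qux/' + outfile) would take the place of the second 'with' block inside the loop. Because the rows still queued at the end are never yielded, the last skip_tail rows are dropped without knowing the total row count in advance.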