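When splitting a large CSV based on the value of its second column, I append each row to its own output file. However, I don't want the header to be appended again after it has been written the first time.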
import csv
import os

def split_csv_file(f, dst_dir, keyfunc):
    csv_reader = csv.reader(f)
    header = next(csv_reader)
    csv_writers = {}
    for row in csv_reader:
        k = keyfunc(row)
        # The output file is reopened for every row, so the header is written once per row.
        with open(os.path.join(dst_dir, k), mode='a', newline='') as output:
            writer = csv.writer(output)
            writer.writerow(header)
            csv_writers[k] = writer
            csv_writers[k].writerow(row[0:1])
This is what I currently get:
<option value=''>Choose SubGroup</option>
<option value='/2007-Accord-LX-Belts-s/5380.htm'>Belts</option>
<option value=''>Choose SubGroup</option>
<option value='/2007-Accord-LX-Belts-s/5381.htm'>Belts</option>
<option value=''>Choose SubGroup</option>
<option value='/2007-Accord-LX-Cooling-Fan-s/15089.htm'>Cooling Fan</option>
This is what I want:
<option value=''>Choose SubGroup</option>
<option value='/2007-Accord-LX-Belts-s/5380.htm'>Belts</option>
<option value='/2007-Accord-LX-Belts-s/5381.htm'>Belts</option>
<option value='/2007-Accord-LX-Cooling-Fan-s/15089.htm'>Cooling Fan</option>
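For reference, here is a minimal sketch that produces the desired layout, assuming (as the error message further down suggests) that `keyfunc` returns an output filename such as the second column plus `.txt`. Instead of reopening a file for every row, it opens each output file once in `'w'` mode the first time its key appears, writes the header there, and keeps the file open (via `contextlib.ExitStack`) so the `csv.writer` stored in `csv_writers` stays usable. It writes `header[0:1]` so the header line matches the desired output above; pass `header` instead to keep every column.

```python
import csv
import os
from contextlib import ExitStack


def split_csv_file(f, dst_dir, keyfunc):
    """Split the CSV rows from f into one file per key, with the header once per file.

    Each output file is opened once (mode 'w') and kept open until every row
    has been processed, so the csv.writer stored per key stays valid.
    """
    csv_reader = csv.reader(f)
    header = next(csv_reader)
    with ExitStack() as stack:  # closes every output file when the loop is done
        csv_writers = {}
        for row in csv_reader:
            k = keyfunc(row)
            if k not in csv_writers:
                # First time this key appears: create the file and write the header.
                output = stack.enter_context(
                    open(os.path.join(dst_dir, k), mode='w', newline=''))
                csv_writers[k] = csv.writer(output)
                csv_writers[k].writerow(header[0:1])
            # Keep only the first column, as the question's code does.
            csv_writers[k].writerow(row[0:1])
```

It would be called as `split_csv_file(f, dst_dir, lambda row: row[1].strip() + '.txt')`, with `f` opened using `newline=''`; that lambda is a hypothetical `keyfunc`. One caveat with keeping files open: with many distinct keys you can hit the operating system's open-file limit, in which case reopening in append mode per row (slower, but one handle at a time) is the safer choice.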
Update:
def split_csv_file(f, dst_dir, keyfunc):
    csv_reader = csv.reader(f)
    header = next(csv_reader)
    csv_writers = {}
    headers = {}
    for row in csv_reader:
        k = keyfunc(row)
        if k in headers:
            with open(os.path.join(dst_dir, k), 'w') as output:
                csv_writers[k].writerow([header])
        else:
            headers[k] = 1
        with open(os.path.join(dst_dir, k), mode='a', newline='') as output:
            writer = csv.writer(output)
            csv_writers[k] = writer
            csv_writers[k].writerow(row[0:1])
I updated the code and now I get a KeyError. What could I be doing wrong?
Here is a sample of the file to be split:
<option value=''>Choose SubGroup</option>, ParentID
<option value='/1990-Accord-DX-Glass-s/37918.htm'>Glass</option>,Accord1990DX422F22A1BodyHardwareBackGlass
<option value='/1990-Accord-DX-Glass-s/37919.htm'>Glass</option>,Accord1990DX422F22A1BodyHardwareBackGlass
<option value='/1990-Accord-DX-Reveal-Moldings-s/69090.htm'>Reveal Moldings</option>,Accord1990DX422F22A1BodyHardwareBackGlass
<option value='/1990-Accord-DX-Reveal-Moldings-s/69091.htm'>Reveal Moldings</option>,Accord1990DX422F22A1BodyHardwareBackGlass
<option value='/1990-Accord-DX-Center-s/10331.htm'>Center</option>,Accord1990DX422F22A1BodyHardwareConsole
<option value='/1990-Accord-DX-Cowl-s/16006.htm'>Cowl</option>,Accord1990DX422F22A1BodyHardwareCowl
<option value='/1990-Accord-DX-Exterior-Trim-s/26889.htm'>Exterior Trim</option>,Accord1990DX422F22A1BodyHardwareFender
<option value='/1990-Accord-DX-Exterior-Trim-s/26890.htm'>Exterior Trim</option>,Accord1990DX422F22A1BodyHardwareFender
Here is the error: KeyError: 'Accord1990DX422F22A1BodyHardwareBackGlass.txt'
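A KeyError here means a dictionary (most likely `csv_writers`) was indexed with a key that was never stored in it. In the updated code, `headers` and `csv_writers` are two separate dictionaries, so finding `k` in `headers` does not by itself guarantee that `csv_writers[k]` exists. Even when the key is present there is a second trap: the stored writer wraps a file that the `with` block has already closed. The sketch below reproduces both failures in isolation; `io.StringIO` stands in for a real file and the key `example.txt` is made up.

```python
import csv
import io


def lookup_before_store():
    """Reproduce the KeyError: the key is recorded in `headers` but not in `csv_writers`."""
    headers = {'example.txt': 1}
    csv_writers = {}
    if 'example.txt' in headers:        # true: the key is in `headers` ...
        try:
            csv_writers['example.txt']  # ... but was never stored in `csv_writers`
        except KeyError as exc:
            return exc.args[0]          # the missing key, as shown in the error message
    return None


def write_after_close():
    """Show that a csv.writer kept after its `with` block can no longer write."""
    csv_writers = {}
    with io.StringIO() as output:  # stands in for open(..., mode='a', newline='')
        csv_writers['example.txt'] = csv.writer(output)
        csv_writers['example.txt'].writerow(['inside the with block'])  # works
    try:
        # The with block has closed `output`, so this write fails.
        csv_writers['example.txt'].writerow(['after the with block'])
    except ValueError as exc:
        return str(exc)
    return None
```

Keeping a single dictionary keyed by output file, and only using a writer while its file is still open, avoids both problems.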
Answer 0 (score: 1)
If you want every second item, you might consider using something like:
rows[::2]
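As an illustration, applied to the output shown in the question (read into a list of lines), `[::2]` picks the lines at even positions (the repeated headers here) and `[1::2]` those at odd positions (the data). This only works as a cleanup step if headers and data strictly alternate, which is an assumption about the output rather than something the split guarantees.

```python
current = [
    "<option value=''>Choose SubGroup</option>",
    "<option value='/2007-Accord-LX-Belts-s/5380.htm'>Belts</option>",
    "<option value=''>Choose SubGroup</option>",
    "<option value='/2007-Accord-LX-Belts-s/5381.htm'>Belts</option>",
    "<option value=''>Choose SubGroup</option>",
    "<option value='/2007-Accord-LX-Cooling-Fan-s/15089.htm'>Cooling Fan</option>",
]

# [::2] starts at index 0 and takes every second line: here, the repeated headers.
headers_only = current[::2]

# Keep the first header, then [1::2] (start at index 1, step 2) for the data lines.
wanted = current[:1] + current[1::2]

for line in wanted:
    print(line)
```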
In your specific case, this should be enough:
import csv
import os

def split_csv_file(f, dst_dir, keyfunc):
    csv_reader = csv.reader(f)
    header = next(csv_reader)
    write_header = True  # one flag for the whole run, not one per output file
    csv_writers = {}
    for row in csv_reader:
        k = keyfunc(row)
        with open(os.path.join(dst_dir, k), mode='a', newline='') as output:
            writer = csv.writer(output)
            if write_header:
                writer.writerow(header)
                write_header = False
            csv_writers[k] = writer
            csv_writers[k].writerow(row[0:1])
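One caveat: `write_header` is a single flag for the whole loop, so only the first output file created gets a header, and files for later keys start directly with data. The small simulation below (plain lists instead of files, made-up keys `'A'` and `'B'`, hypothetical function name `header_counts`) compares that with remembering, per key, whether the header was already written.

```python
def header_counts(keys, per_key):
    """Simulate the split and count how many header lines each output 'file' gets.

    per_key=False mimics the single write_header flag from the answer;
    per_key=True tracks in a set which keys already have their header.
    """
    files = {}            # key -> list of lines written to that "file"
    write_header = True   # single flag (used when per_key=False)
    seen = set()          # keys whose header is already written (per_key=True)
    for k in keys:
        lines = files.setdefault(k, [])
        if per_key:
            need_header = k not in seen
            seen.add(k)
        else:
            need_header = write_header
            write_header = False
        if need_header:
            lines.append('header')
        lines.append('row')
    return {k: lines.count('header') for k, lines in files.items()}


print(header_counts(['A', 'A', 'B'], per_key=False))  # {'A': 1, 'B': 0}
print(header_counts(['A', 'A', 'B'], per_key=True))   # {'A': 1, 'B': 1}
```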
Answer 1 (score: 1)
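Without seeing the structure of your input file, I think the simplest approach is to check for the first occurrence of k. If it is the first occurrence, open the file and write the header; that row (and any later rows with the same key) can then be appended after the header. So your code would look like this: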
headers = {}
for row in csv_reader:
    k = keyfunc(row)
    if k not in headers:
        # First occurrence of k: create (or truncate) the file and write the header.
        headers[k] = 1
        with open(os.path.join(dst_dir, k), 'w', newline='') as output:
            csv.writer(output).writerow(header)
    # Every row, including the first one for this key, is appended after the header.
    with open(os.path.join(dst_dir, k), mode='a', newline='') as output:
        csv.writer(output).writerow(row[0:1])
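Opening the file with the 'w' option the first time also ensures you don't accidentally append to an old version; a fresh output file is created instead. Otherwise, every time you rerun the script the files would keep growing, which could become a problem.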
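To see the difference concretely, the sketch below simulates running a script twice against the same output file in a temporary directory (the file name `out.txt` and its contents are made up). Rows are always appended; only the mode used for the first open of each run changes.

```python
import os
import tempfile


def run_twice(first_mode):
    """Simulate two runs that each write a header and one row; return the final lines."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, 'out.txt')
        for _ in range(2):  # two runs of the script
            with open(path, first_mode) as out:  # first open of this run
                out.write('header\n')
            with open(path, 'a') as out:         # later rows are always appended
                out.write('row\n')
        with open(path) as fh:
            return fh.read().splitlines()


print(run_twice('w'))  # ['header', 'row']
print(run_twice('a'))  # ['header', 'row', 'header', 'row']
```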