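Remove the extra header appended when splitting a CSV

Time: 2016-02-18 15:17:44

Tags: python csv

When splitting a large CSV by the value of its second column, I append each row to its own output file. However, I don't want the header to be appended again after it has been written once.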

import csv
import os

def split_csv_file(f, dst_dir, keyfunc):
    csv_reader = csv.reader(f)
    header = next(csv_reader)
    csv_writers = {}
    for row in csv_reader:
        k = keyfunc(row)
        with open(os.path.join(dst_dir, k), mode='a', newline='') as output:
            writer = csv.writer(output)
            # Runs for every row, so the header is appended before each data row
            writer.writerow(header)
            csv_writers[k] = writer
            csv_writers[k].writerow(row[0:1])

This is what I currently get:

<option value=''>Choose SubGroup</option>
<option value='/2007-Accord-LX-Belts-s/5380.htm'>Belts</option>
<option value=''>Choose SubGroup</option>
<option value='/2007-Accord-LX-Belts-s/5381.htm'>Belts</option>
<option value=''>Choose SubGroup</option>
<option value='/2007-Accord-LX-Cooling-Fan-s/15089.htm'>Cooling Fan</option>

This is what I want:

<option value=''>Choose SubGroup</option>
<option value='/2007-Accord-LX-Belts-s/5380.htm'>Belts</option>
<option value='/2007-Accord-LX-Belts-s/5381.htm'>Belts</option>
<option value='/2007-Accord-LX-Cooling-Fan-s/15089.htm'>Cooling Fan</option>

Update:

def split_csv_file(f, dst_dir, keyfunc):
    csv_reader = csv.reader(f)
    header = next(csv_reader)
    csv_writers = {}
    headers = {}
    for row in csv_reader:
        k = keyfunc(row)
        if k in headers:
            with open(os.path.join(dst_dir, k), 'w') as output:
                csv_writers[k].writerow([header])
        else:
            headers[k] = 1

        with open(os.path.join(dst_dir, k), mode='a', newline='') as output:
            writer = csv.writer(output)
            csv_writers[k] = writer
            csv_writers[k].writerow(row[0:1])

I updated the code and now get a "KeyError". What could I be doing wrong?

Here is a sample of the file to be split:

<option value=''>Choose SubGroup</option>, ParentID
<option value='/1990-Accord-DX-Glass-s/37918.htm'>Glass</option>,Accord1990DX422F22A1BodyHardwareBackGlass
<option value='/1990-Accord-DX-Glass-s/37919.htm'>Glass</option>,Accord1990DX422F22A1BodyHardwareBackGlass
<option value='/1990-Accord-DX-Reveal-Moldings-s/69090.htm'>Reveal Moldings</option>,Accord1990DX422F22A1BodyHardwareBackGlass
<option value='/1990-Accord-DX-Reveal-Moldings-s/69091.htm'>Reveal Moldings</option>,Accord1990DX422F22A1BodyHardwareBackGlass
<option value='/1990-Accord-DX-Center-s/10331.htm'>Center</option>,Accord1990DX422F22A1BodyHardwareConsole
<option value='/1990-Accord-DX-Cowl-s/16006.htm'>Cowl</option>,Accord1990DX422F22A1BodyHardwareCowl
<option value='/1990-Accord-DX-Exterior-Trim-s/26889.htm'>Exterior Trim</option>,Accord1990DX422F22A1BodyHardwareFender
<option value='/1990-Accord-DX-Exterior-Trim-s/26890.htm'>Exterior Trim</option>,Accord1990DX422F22A1BodyHardwareFender
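Running the first few sample lines through `csv.reader` shows what `keyfunc` and `row[0:1]` receive. This is a minimal sketch: `SAMPLE` holds the first three lines above, and `keyfunc` is a hypothetical key function that takes the second column and adds a `.txt` suffix, guessed from the file name in the error message below.

```python
import csv
import io

# First three lines of the sample file above
SAMPLE = """<option value=''>Choose SubGroup</option>, ParentID
<option value='/1990-Accord-DX-Glass-s/37918.htm'>Glass</option>,Accord1990DX422F22A1BodyHardwareBackGlass
<option value='/1990-Accord-DX-Glass-s/37919.htm'>Glass</option>,Accord1990DX422F22A1BodyHardwareBackGlass
"""

def keyfunc(row):
    # Hypothetical key function: second column plus a .txt suffix
    return row[1] + '.txt'

reader = csv.reader(io.StringIO(SAMPLE))
header = next(reader)
print(header)  # the space after the comma is kept, so header[1] is ' ParentID'
for row in reader:
    # Key that picks the output file, and the single column written to it
    print(keyfunc(row), row[0:1])
```

The single quotes inside the `<option>` tags are ordinary characters for the default dialect, whose quote character is `"`. Passing `skipinitialspace=True` to `csv.reader` would drop the leading space in `' ParentID'`.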

Here is the error: KeyError: 'Accord1990DX422F22A1BodyHardwareBackGlass.txt'
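For reference, Python raises `KeyError` when a dict is indexed with a key that was never stored in it. The minimal illustration below uses a dict named after `csv_writers` from the updated code; it is a generic sketch, not a reproduction of the asker's run. Checking membership with `in`, or using `dict.get`, avoids the exception.

```python
# Hypothetical dict standing in for csv_writers in the updated code
csv_writers = {}
k = 'Accord1990DX422F22A1BodyHardwareBackGlass.txt'

raised = False
try:
    csv_writers[k]  # nothing has been stored under k yet, so this lookup fails
except KeyError:
    raised = True

# Safe alternatives: test membership first, or use dict.get
first_time = k not in csv_writers  # True on the first occurrence of k
writer = csv_writers.get(k)        # None instead of an exception
```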

2 answers:

Answer 0 (score: 1)

If you want every second item, you might consider using something like:

rows[::2]
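Here is how extended slicing splits the six lines of the current output shown in the question. This sketch assumes those lines are already in a list. `[::2]` keeps indices 0, 2, 4, which are the repeated headers. `[1::2]` keeps indices 1, 3, 5, which are the data lines. One header followed by `[1::2]` therefore gives the wanted output.

```python
# The "current output" lines from the question, as a list
current = [
    "<option value=''>Choose SubGroup</option>",
    "<option value='/2007-Accord-LX-Belts-s/5380.htm'>Belts</option>",
    "<option value=''>Choose SubGroup</option>",
    "<option value='/2007-Accord-LX-Belts-s/5381.htm'>Belts</option>",
    "<option value=''>Choose SubGroup</option>",
    "<option value='/2007-Accord-LX-Cooling-Fan-s/15089.htm'>Cooling Fan</option>",
]

headers_only = current[::2]       # indices 0, 2, 4: the repeated header lines
data_only = current[1::2]         # indices 1, 3, 5: the data lines
wanted = current[:1] + data_only  # one header followed by all data lines
print("\n".join(wanted))
```

This post-processing only works because headers and data lines strictly alternate. Not writing the duplicate headers in the first place is the more robust fix.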

In your specific case, this should be enough:

import csv
import os

def split_csv_file(f, dst_dir, keyfunc):
    csv_reader = csv.reader(f)
    header = next(csv_reader)
    write_header = True  # one flag shared by all output files
    csv_writers = {}
    for row in csv_reader:
        k = keyfunc(row)
        with open(os.path.join(dst_dir, k), mode='a', newline='') as output:
            writer = csv.writer(output)

            # Write the header only the first time through
            if write_header:
                writer.writerow(header)
                write_header = False

            csv_writers[k] = writer
            csv_writers[k].writerow(row[0:1])
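`write_header` is one flag shared by every key, so at most one output file ever receives the header. Below is a minimal sketch that keeps the same flag idea per output file, using a set of keys whose file already has a header. The sample input, the temporary directory and the `keyfunc` lambda are assumptions for illustration. Writing `header[:1]` mirrors `row[0:1]`, so each file holds a single column, as in the wanted output.

```python
import csv
import io
import os
import tempfile

def split_csv_file(f, dst_dir, keyfunc):
    csv_reader = csv.reader(f)
    header = next(csv_reader)
    has_header = set()  # keys whose output file already starts with the header
    for row in csv_reader:
        k = keyfunc(row)
        path = os.path.join(dst_dir, k)
        # 'w' on the first occurrence starts a fresh file; later rows append
        mode = 'a' if k in has_header else 'w'
        with open(path, mode, newline='') as output:
            writer = csv.writer(output)
            if k not in has_header:
                writer.writerow(header[:1])
                has_header.add(k)
            writer.writerow(row[0:1])

SAMPLE = """<option value=''>Choose SubGroup</option>, ParentID
<option value='/1990-Accord-DX-Glass-s/37918.htm'>Glass</option>,Accord1990DX422F22A1BodyHardwareBackGlass
<option value='/1990-Accord-DX-Glass-s/37919.htm'>Glass</option>,Accord1990DX422F22A1BodyHardwareBackGlass
<option value='/1990-Accord-DX-Center-s/10331.htm'>Center</option>,Accord1990DX422F22A1BodyHardwareConsole
"""

out_dir = tempfile.mkdtemp()
split_csv_file(io.StringIO(SAMPLE), out_dir, lambda row: row[1] + '.txt')
for name in sorted(os.listdir(out_dir)):
    with open(os.path.join(out_dir, name), newline='') as fh:
        print(name)
        print(fh.read())
```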

Answer 1 (score: 1)

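Without seeing the structure of the input file, I'd guess the simplest approach is to check for the first occurrence of k. If it is the first occurrence, open the file and write the header. That row, and any later rows with the same k, can then be appended after the header. Your code would then look like this: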

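import csv
import os

def split_csv_file(f, dst_dir, keyfunc):
    csv_reader = csv.reader(f)
    header = next(csv_reader)
    headers = {}  # keys whose output file already has its header
    for row in csv_reader:
        k = keyfunc(row)
        if k not in headers:
            # First occurrence of k: 'w' creates (or truncates) the file, then write the header
            with open(os.path.join(dst_dir, k), 'w', newline='') as output:
                csv.writer(output).writerow(header)
            headers[k] = 1
        # Every row for k, including the first, is appended after the header
        with open(os.path.join(dst_dir, k), mode='a', newline='') as output:
            csv.writer(output).writerow(row[0:1])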
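The `csv_writers` dict in the question suggests another option. Open each output file once, keep its writer in the dict, and write the header at the moment the writer is created. The sketch below assumes the number of distinct keys is small enough to keep one file open per key. `contextlib.ExitStack` closes every opened file at the end. The function name `split_csv_file_open_once` and the sample data are hypothetical. `header[:1]` mirrors `row[0:1]`.

```python
import csv
import io
import os
import tempfile
from contextlib import ExitStack

def split_csv_file_open_once(f, dst_dir, keyfunc):
    csv_reader = csv.reader(f)
    header = next(csv_reader)
    csv_writers = {}  # key -> csv.writer for that key's open output file
    with ExitStack() as stack:
        for row in csv_reader:
            k = keyfunc(row)
            if k not in csv_writers:
                # First occurrence of k: open the file once ('w' starts it fresh)
                output = stack.enter_context(
                    open(os.path.join(dst_dir, k), 'w', newline=''))
                csv_writers[k] = csv.writer(output)
                csv_writers[k].writerow(header[:1])  # header written exactly once per file
            csv_writers[k].writerow(row[0:1])
    # Leaving the with-block closes every file opened above

SAMPLE = """<option value=''>Choose SubGroup</option>, ParentID
<option value='/1990-Accord-DX-Cowl-s/16006.htm'>Cowl</option>,Accord1990DX422F22A1BodyHardwareCowl
<option value='/1990-Accord-DX-Exterior-Trim-s/26889.htm'>Exterior Trim</option>,Accord1990DX422F22A1BodyHardwareFender
<option value='/1990-Accord-DX-Exterior-Trim-s/26890.htm'>Exterior Trim</option>,Accord1990DX422F22A1BodyHardwareFender
"""

out_dir = tempfile.mkdtemp()
split_csv_file_open_once(io.StringIO(SAMPLE), out_dir, lambda row: row[1] + '.txt')
for name in sorted(os.listdir(out_dir)):
    with open(os.path.join(out_dir, name), newline='') as fh:
        print(name)
        print(fh.read())
```

Compared with reopening the file for every row, this opens each file only once. The trade-off is that all files stay open until the input has been read.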

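Opening the file with the 'w' mode the first time also means you won't accidentally append to an old version; a new output file is created instead. Otherwise, if you run the script repeatedly, the files will keep growing forever, which could be a problem.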
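The difference between the two modes shows up when a script is run twice. `'a'` keeps what is already in the file, while `'w'` truncates it first. The sketch below simulates that; the temporary directory and the `run_once` and `line_count` helpers are hypothetical.

```python
import os
import tempfile

def run_once(path, mode):
    # Simulates one run of a script that writes two lines to path
    with open(path, mode) as fh:
        fh.write('header\n')
        fh.write('row\n')

def line_count(path):
    with open(path) as fh:
        return len(fh.readlines())

tmp_dir = tempfile.mkdtemp()
append_path = os.path.join(tmp_dir, 'append.txt')
write_path = os.path.join(tmp_dir, 'write.txt')

for _ in range(2):  # run the "script" twice
    run_once(append_path, 'a')
    run_once(write_path, 'w')

print(line_count(append_path))  # 4: the second run added to the first
print(line_count(write_path))   # 2: the second run replaced the first
```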