How to copy the header row to a new csv in Python

Posted: 2016-12-28 21:49:09

Tags: python python-2.7 csv

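I can't seem to figure out how to copy my header row from master to matched... I need to grab the first row of my master csv and write it to matched first, and then write the remaining rows only if they match the criteria...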

with open('master.csv', 'r') as master, open('match.csv', 'w') as matched:
    for line in master:
        if any(city in line.split('","')[5] for city in citys) and \
           any(state in line.split('","')[6] for state in states) and \
           not any(category in line.split('","')[2] for category in categorys):
            matched.write(line)

Please help. I'm new to Python and don't know how to use pandas or anything else...

2 answers:

Answer 0 (score: 2)

You can read just the first line of the input file and write it to the output file:

with open('master.csv', 'r') as master, open('match.csv', 'w') as matched:
    matched.write(next(master)) # can't use readline when iterating on the file afterwards
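The next() idea can be turned into a self-contained sketch that runs as is. It assumes Python 3. io.StringIO stands in for master.csv and match.csv, and the sample text and the keep predicate are hypothetical placeholders for the real file and the real criteria.

```python
import io

# Hypothetical stand-in for master.csv; the real file would come from open().
MASTER_TEXT = 'name,city\n"Al","Boston"\n"Bo","Denver"\n'


def copy_header_then_filter(master, matched, keep):
    """Copy the first line as-is, then only the lines for which keep(line) is true."""
    # next() consumes the header from the iterator, so the loop below never sees it.
    matched.write(next(master))
    for line in master:
        if keep(line):
            matched.write(line)


master = io.StringIO(MASTER_TEXT)
matched = io.StringIO()
# Hypothetical criterion: keep lines that mention Boston.
copy_header_then_filter(master, matched, lambda line: "Boston" in line)
print(matched.getvalue(), end="")
```

The same function works unchanged with real file objects, because both files and StringIO objects are line iterators.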

It looks like you really need the csv module for the rest, so I'll edit my answer to go in that direction.

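With the csv module you don't need those unsafe splits. Comma is the default delimiter, and quoted fields are handled correctly too. So I would just write: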

import csv
with open('master.csv', 'r') as master, open('match.csv', 'w') as matched:
    cr = csv.reader(master)
    cw = csv.writer(matched)
    cw.writerow(next(cr))  # copy title

    for row in cr:  # iterate on the rows, already organized as lists
        if any(city in row[5] for city in citys) and \
           any(state in row[6] for state in states) and \
           not any(category in row[2] for category in categorys):
            cw.writerow(row)
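The sketch below shows why split('","') is fragile. It uses a hypothetical line in which one field (42) is not quoted. The naive split yields only two pieces, and those pieces keep stray quotes and the newline, so an index such as [5] would raise IndexError. csv.reader parses the same line into clean fields.

```python
import csv
import io

# Hypothetical line: the number 42 is not quoted, the text fields are.
line = '"Smith",42,"New York","NY"\n'

naive = line.split('","')                     # only splits where '","' occurs
parsed = next(csv.reader(io.StringIO(line)))  # proper CSV parsing of one row

print(naive)   # ['"Smith",42,"New York', 'NY"\n']
print(parsed)  # ['Smith', '42', 'New York', 'NY']
```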

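BTW, your filter checks whether each city is contained in row[5] (a substring test), but you probably want an exact match. For example, "York" would match "New York", which is probably not what you want. So my suggestion is to use the in operator to check whether the field is in the list of strings, for each criterion: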

import csv
with open('master.csv', 'r') as master, open('match.csv', 'w') as matched:
    cr = csv.reader(master)
    cw = csv.writer(matched)
    cw.writerow(next(cr))  # copy title
    for row in cr:
        if row[5] in citys and row[6] in states and row[2] not in categorys:
            cw.writerow(row)
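Here is a minimal illustration of the difference between the two filters. The row and the criteria list are hypothetical.

```python
# Hypothetical parsed row and criteria list.
row = ["Smith", "42", "New York", "NY"]
citys = ["York"]

# Substring test (the original filter): is any listed city a substring of the field?
substring_match = any(city in row[2] for city in citys)
# Exact test (the suggested filter): is the whole field one of the listed cities?
exact_match = row[2] in citys

print(substring_match)  # True, because "York" occurs inside "New York"
print(exact_match)      # False, because "New York" itself is not in the list
```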

Or, even better, write all the matching rows in one call with a generator expression:

import csv
with open('master.csv', 'r') as master, open('match.csv', 'w') as matched:
    cr = csv.reader(master)
    cw = csv.writer(matched)
    cw.writerow(next(cr))  # copy title
    cw.writerows(row for row in cr
                 if row[5] in citys and row[6] in states and row[2] not in categorys)
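The header-plus-writerows version can be run end to end against in-memory data. This sketch assumes Python 3 and uses io.StringIO in place of the two files. Only the column positions (2 = category, 5 = city, 6 = state) come from the question. The header, the rows and the criteria are hypothetical.

```python
import csv
import io

# Hypothetical master.csv contents with the question's column positions.
MASTER_TEXT = (
    "id,name,category,street,zip,city,state\n"
    "1,Al,bakery,Main St,10001,New York,NY\n"
    "2,Bo,bar,Elm St,80202,Denver,CO\n"
    "3,Cy,bar,Oak St,10002,New York,NY\n"
)

# Hypothetical criteria.
citys = ["New York"]
states = ["NY"]
categorys = ["bar"]

master = io.StringIO(MASTER_TEXT)
matched = io.StringIO()

cr = csv.reader(master)
cw = csv.writer(matched)
cw.writerow(next(cr))  # copy the header row first
# writerows() consumes the generator, writing each row that passes the filter.
cw.writerows(row for row in cr
             if row[5] in citys and row[6] in states and row[2] not in categorys)

print(matched.getvalue().splitlines())
# ['id,name,category,street,zip,city,state', '1,Al,bakery,Main St,10001,New York,NY']
```

With real files, the Python 3 csv documentation recommends opening both files with newline=''. Otherwise csv.writer's \r\n line endings can show up as blank lines on Windows. Under Python 2.7 (the question's tag), the csv module expects the files in binary mode ('rb' and 'wb') instead.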

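Note that citys, states and categorys would be better as sets rather than lists, because membership lookups in a set are much faster (you didn't show how they are defined).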
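A short sketch of the set suggestion follows. The criteria lists are hypothetical stand-ins for however the question defines them. The filter logic is unchanged; only the container type differs.

```python
# Hypothetical criteria lists, as the question might define them.
citys = ["New York", "Boston", "Denver"]
states = ["NY", "MA", "CO"]
categorys = ["bar", "nightclub"]

# Convert once, before filtering: `x in some_set` is a hash lookup (average
# constant time), while `x in some_list` compares against each element in turn.
citys, states, categorys = set(citys), set(states), set(categorys)


def keep(row):
    """Exact-match filter on city (col 5), state (col 6) and category (col 2)."""
    return row[5] in citys and row[6] in states and row[2] not in categorys


print(keep(["1", "Al", "bakery", "Main St", "10001", "New York", "NY"]))  # True
print(keep(["3", "Cy", "bar", "Oak St", "10002", "New York", "NY"]))      # False
```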

Answer 1 (score: 0)

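If you don't want to think too hard about how the line iterator works, a straightforward approach is to treat the first line as a special case: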

with open('master.csv', 'r') as master, open('match.csv', 'w') as matched:
    first_line = True
    for line in master:
        if first_line or (any(city in line.split('","')[5] for city in citys) and
                          any(state in line.split('","')[6] for state in states) and
                          not any(category in line.split('","')[2] for category in categorys)):
            matched.write(line)
        first_line = False
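The same "first line is special" idea can also be written with enumerate() instead of a flag. This is a variant, not the answerer's code. It assumes Python 3 and uses io.StringIO in place of the files, and the sample text and keep predicate are hypothetical. Because `or` short-circuits, keep() is never called on the header line, which matches the first_line version.

```python
import io

# Hypothetical stand-in for master.csv.
MASTER_TEXT = "name,city\nAl,Boston\nBo,Denver\n"


def copy_matching(master, matched, keep):
    """Write line 0 (the header) unconditionally, and later lines only if keep(line)."""
    # enumerate() counts from 0, so i == 0 marks the first line; this replaces
    # the first_line flag that otherwise has to be reset inside the loop.
    for i, line in enumerate(master):
        if i == 0 or keep(line):
            matched.write(line)


master = io.StringIO(MASTER_TEXT)
matched = io.StringIO()
copy_matching(master, matched, lambda line: "Denver" in line)
print(matched.getvalue(), end="")
```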