Question

I am scraping a number of webpages, and for each webpage I write one row of a CSV file.

import csv

fieldnames = ["Title", "Author", "year"]
counter = 1
for webpage in webpages:
    if counter == 1:
        # Create the file and write the header once, before the first row.
        f = open('file.csv', 'w', newline='')
        my_writer = csv.DictWriter(f, fieldnames)
        my_writer.writeheader()
        f.close()

    # ... something where I get the information (title, author and year) for each webpage ...

    variables = {ele: "NA" for ele in fieldnames}
    variables['Title'] = title
    variables['Author'] = author
    variables['year'] = year

    # Re-read the header from the file, then append this page's row.
    with open('file.csv', 'a+', newline='') as f:
        f.seek(0)  # 'a+' may start at the end of the file; rewind to read the header
        header = next(csv.reader(f))
        dict_writer = csv.DictWriter(f, header)
        dict_writer.writerow(variables)
    counter += 1

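However, a page may have several authors (so after scraping, author is actually a list), and I would like the CSV header to contain author1, author2, author3, and so on. I do not know the maximum number of authors in advance. So inside the loop I would like to edit the header and start adding author2, author3, etc., depending on whether the current row needs more author columns.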

Answer 1

Maybe something like this:

import csv

def write_to_csv(file_name, records, fieldnames=None):
    # Write a list of dicts to /tmp/<file_name>, header first.
    with open('/tmp/' + file_name, 'w', newline='') as csvfile:
        if not fieldnames:
            fieldnames = list(records[0].keys())
        # extrasaction='ignore' silently drops keys that are not in fieldnames.
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames, extrasaction='ignore')
        writer.writeheader()
        for row in records:
            writer.writerow(row)

def scrap(webpages):
    for webpage in webpages:
        webpage_data = [{'title': '', 'author1': 'foo', 'author2': 'bar'}]  # sample data
        # One file per webpage, named after the page title.
        write_to_csv(webpage_data[0]['title'] + '.csv', webpage_data, webpage_data[0].keys())

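I am assuming that:

the data within one webpage is consistent, but the next webpage in the loop may have a different shape
the webpage data is a list of dictionaries whose values are already mapped to keys
the code above targets Python 3

So inside the loop we just fetch the data and pass the relevant field names and values to another function, which writes them to the CSV file.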
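
If all pages should instead end up in a single file with author1, author2, ... columns, one option is to collect the rows first and write the file once, when the longest author list is known. The following is only a sketch under that assumption; `expand_authors`, `write_rows`, the output path, and the sample `pages` data are hypothetical names introduced for illustration.

```python
import csv
import os
import tempfile


def expand_authors(title, authors, year):
    """Turn one scraped page into a flat row with author1, author2, ... keys."""
    row = {"Title": title, "year": year}
    for i, name in enumerate(authors, start=1):
        row["author%d" % i] = name
    return row


def write_rows(path, rows):
    """Write all rows once, sizing the header to the longest author list."""
    max_authors = max(
        (sum(1 for key in row if key.startswith("author")) for row in rows),
        default=0,
    )
    author_columns = ["author%d" % i for i in range(1, max_authors + 1)]
    fieldnames = ["Title"] + author_columns + ["year"]
    with open(path, "w", newline="") as f:
        # restval fills the missing author columns of shorter rows with "NA".
        writer = csv.DictWriter(f, fieldnames=fieldnames, restval="NA")
        writer.writeheader()
        writer.writerows(rows)
    return fieldnames


# Hypothetical sample data standing in for the scraped pages.
pages = [
    ("Page A", ["foo"], 2001),
    ("Page B", ["foo", "bar", "baz"], 2002),
]
rows = [expand_authors(title, authors, year) for title, authors, year in pages]
out_path = os.path.join(tempfile.gettempdir(), "authors_wide.csv")
header = write_rows(out_path, rows)
```

Buffering the rows avoids rewriting the file every time a page turns out to have more authors than any previous one, at the cost of keeping all rows in memory until the end.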

Answer 2

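Because "author" is a variable-length list, you should serialize it in some way so that it fits into a single field. For example, use a semicolon as the separator.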

Assuming your webpage object has an authors field that contains all the authors, you would change the assignment line to something like this:

variables['Authors']=';'.join(webpage.authors)

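This is a simple serialization of all the authors. You can of course come up with something else: use a different separator, or serialize to JSON, YAML, or something similar.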
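
As a rough illustration of both options, the sketch below writes the same author list once as a semicolon-joined string and once as JSON, then reads the row back. The column names `Authors` and `AuthorsJson`, the sample data, and the in-memory buffer are assumptions made for the example; note that whatever column name is used must also appear in the writer's fieldnames.

```python
import csv
import io
import json

authors = ["foo", "bar", "baz"]  # hypothetical sample author list

# In-memory stand-in for a real CSV file opened with newline="".
buffer = io.StringIO(newline="")
writer = csv.DictWriter(buffer, fieldnames=["Title", "Authors", "AuthorsJson"])
writer.writeheader()
writer.writerow({
    "Title": "Page A",
    "Authors": ";".join(authors),        # delimiter-based serialization
    "AuthorsJson": json.dumps(authors),  # JSON also survives names containing ';'
})

# Read the row back and undo each serialization.
buffer.seek(0)
row = next(csv.DictReader(buffer))
authors_from_join = row["Authors"].split(";")
authors_from_json = json.loads(row["AuthorsJson"])
```

The joined string is easier to read in a spreadsheet, while JSON is the safer choice if an author name could itself contain the separator.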

Hope that gives you some ideas.

Creating new headers while writing a CSV with Python
