Writing OrderedDicts to a CSV file

Date: 2015-07-24 14:32:30

Tags: python csv

I read a CSV file and used the usaddress library to parse an address field. How do I write the resulting OrderedDicts to another CSV file?

import usaddress
import csv

with open('output.csv') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        addr=row['Case Parties Address']
        data = usaddress.tag(addr)
        print(data)
(OrderedDict([('AddressNumber', u'4167'), ('StreetNamePreType', u'Highway'), ('StreetName', u'319'), ('StreetNamePostDirectional', u'E'), ('PlaceName', u'Conway'), ('StateName', u'SC'), ('ZipCode', u'29526-5446')]), 'Street Address')
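The printed value is a tuple: usaddress.tag() returns the OrderedDict of address components together with an address-type string ('Street Address'), so the dict to write is data[0]. As a minimal sketch, the snippet below copies that printed tuple by hand instead of calling usaddress, and writes its first element with csv.DictWriter, which accepts any mapping, including an OrderedDict. The output goes to an in-memory buffer so the result is easy to inspect.

```python
import csv
import io
from collections import OrderedDict

# Copied by hand from the printed output above; no usaddress call is made here.
data = (OrderedDict([('AddressNumber', '4167'), ('StreetNamePreType', 'Highway'),
                     ('StreetName', '319'), ('StreetNamePostDirectional', 'E'),
                     ('PlaceName', 'Conway'), ('StateName', 'SC'),
                     ('ZipCode', '29526-5446')]),
        'Street Address')

parsed, address_type = data  # tag() result: (components, address type)

buffer = io.StringIO()
# The OrderedDict's keys, in their original order, become the column names.
writer = csv.DictWriter(buffer, fieldnames=list(parsed.keys()))
writer.writeheader()
writer.writerow(parsed)

print(buffer.getvalue())
```

With a real file, replace io.StringIO() with open('some_output.csv', 'w', newline='') (a hypothetical filename).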

3 answers:

Answer 0 (score: 1)

See this GitHub issue for a solution:

import csv
import usaddress

# expected format in input.csv: first column 'id', second column 'address'
# Python 3 version: stdlib csv instead of csvkit; open CSV files with newline=''
with open('input.csv', newline='') as f:
    reader = csv.DictReader(f)

    all_rows = []
    for row in reader:
        try:
            # tag() returns (OrderedDict of address components, address type)
            parsed_addr = usaddress.tag(row['address'])
            row_dict = parsed_addr[0]
        except usaddress.RepeatedLabelError:
            # raised when tag() cannot merge the tokens into one value per label
            row_dict = {'error': 'True'}

        row_dict['id'] = row['id']
        all_rows.append(row_dict)

field_list = ['id','AddressNumber', 'AddressNumberPrefix', 'AddressNumberSuffix', 'BuildingName', 
              'CornerOf','IntersectionSeparator','LandmarkName','NotAddress','OccupancyType',
              'OccupancyIdentifier','PlaceName','Recipient','StateName','StreetName',
              'StreetNamePreDirectional','StreetNamePreModifier','StreetNamePreType',
              'StreetNamePostDirectional','StreetNamePostModifier','StreetNamePostType',
              'SubaddressIdentifier','SubaddressType','USPSBoxGroupID','USPSBoxGroupType',
              'USPSBoxID','USPSBoxType','ZipCode', 'error']

with open('output.csv', 'w', newline='') as outfile:
    writer = csv.DictWriter(outfile, field_list)
    writer.writeheader()
    writer.writerows(all_rows)

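A few notes:

  • Because each tagged address can have a different set of keys, you should define the output columns using every possible key. This isn't a problem, because all of the possible US address labels are known.
  • The usaddress tag method raises an error if it cannot concatenate the address tokens in an intuitive way. These errors should be captured in the output.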
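Both notes can be checked with the standard library alone. The sketch below uses two hand-written dicts as hypothetical stand-ins for usaddress.tag() results and a shortened, illustrative field list rather than the full one above. csv.DictWriter fills any column a row lacks with restval (an empty string by default), while a key that is not in fieldnames raises ValueError unless extrasaction='ignore' is passed.

```python
import csv
import io

# Illustrative subset of the column list; the full list is in the answer above.
field_list = ['id', 'AddressNumber', 'StreetName', 'ZipCode', 'error']

# Hypothetical rows: one parsed address and one that failed to parse.
rows = [
    {'id': '1', 'AddressNumber': '4167', 'StreetName': '319', 'ZipCode': '29526-5446'},
    {'id': '2', 'error': 'True'},
]

buffer = io.StringIO()
writer = csv.DictWriter(buffer, field_list)  # restval defaults to ''
writer.writeheader()
writer.writerows(rows)  # missing columns are written as empty cells
output = buffer.getvalue()
print(output)

# A key outside field_list is rejected by default (extrasaction='raise').
rejected = False
try:
    writer.writerow({'id': '3', 'Recipient': 'Jane Doe'})
except ValueError as exc:
    rejected = True
    print('rejected:', exc)
```

This is why the field list has to cover every label usaddress can emit: an unexpected label stops the whole write.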

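Answer 1 (score: 0)

I don't know the usaddress module, but judging from the printed output, data here is a tuple whose first element is a dict (an OrderedDict), which is why printing it shows each key: value pair. I'm guessing you want the keys as the header and the values as the data in each row, as in the solution below.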

Below is a suggestion based on the snippet you posted, with a few edits. In this version you get a new header and a new data row on every iteration of the for loop, which is the case I can cover without further information:

import csv
import usaddress

with open('output.csv') as csvfile:
    reader = csv.DictReader(csvfile)
    with open('myoutputfile', 'w') as o:  # this will be the new file you write to
        for row in reader:
            addr = row['Case Parties Address']
            data = usaddress.tag(addr)[0]  # tag() returns (OrderedDict, address type)
            # a string of the keys separated by commas, with a newline at the end
            header = ','.join(data.keys()) + '\n'
            # a string of the values separated by commas, with a newline at the end
            data_string = ','.join(data.values()) + '\n'
            # write the header, then the data on the next line, fields separated by commas
            o.write(header + data_string)

Hope this helps. If you are trying to write a single header and then one data row per iteration of the for loop, it would look a little different...
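For the single-header case, one possible shape (a sketch, not the answerer's code) is to write the header once, from the first row, and let csv.writer handle quoting; joining with ',' by hand produces a broken row as soon as a value itself contains a comma. A hypothetical list of already-parsed dicts with identical keys stands in for the usaddress.tag() results here.

```python
import csv
import io
from collections import OrderedDict

# Hypothetical parsed addresses; assumes every one has the same keys in the same order.
parsed_rows = [
    OrderedDict([('AddressNumber', '4167'), ('StreetName', '319'), ('PlaceName', 'Conway')]),
    OrderedDict([('AddressNumber', '12'), ('StreetName', 'Main, Rear'), ('PlaceName', 'Aiken')]),
]

buffer = io.StringIO()
writer = csv.writer(buffer)
header_written = False
for data in parsed_rows:
    if not header_written:
        writer.writerow(data.keys())  # header only once, taken from the first row
        header_written = True
    writer.writerow(data.values())  # a value containing a comma gets quoted

print(buffer.getvalue())
```

The second data row comes out as 12,"Main, Rear",Aiken, which a CSV reader splits back into three fields.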

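Answer 2 (score: 0)

The following should work. It assumes that every address entry contains the same fields. The first entry is used to create the header automatically.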

import usaddress
import csv

with open('output.csv', 'r', newline='') as f_input, open('case_parties.csv', 'w', newline='') as f_output:  # Python 3: text mode with newline=''
    csv_input = csv.DictReader(f_input)
    csv_output = csv.writer(f_output)
    write_headers = True

    for row in csv_input:
        addr=row['Case Parties Address']
        data = usaddress.tag(addr)

        if write_headers:
            csv_output.writerow(data[0].keys())
            write_headers = False

        csv_output.writerow(data[0].values())
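If the entries do not all share the same fields, a header taken from the first entry will be incomplete and the columns will not line up. One workaround, sketched here with hypothetical dicts in place of data[0], is to collect all rows first, build the header from the union of keys in first-seen order, and let csv.DictWriter leave missing cells empty. The trade-off is that every row is held in memory, which the streaming approach above avoids.

```python
import csv
import io
from collections import OrderedDict

# Hypothetical parsed addresses with differing sets of keys.
parsed_rows = [
    OrderedDict([('AddressNumber', '4167'), ('StreetName', '319'), ('ZipCode', '29526-5446')]),
    OrderedDict([('USPSBoxType', 'PO Box'), ('USPSBoxID', '88'), ('ZipCode', '29526')]),
]

# Union of all keys, keeping the order in which they first appear.
fieldnames = []
for data in parsed_rows:
    for key in data:
        if key not in fieldnames:
            fieldnames.append(key)

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=fieldnames)  # missing keys become ''
writer.writeheader()
writer.writerows(parsed_rows)
print(buffer.getvalue())
```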