I read a CSV file and use the usaddress library to parse an address field. How can I write the resulting OrderedDicts to another CSV file?
import usaddress
import csv

with open('output.csv') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        addr = row['Case Parties Address']
        data = usaddress.tag(addr)
        print(data)
(OrderedDict([('AddressNumber', u'4167'), ('StreetNamePreType', u'Highway'), ('StreetName', u'319'), ('StreetNamePostDirectional', u'E'), ('PlaceName', u'Conway'), ('StateName', u'SC'), ('ZipCode', u'29526-5446')]), 'Street Address')
Answer 0 (score: 1)
See this GitHub issue for a solution:
import csvkit
import usaddress

# expected format in input.csv: first column 'id', second column 'address'
# note: the 'rU' and 'wb' file modes used here are Python 2 idioms
with open('input.csv', 'rU') as f:
    reader = csvkit.DictReader(f)
    all_rows = []
    for row in reader:
        try:
            # usaddress.tag returns (OrderedDict, address type); keep the dict
            parsed_addr = usaddress.tag(row['address'])
            row_dict = parsed_addr[0]
        except Exception:
            # flag addresses that could not be tagged instead of aborting
            row_dict = {'error':'True'}
        row_dict['id'] = row['id']
        all_rows.append(row_dict)

field_list = ['id','AddressNumber', 'AddressNumberPrefix', 'AddressNumberSuffix', 'BuildingName',
              'CornerOf','IntersectionSeparator','LandmarkName','NotAddress','OccupancyType',
              'OccupancyIdentifier','PlaceName','Recipient','StateName','StreetName',
              'StreetNamePreDirectional','StreetNamePreModifier','StreetNamePreType',
              'StreetNamePostDirectional','StreetNamePostModifier','StreetNamePostType',
              'SubaddressIdentifier','SubaddressType','USPSBoxGroupID','USPSBoxGroupType',
              'USPSBoxID','USPSBoxType','ZipCode', 'error']

with open('output.csv', 'wb') as outfile:
    writer = csvkit.DictWriter(outfile, field_list)
    writer.writeheader()
    writer.writerows(all_rows)
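On Python 3, the same idea can be sketched with the standard-library csv module alone. This is only a rough sketch under a few assumptions, not the code from the linked issue: write_tagged_addresses, FIELDS and the tag parameter are names made up here. tag is meant to be usaddress.tag, passed in as an argument so the function can be tried without the library installed, and FIELDS is a shortened stand-in for the full field list.

```python
import csv
import io

# Shortened, illustrative subset of the usaddress labels (assumption);
# in practice pass the full field list instead.
FIELDS = ['id', 'AddressNumber', 'StreetName', 'PlaceName',
          'StateName', 'ZipCode', 'error']


def write_tagged_addresses(rows, outfile, tag, fields=FIELDS):
    """Tag row['address'] for every row and write one CSV line per row.

    `tag` is any callable that behaves like usaddress.tag, i.e. it
    returns a (label -> value mapping, address type) tuple.
    """
    # restval fills in labels an address does not have;
    # extrasaction='ignore' drops labels that are not listed in `fields`.
    writer = csv.DictWriter(outfile, fieldnames=fields,
                            restval='', extrasaction='ignore')
    writer.writeheader()
    for row in rows:
        try:
            tagged, _address_type = tag(row['address'])
            out_row = dict(tagged)
        except Exception:
            # Mark the row instead of aborting the whole run.
            out_row = {'error': 'True'}
        out_row['id'] = row['id']
        writer.writerow(out_row)


# Possible usage with real files (Python 3 text mode, newline=''):
# with open('input.csv', newline='') as f, \
#         open('output.csv', 'w', newline='') as out:
#     write_tagged_addresses(csv.DictReader(f), out, usaddress.tag)
```

Passing the tagger in as a parameter is a design choice made for this sketch; calling usaddress.tag directly inside the loop works just as well.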
Some notes:
Answer 1 (score: 0)
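Without knowing the usaddress module, data in this case appears to hold a dict (the printed output shows it as the first element of a tuple), so when you print it, it prints each key: value pair. I'm guessing you want to use the keys as the header and the values as each row of data, as in the solution below.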
Here is a suggestion that uses the code snippet you posted, with a few edits. In this case you get a new header and a new data row for every iteration of the for loop, which is the most I can offer without further information:

import csv
import usaddress

with open('output.csv') as csvfile:
    reader = csv.DictReader(csvfile)
    with open('myoutputfile', 'w') as o:  # this will be the new file you write to
        for row in reader:
            addr = row['Case Parties Address']
            data = usaddress.tag(addr)[0]  # tag() returns (OrderedDict, address type); keep the dict
            header = ','.join(data.keys()) + '\n'  # this will make a string of the header separated by comma with a newline at the end
            data_string = ','.join(data.values()) + '\n'  # this will make a string of the values separated by comma with a newline at the end
            o.write(header + data_string)  # this will write the header and then the data on a new line with each field separated by commas

Hope this helps. If you are trying to figure out how to write a single header and then one data row for each iteration of the for loop, it would look a bit different...
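For the single-header case, one possible sketch is to tag every address first, collect the labels in the order they are first seen, and only then write the header once. The names write_single_header and tag are hypothetical and not part of usaddress; tag stands in for usaddress.tag and is passed in so the sketch can be run without the library.

```python
import csv
import io


def write_single_header(addresses, outfile, tag):
    """Write one header line, then one data line per address.

    `tag` is any callable that behaves like usaddress.tag, i.e. it
    returns a (label -> value mapping, address type) tuple.
    """
    tagged_rows = [dict(tag(addr)[0]) for addr in addresses]

    # Union of all labels, kept in first-seen order.
    header = []
    for tagged in tagged_rows:
        for label in tagged:
            if label not in header:
                header.append(label)

    # DictWriter quotes values that contain commas, which a plain
    # ','.join(...) would not do; restval fills in missing labels.
    writer = csv.DictWriter(outfile, fieldnames=header, restval='')
    writer.writeheader()
    writer.writerows(tagged_rows)
    return header
```

Holding all tagged rows in memory is the price of discovering the header automatically; for very large files a fixed, predefined list of labels avoids that.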
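Answer 2 (score: 0)
The following should work. It assumes that every address entry contains the same fields. The first entry is used to create the header automatically.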
import usaddress
import csv

# 'wb' is the Python 2 mode for csv output; on Python 3 use 'w' with newline=''
with open('output.csv', 'r') as f_input, open('case_parties.csv', 'wb') as f_output:
    csv_input = csv.DictReader(f_input)
    csv_output = csv.writer(f_output)
    write_headers = True

    for row in csv_input:
        addr = row['Case Parties Address']
        data = usaddress.tag(addr)  # (OrderedDict, address type)

        if write_headers:
            # the keys of the first entry become the header
            csv_output.writerow(data[0].keys())
            write_headers = False

        csv_output.writerow(data[0].values())
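The same-fields assumption matters because csv.writer writes values by position only: if a later address yields different labels, its values end up under the wrong headers. A small helper can flag such rows before anything is written. This is just a sketch, and find_mismatched_rows is a hypothetical name that belongs to neither usaddress nor csv.

```python
def find_mismatched_rows(tagged_rows):
    """Return the indexes of rows whose labels differ from the first row's.

    Each element of `tagged_rows` is a label -> value mapping, such as
    the first element of the tuple returned by usaddress.tag.
    """
    tagged_rows = list(tagged_rows)
    if not tagged_rows:
        return []
    expected = list(tagged_rows[0].keys())
    # Compare label names and their order, since both affect column alignment.
    return [index for index, tagged in enumerate(tagged_rows)
            if list(tagged.keys()) != expected]
```

If the returned list is not empty, a DictWriter with an explicit list of field names is likely the safer choice than a positional csv.writer.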