I have a folder containing a large number of XML files that I want to parse into a single CSV file. My code is as follows:
import xml.etree.ElementTree as ET
import csv
import os

fields = [
    ('ID', 'FHRSID'),
    ('businessname', 'BusinessName'),
    ('businesstype', 'BusinessType'),
    ('address1', 'AddressLine1'),
    ('address2', 'AddressLine2'),
    ('address3', 'AddressLine3'),
    ('address4', 'AddressLine4'),
    ('postcode', 'PostCode'),
    ('longitude', 'Geocode/Longitude'),
    ('latitude', 'Geocode/Latitude')]

path = '/***/****/****/XML'

for filename in os.listdir(path):
    if not filename.endswith('.xml'): continue
    fullname = os.path.join(path, filename)
    tree = ET.parse(fullname)

    with open(r'outputdata.csv', 'wb') as f_businesslist:
        csv_businessdata = csv.DictWriter(f_businesslist, fieldnames=[field for field, match in fields])
        csv_businessdata.writeheader()

        for node in tree.iter('EstablishmentDetail'):
            row = {}
            for field_name, match in fields:
                try:
                    row[field_name] = node.find(match).text
                except AttributeError as e:
                    row[field_name] = ''
            csv_businessdata.writerow(row)
It does what it is supposed to do, but then fails with the following encoding error:
Traceback (most recent call last):
File "./XMLtoCsv.py", line 42, in <module>
csv_businessdata.writerow(row)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/csv.py", line 152, in writerow
return self.writer.writerow(self._dict_to_list(rowdict))
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 11: ordinal not in range(128)
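Can anyone help? I have spent a lot of time reading similar questions, but nothing seems to work. I am new to this, so I assume I have done (or failed to do) something silly. Many thanks.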
Answer 0 (score: 0)
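You need to encode the Unicode text explicitly when writing the file. On Python 3, open the file in text mode with an encoding (binary mode 'wb' does not accept an encoding argument):
with open(r'outputdata.csv', 'w', encoding='utf-8', newline='') as f_businesslist:
It looks like you are using Python 2.7. There the built-in open() has no encoding parameter and the csv module cannot write unicode objects containing non-ASCII characters such as u'\xe9', so each value would have to be encoded first, for example with .encode('utf-8'), before it is passed to writerow. I would also recommend switching to Python 3.x.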
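
For reference, here is a minimal Python 3 sketch of the whole conversion with the encoding set when the output file is opened. It is an illustration under stated assumptions, not a drop-in replacement: the function name xml_dir_to_csv and its two parameters are hypothetical, the field list is copied from the question, and it has only been reasoned through against XML shaped like the question's EstablishmentDetail records.

```python
import csv
import os
import xml.etree.ElementTree as ET

# (CSV column name, path inside each EstablishmentDetail), copied from the question.
FIELDS = [
    ('ID', 'FHRSID'),
    ('businessname', 'BusinessName'),
    ('businesstype', 'BusinessType'),
    ('address1', 'AddressLine1'),
    ('address2', 'AddressLine2'),
    ('address3', 'AddressLine3'),
    ('address4', 'AddressLine4'),
    ('postcode', 'PostCode'),
    ('longitude', 'Geocode/Longitude'),
    ('latitude', 'Geocode/Latitude'),
]


def xml_dir_to_csv(xml_dir, csv_path):
    """Hypothetical helper: write one CSV row per EstablishmentDetail
    found in every .xml file directly inside xml_dir."""
    # Text mode plus an explicit encoding lets the csv module handle
    # non-ASCII characters; newline='' is what the csv docs recommend.
    with open(csv_path, 'w', encoding='utf-8', newline='') as f:
        writer = csv.DictWriter(f, fieldnames=[name for name, _ in FIELDS])
        writer.writeheader()
        for filename in sorted(os.listdir(xml_dir)):
            if not filename.endswith('.xml'):
                continue
            tree = ET.parse(os.path.join(xml_dir, filename))
            for node in tree.iter('EstablishmentDetail'):
                # findtext returns the default when the element is missing,
                # so no try/except around node.find(match).text is needed.
                row = {name: node.findtext(match, default='')
                       for name, match in FIELDS}
                writer.writerow(row)
```

Call it with the XML folder from the question and the desired output name, e.g. xml_dir_to_csv(path, 'outputdata.csv'). The output file is opened once, before the loop over the XML files, so rows from every file end up in the same CSV and the header is written only once.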