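I know this is a common beginner question, and there are plenty like it on Stack Exchange. I have searched through them but still cannot solve my problem. I have some scraped data that looks like this (the list has about 1,000 items):
inputList = [[u'someplace', u'3901 West Millen Drive', u'Hobbs', u'NH',
              u'88240', u'37.751117', u'-103.187709999'],
             [u'\u0100lon someplace', u'3120 S Las Vegas Blvd', u'Las Duman',
              u'AL', u'89109', u'36.129066', u'-145.168791']]
I am trying to write it to a CSV file like this: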
for i in inputList:
    for ii in i:
        ii.replace(" u'\u2019'", "") #just trying to get rid of offending character
        ii.encode("utf-8")

def csvWrite(inList, outFile):
    import csv
    destination = open(outFile, 'w')
    writer = csv.writer(destination, delimiter = ',')
    data = inList
    writer.writerows(data)
    destination.close()

csvWrite(inputList, output)
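But I keep getting this error at writer.writerows(data):
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 5: ordinal not in range(128)
I have tried a few different ways of encoding the data in the list, but I always get the error. I am fine with dropping any characters that cannot be encoded as ASCII. Can anyone point me in the right direction? I am using Python 2.6.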
Answer 0 (score: 1)
This line looks odd: ii.replace(" u'\u2019'", ""). Did you mean ii.replace(u"\u2019", u"")?
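Beyond the search pattern itself, it may help to see why the loop in the question has no effect even with the pattern corrected. In Python 2, a plain (non-unicode) literal such as " u'\u2019'" does not process the \u escape, so it should be looking for a literal backslash sequence that never occurs in the data. Separately, strings are immutable, so replace() and encode() return new objects rather than changing the one they are called on. The following is a minimal sketch of that second point; the sample value and variable names are made up for illustration.

```python
# -*- coding: utf-8 -*-
# Hypothetical sample value containing the character from the traceback.
original = u'Joe\u2019s place'

# Calling replace() on its own line builds a new string and throws it away.
original.replace(u'\u2019', u'')
still_has_quote = u'\u2019' in original  # the original string is unchanged

# Keeping the return value is what actually gives you the cleaned text.
cleaned = original.replace(u'\u2019', u'')
```

The same applies to ii.encode("utf-8") in the question: its result is discarded. Note also that rebinding ii inside the loop would not update inputList, so the cleaned values need to be collected into a new list.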
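Answer 1 (score: -1)
If you just want to remove the offending characters, you can use this code:
for i in inputList:
    for ii in i:
        # Keep only characters whose code point is in the ASCII range.
        ii = "".join(list(filter((lambda x: ord(x) < 128), ii)))
        print ii
Output: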
someplace
3901 West Millen Drive
Hobbs
NH
88240
37.751117
-103.187709999
lon someplace
3120 S Las Vegas Blvd
Las Duman
AL
89109
36.129066
-145.168791
The final code looks like this:
inputList = [[u'someplace', u'3901 West Millen Drive', u'Hobbs', u'NH',
              u'88240', u'37.751117', u'-103.187709999'],
             [u'\u0100lon someplace', u'3120 S Las Vegas Blvd', u'Las Duman',
              u'AL', u'89109', u'36.129066', u'-145.168791']]

# Build a cleaned copy of the data with every non-ASCII character removed.
cleared_inputList = []
for i in inputList:
    c_i = []
    for ii in i:
        # Keep only characters whose code point is in the ASCII range.
        ii = "".join(list(filter((lambda x: ord(x) < 128), ii)))
        c_i.append(ii)
    cleared_inputList.append(c_i)

def csvWrite(inList, outFile):
    import csv
    # Python 2's csv module expects the file to be opened in binary mode.
    destination = open(outFile, 'wb')
    writer = csv.writer(destination, delimiter = ',')
    data = inList
    writer.writerows(data)
    destination.close()

# output is the destination file path, as in the question.
csvWrite(cleared_inputList, output)
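As a possible alternative to the filter/lambda step above, the same "drop anything that is not ASCII" behaviour can be obtained from the codec machinery, using the 'ignore' error handler of encode(). This is only a sketch: the helper names strip_non_ascii and clean_rows and the sample rows are made up for illustration and do not come from the original answer.

```python
# -*- coding: utf-8 -*-

def strip_non_ascii(text):
    # Encode to ASCII, silently dropping characters that cannot be
    # represented, then decode back so the result is still a text string.
    return text.encode('ascii', 'ignore').decode('ascii')


def clean_rows(rows):
    # Build a new list of rows; the input list is left untouched.
    return [[strip_non_ascii(cell) for cell in row] for row in rows]


# Hypothetical sample data in the same shape as inputList.
sample = [[u'\u0100lon someplace', u'Las Duman'], [u'Joe\u2019s', u'89109']]
cleaned = clean_rows(sample)
```

The resulting list could then be passed to csvWrite in place of cleared_inputList. If the non-ASCII characters should be kept rather than dropped, the Python 2 csv documentation suggests encoding each value as UTF-8 before writing instead.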