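I have been trying to find an answer elsewhere, but either I did not understand the explanations or the solutions did not work for my case.
So, for this case:
1. The output characters are Chinese
2. The reading part works perfectly fine; only the writing fails
3. I am using Python 2.7.13
Please help!
Thanks!
Here is the code: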
# -*- coding: utf-8 -*-
import csv
import urllib2
from bs4 import BeautifulSoup
import socket
import httplib

# import sys  <= this did not work
# reload(sys)
# sys.setdefaultencoding('utf-8')

with open('/users/Rachael/Desktop/BDnodes.csv', 'r') as readcsv, \
        open("/users/Rachael/Desktop/CheckTitle.csv", 'wb') as writecsv:
    writer = csv.writer(writecsv)
    for row in readcsv.readlines():
        opener = urllib2.build_opener()
        opener.addheaders = [('User-Agent',
                              'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1941.0 Safari/537.36')]
        urllib2.install_opener(opener)
        openpage = urllib2.urlopen(row).read()
        soup = BeautifulSoup(openpage, "lxml")
        # print "page results:"
        for child in soup.findAll("h3", {"class": "t"}):
            try:
                geturls = child.a.get('href')
                # print urllib2.urlopen(geturls).geturl()
                url_result = urllib2.urlopen(geturls).geturl()
                # print url_result
                try:
                    openitem = urllib2.urlopen(url_result).read()
                    gettitle = BeautifulSoup(openitem, 'lxml')
                    url_title = gettitle.title.text
                except urllib2.HTTPError:
                    url_title = 'passed http error'
                    pass
                except urllib2.URLError:
                    url_title = 'passed url error'
                    pass
                except socket.timeout:
                    url_title = 'passed timeout'
                    pass
                except httplib.BadStatusLine:
                    url_title = 'passed badstatus'
                    pass
                except:
                    url_title = 'unknown'
                    pass
            except urllib2.HTTPError as e:
                pass
            except urllib2.URLError:
                pass
            except socket.timeout:
                pass
            except httplib.BadStatusLine:
                pass
            writer.writerow([url_result, url_title])
            # writer.writerow([url_result, url_title.encode('utf-8')]) did not work either, even tried with 'utf-16'
writecsv.close()
The error is:
C:\Python27\python.exe C:/Users/Rachael/PycharmProjects/untitled1/OpenNGet.py
Traceback (most recent call last):
  File "C:/Users/Rachael/PycharmProjects/untitled1/OpenNGet.py", line 55, in <module>
    writer.writerow([url_result, url_title])
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)
Process finished with exit code 1
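Answer 0 (score: 0)
You can pass an encoding argument when opening the file. In Python 2 the built-in open does not accept one, so use codecs.open: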
import codecs
with codecs.open("/users/Rachael/Desktop/CheckTitle.csv", 'wb', encoding='utf-8') as writecsv:
    writer = csv.writer(writecsv)
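To illustrate what an encoding-aware file handle does, here is a minimal sketch. It swaps in io.open for codecs.open (both accept an encoding argument on Python 2.7), and the temp-file path and the helper names write_text_utf8 and read_raw_bytes are made up for the demo. It only shows plain text writes. Python 2's csv documentation states that the module does not support Unicode input, so with csv.writer the cells may still need to be encoded by hand.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile


def write_text_utf8(path, text):
    """Write Unicode text to `path`, letting the file object encode it as UTF-8."""
    with io.open(path, 'w', encoding='utf-8') as handle:
        handle.write(text)


def read_raw_bytes(path):
    """Read the file back as raw bytes to inspect what was stored on disk."""
    with open(path, 'rb') as handle:
        return handle.read()


# Hypothetical demo file in a temporary directory (not the asker's real path).
demo_path = os.path.join(tempfile.mkdtemp(), 'demo_title.txt')
write_text_utf8(demo_path, u'\u4e2d\u6587')  # the two characters of "Chinese"
raw = read_raw_bytes(demo_path)  # six bytes: three per character in UTF-8
```

The file object takes Unicode text and does the encoding itself, which is why no explicit .encode() call appears here.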
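Answer 1 (score: 0)
Could your original attempt have been correct, with the problem lying in the "result" variable rather than in the title?
Try something like:
writer.writerow([url_result.encode('utf-8'), url_title.encode('utf-8')])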
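One caveat with calling .encode('utf-8') on every cell: in Python 2, a value that is already a byte string (as the fallback titles such as 'passed http error' are) gets implicitly decoded as ASCII first, which can itself fail on non-ASCII bytes. A minimal sketch of a safer variant follows; encode_row and TEXT_TYPE are hypothetical names, and the example URL is a placeholder. It encodes only cells that are actually Unicode text.

```python
# -*- coding: utf-8 -*-
import sys

# Text type: `unicode` on Python 2, `str` on Python 3.
TEXT_TYPE = str if sys.version_info[0] >= 3 else unicode  # noqa: F821


def encode_row(row, encoding='utf-8'):
    """Return a copy of `row` with text cells encoded to bytes.

    Cells that are not text (for example byte strings) are left untouched.
    """
    return [cell.encode(encoding) if isinstance(cell, TEXT_TYPE) else cell
            for cell in row]


# A byte-string URL (placeholder) next to a Unicode title.
encoded = encode_row([b'http://example.com/', u'\u4e2d\u6587'])
```

In the question's loop on Python 2.7 this would be used as writer.writerow(encode_row([url_result, url_title])), with the output file still opened in 'wb' mode. On Python 3 the csv module accepts text directly, so this step should not be needed there.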