Encoding foreign-language characters in URL results with Python 2.7

Date: 2017-03-03 17:54:22

Tags: python python-2.7

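I have a very basic Python script that reads search queries from a text file and, for each one, returns the first URL from Google. When a Google search result contains non-ASCII characters (e.g. Montréal), I get an error.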

Ideally, I would like to be able to include any character, regardless of language.

import requests
from bs4 import BeautifulSoup

with open("searches.txt") as input:  # look at each line in our input file
    content = input.readlines()
content = [x.strip() for x in content]  # and strip newline characters

print '---'  # some formatting so it looks nice in terminal and our output file
header = '<Query>, <Link>' + '\n' + '---------------' + '\n'
output = open("links.txt", "w")  # open file we want to write to
output.write(header)

for x in content:  # for each line in our input file
    print x
    query = x  # search google for that query
    goog_search = "https://www.google.co.uk/search?sclient=psy-ab&client=ubuntu&hs=k5b&channel=fs&biw=1366&bih=648&noj=1&q=" + query
    r = requests.get(goog_search)
    soup = BeautifulSoup(r.text, "html.parser")  # parse so we just get the link
    link = soup.find('cite').text
    formatted = query + ', ' + link + '\n'  # more output formatting
    print query + ', ' + link
    output.write(formatted)

output.close()
print '---'

The error I get: UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 53: ordinal not in range(128)
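
A minimal sketch of one possible workaround. It assumes the error is raised when the unicode text taken from `soup.find('cite').text` is implicitly encoded as ASCII while being written to `links.txt`. The helpers `format_row` and `write_rows` and the sample row are hypothetical and not part of the script above. The idea is to keep everything as unicode text and let `io.open` encode it as UTF-8 on write:

```python
import io
import os
import tempfile


def format_row(query, link):
    """Build one output line as unicode text (hypothetical helper)."""
    # In Python 2.7, lines read with plain open() are byte strings, so
    # decode them first; this assumes searches.txt is saved as UTF-8.
    if isinstance(query, bytes):
        query = query.decode("utf-8")
    if isinstance(link, bytes):
        link = link.decode("utf-8")
    return query + u", " + link + u"\n"


def write_rows(path, rows):
    """Write (query, link) pairs to path, encoded as UTF-8."""
    # io.open with an explicit encoding accepts unicode text and encodes
    # it on write, instead of falling back to the ASCII codec.
    with io.open(path, "w", encoding="utf-8") as output:
        for query, link in rows:
            output.write(format_row(query, link))


# Made-up sample data containing u'\xe9', the character from the error.
path = os.path.join(tempfile.mkdtemp(), "links.txt")
write_rows(path, [(u"Montr\xe9al", u"example.com/montr\xe9al")])
with io.open(path, encoding="utf-8") as check:
    print(repr(check.read()))
```

If the exception comes from the `print` statement instead (for example when stdout is piped and Python 2.7 falls back to ASCII), encoding explicitly before printing, e.g. `print line.encode("utf-8")`, may avoid it.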

0 answers:

No answers yet.