I'm currently writing a script in Python 2.7 that works fine, except that a few seconds after it starts running it hits this error:
Enter Shopify website URL (without HTTP): store.highsnobiety.com
Scraping! Check log file @ z:\shopify_output.txt to see output.
!!! Also make sure to clear file every hour or so !!!
Copper Bracelet - 3mm - Polished ['3723603267']
Traceback (most recent call last):
  File "shopify_sitemap_scraper.py", line 38, in <module>
    print(prod, variants).encode('utf-8')
AttributeError: 'NoneType' object has no attribute 'encode'
The script fetches data from a Shopify site and prints it to the console. Here is the code:
# -*- coding: utf-8 -*-
from __future__ import print_function
from lxml.html import fromstring
import requests
import time
import sys
reload(sys)
sys.setdefaultencoding('utf-8')
# Log file location, change "z://shopify_output.txt" to your location.
logFileLocation = "z:\shopify_output.txt"
log = open(logFileLocation, "w")
# URL of Shopify website from user input (for testing, just use store.highsnobiety.com during input)
url = 'http://' + raw_input("Enter Shopify website URL (without HTTP): ") + '/sitemap_products_1.xml'
print ('Scraping! Check log file @ ' + logFileLocation + ' to see output.')
print ("!!! Also make sure to clear file every hour or so !!!")
while True :
    page = requests.get(url)
    tree = fromstring(page.content)
    # skip first url tag with no image:title
    url_tags = tree.xpath("//url[position() > 1]")
    data = [(e.xpath("./image/title//text()")[0],e.xpath("./loc/text()")[0]) for e in url_tags]
    for prod, url in data:
        # add xml extension to url
        page = requests.get(url + ".xml")
        tree = fromstring(page.content)
        variants = tree.xpath("//variants[@type='array']//id[@type='integer']//text()")
        print(prod, variants).encode('utf-8')
The craziest part is that when I take out the .encode('utf-8'), it gives me a UnicodeEncodeError instead, as shown here:
Enter Shopify website URL (without HTTP): store.highsnobiety.com
Scraping! Check log file @ z:\shopify_output.txt to see output.
!!! Also make sure to clear file every hour or so !!!
Copper Bracelet - 3mm - Polished ['3723603267']
Copper Bracelet - 5mm - Brushed ['3726247811']
Copper Bracelet - 7mm - Polished ['3726253635']
Highsnobiety x EARLY - Leather Pouch ['14541472963', '14541473027', '14541473091']
Traceback (most recent call last):
  File "shopify_sitemap_scraper.py", line 38, in <module>
    print(prod, variants)
  File "C:\Python27\lib\encodings\cp437.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character u'\xae' in position 13: character maps to <undefined>'
Any ideas? After hours of Googling, I don't know what else to try.
Answer 0 (score: 1)
Your console's default encoding is cp437, and cp437 cannot represent the character u'\xae'.
>>> print (u'\xae')
®
>>> print (u'\xae'.encode('utf-8'))
b'\xc2\xae'
>>> print (u'\xae'.encode('cp437'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.5/encodings/cp437.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character '\xae' in position 0: character maps to <undefined>
You can see it trying to convert to cp437 in your traceback:
File "C:\Python27\lib\encodings\cp437.py", line 12, in encode
(I reproduced this in Python 3.5, but it is the same problem in both versions of Python.)
Answer 1 (score: 1)
snakecharmerb almost got it, but missed the cause of your first error. Your code
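One possible workaround, if losing the unsupported characters on screen is acceptable, is to replace them before printing. The following is only a sketch: console_safe is a hypothetical helper name, the product title is made up, and the cp437 default simply mirrors the encoding shown in the traceback.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function


def console_safe(text, encoding="cp437"):
    """Return text with characters the encoding cannot represent replaced by '?'.

    Hypothetical helper: the default encoding is an assumption taken from
    the traceback in the question.
    """
    # "replace" substitutes '?' instead of raising UnicodeEncodeError
    return text.encode(encoding, "replace").decode(encoding)


# Made-up title containing U+00AE (the registered sign), which cp437 lacks
title = u"Acme\xae Bracelet"
print(console_safe(title))
```

The round trip through encode and decode keeps the value as text, so it can still be combined with other strings before it is printed.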
print(prod, variants).encode('utf-8')
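means that you print the values of the prod and variants variables, and then try to call encode() on the return value of print. Unfortunately, print() (a function in Python 2 once print_function is imported from __future__, and always a function in Python 3) returns None, which is why you get the AttributeError. To fix this, use the following instead: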
print(prod.encode("utf-8"), variants)
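To see the difference for yourself, here is a small sketch that checks what print() returns and then encodes the text before it is printed. The prod value is a made-up title; the variant ID is copied from the output in the question.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

prod = u"Acme\xae Bracelet"    # made-up title containing U+00AE
variants = ['3723603267']      # variant ID taken from the question's output

# print() writes to stdout and returns None, so there is nothing to call .encode() on
print_result = print("demo line")

# Encode the text itself, before it ever reaches print()
encoded_line = u"{0} {1}".format(prod, variants).encode("utf-8")
```

Here print_result is None, while encoded_line holds the UTF-8 bytes of the full line, which could be written to a file opened in binary mode.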