I'm trying to scrape a website (in Russian, Cyrillic script) and save everything to a CSV file, but I get an error:
Traceback (most recent call last):
  File "/Users/kr/PycharmProjects/education_py/credit_parser.py", line 30, in <module>
    base64.b64decode(listing_title[0].encode('utf-8')),
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/base64.py", line 76, in b64decode
    raise TypeError(msg)
TypeError: Incorrect padding
My code:
# coding: utf8
import requests
from lxml.html import fromstring
import csv
import base64

headers = {
    'User-Agent': "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/601.6.17 (KHTML, like Gecko) Version/9.1.1 Safari/601.6.17"
}
csvfile = open('credit-listing.csv', 'wb')
writer = csv.writer(csvfile, quotechar='|', quoting=csv.QUOTE_ALL)
i = 1
while i < 2:
    url = requests.get("http://credit-board.ru/index.php?page=search&sCategory=116&iPage={}".format(i), headers=headers)
    page_html = fromstring(url.content)
    all_listings = page_html.xpath('//*[@id="listing-card-list"]/li')
    listings_list = []
    for listing in all_listings:
        listing_urls = listing.xpath('./div/div/div/div/a/@href')[0]
        listing_request = requests.get(listing_urls)
        listing_html = fromstring(listing_request.content)
        listing_title = listing_html.xpath('//*[@id="item-content"]/h1/strong/text()')
        listing_text = listing_html.xpath('//*[@id="description"]/p[1]/text()')
        listing_meta = listing_html.xpath('//*[@id="custom_fields"]/div/div/text()')
        listings_list.append([listing_title, listing_text, listing_meta])
        writer.writerow([
            base64.b64decode(listing_title[0].encode('utf-8')),
            base64.b64decode(listing_text[0].encode('utf-8')),
            base64.b64decode(listing_meta[0].encode('utf-8'))
        ])
    i+=1
    print i
Answer 0 (score: 2)
You should use b64encode instead of b64decode.
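As a minimal sketch of this suggestion: the scraped text is ordinary Unicode, not base64, so it has to be encoded before it can be decoded. The sample string below is a hypothetical stand-in for a scraped title, not data from the site.

```python
# -*- coding: utf-8 -*-
import base64

# Hypothetical sample value standing in for a scraped listing title.
listing_title = u"Кредит наличными"

# b64encode expects bytes, so encode the Unicode text to UTF-8 first.
encoded = base64.b64encode(listing_title.encode("utf-8"))

# b64decode reverses the step, but only for data that really is base64.
decoded = base64.b64decode(encoded).decode("utf-8")

print(encoded)
print(decoded == listing_title)
```

If the goal is only to get readable Cyrillic text into the CSV under Python 2, writing `listing_title[0].encode('utf-8')` directly, with no base64 step, may be enough, since the Python 2 `csv` module works with byte strings.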
Answer 1 (score: 0)
Like this:
try:
    # Pad with '=' until the length is a multiple of 4
    remainder = len(field) % 4
    if remainder != 0:
        field += "=" * (4 - remainder)
    # decode field here
except Exception as e:
    print e
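Here field is a base64-encoded item.
Encoding text with field.encode('utf-8') is not base64 encoding, so calling b64decode on it is wrong. Encode before you decode, and never base64-decode data that came from some other tool and was not base64-encoded in the first place. All base64-encoded items follow a fixed character pattern (a URL-safe variant also exists).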
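The padding step above can be sketched as a small helper. This is an illustration only: `pad_base64` is a hypothetical name, and the sample text is made up. It helps when valid base64 has merely lost its trailing `=` characters; it cannot turn arbitrary text into base64.

```python
# -*- coding: utf-8 -*-
import base64


def pad_base64(field):
    """Append '=' until len(field) is a multiple of 4 (hypothetical helper)."""
    remainder = len(field) % 4
    if remainder:
        field += "=" * (4 - remainder)
    return field


# Build a real base64 string, then strip its padding to simulate damaged input.
encoded = base64.b64encode(u"Займ".encode("utf-8")).decode("ascii")
stripped = encoded.rstrip("=")

# Restoring the padding makes the string decodable again.
restored = base64.b64decode(pad_base64(stripped)).decode("utf-8")
print(restored == u"Займ")
```

Note that a length that is one more than a multiple of 4 can never be valid base64, so padding will not rescue such input.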