Python code for scraping data from Kickstarter stops working after some iterations

Time: 2019-01-04 00:34:32

Tags: python json web-scraping beautifulsoup kickstarter

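I am trying to scrape data from Kickstarter. The code runs, but on page 15 it fails with the following error (because the pages are dynamic, you may hit the error on a different page):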

  

Traceback (most recent call last):
  File "C:\Users\lenovo\kick.py", line 30, in <module>
    csvwriter.writerow(row)
  File "C:\Users\lenovo\AppData\Local\Programs\Python\Python37\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\uff5c' in position 27: character maps to <undefined>

What could be the problem? Any suggestions?

from urllib.request import urlopen
from bs4 import BeautifulSoup
import json
import csv
KICKSTARTER_SEARCH_URL = "https://www.kickstarter.com/discover/advanced?category_id=16&sort=newest&seed=2502593&page={}"
DATA_FILE = "kickstarter.csv"
csvfile = open(DATA_FILE, 'w')
csvwriter = csv.writer(csvfile, delimiter=',')
page_start = 0
while True:
    url = KICKSTARTER_SEARCH_URL.format(page_start)
    print(url)
    response = urlopen(url)
    html = response.read()
    soup = BeautifulSoup(html, 'html.parser')
    project_details_divs = soup.findAll('div', {"class":"js-react-proj-card"})

    if len(project_details_divs) == 0:
        break

    for div in project_details_divs:
        project = json.loads(div['data-project'])
        row = [project["id"],project["name"],project["goal"],project["pledged"]]
        csvwriter.writerow(row)

    page_start +=1

csvfile.close()

1 answer:

Answer 0 (score: 0)

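Pass the encoding parameter when opening the file. The traceback shows that the file was opened with the Windows default encoding, cp1252, which has no mapping for the character '\uff5c' in one of the project names. That is, change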

csvfile = open(DATA_FILE, 'w')

to

csvfile = open(DATA_FILE, 'w', encoding='utf-8')
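
To see why this helps, here is a minimal sketch that reproduces the problem and the fix without touching the network. The sample row and the temporary file path are made up for illustration; only '\uff5c' comes from the traceback. The newline="" argument is not part of the original answer; it is what the csv module documentation suggests for files handed to csv.writer, and it should avoid blank lines between rows on Windows.

```python
import csv
import os
import tempfile

# Hypothetical sample row; U+FF5C (fullwidth vertical bar) is the
# character named in the traceback.
row = [1, "Project \uff5c Name", 1000, 250.0]

# cp1252 has no mapping for U+FF5C, so encoding it fails the same way
# the original script did when it wrote to a cp1252 file.
try:
    "\uff5c".encode("cp1252")
    cp1252_ok = True
except UnicodeEncodeError:
    cp1252_ok = False

# Temporary location used only for this sketch.
path = os.path.join(tempfile.mkdtemp(), "kickstarter.csv")

# An explicit encoding makes the output independent of the platform default.
with open(path, "w", encoding="utf-8", newline="") as csvfile:
    csvwriter = csv.writer(csvfile, delimiter=",")
    csvwriter.writerow(row)

# Read the file back with the same encoding to confirm the round trip.
with open(path, encoding="utf-8", newline="") as csvfile:
    rows_read = list(csv.reader(csvfile))
```

If the CSV will be opened in Excel, encoding='utf-8-sig' may be worth trying instead, since Excel tends to rely on the byte order mark to detect UTF-8. Using a with block also closes the file even when an exception interrupts the loop.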