Getting sqlite3.OperationalError: unrecognized token: ":" when calling a database update

Time: 2019-06-15 21:24:58

Tags: python-3.x sqlite scrapy

I'm getting an error when trying to update a column in the db in the pipeline file's set_data_update function. I'm using the get_data function to return the url and price, and for each URL returned I call the set_data_update function, where I move the existing new_price into old_price and then store the newly scraped price as new_price. The problem is that my call to set_data_update from get_data always seems to run twice. It should only run once, because at the moment I have just one row in the database, for the second URL - "https://www.amazon.com/Hamilton-Beach-46310-Programmable-Coffee/dp/B07684BPLB/ref=sr_1_10?keywords=coffee+maker&qid=1559098604&s=home-garden&sr=1-10".

I also see this traceback error:

sqlite3.OperationalError: unrecognized token: ":"
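For context, sqlite3's named placeholders must be written as :name with no whitespace between the colon and the name; a space leaves a bare : for the SQL tokenizer, which raises exactly this error. A minimal sketch reproducing it against a throwaway in-memory table (hypothetical table t, not part of the project):

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (price TEXT)")

# Valid: no space between the colon and the parameter name
conn.execute("UPDATE t SET price=:price", {"price": "$37.99"})

# Broken: the space detaches the colon from the name
try:
    conn.execute("UPDATE t SET price=: price", {"price": "$37.99"})
except sqlite3.OperationalError as err:
    print(err)  # unrecognized token: ":"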

products.json

{
"itemdata": [ 
    {   "url": "https://www.amazon.com/dp/B07GWKT87L/?`coliid=I36XKNB8MLE3&colid=KRASGH7290D0&psc=0&ref_=lv_ov_lig_dp_it#customerReview",`
        "title": "coffee_maker_black_and_decker",
        "name": "Cobi Maguire",
        "email": "cobi@noemail.com"
    },
    {   "url": "https://www.amazon.com/Hamilton-Beach-46310-Programmable-Coffee/dp/B07684BPLB/ref=sr_1_10?keywords=coffee+maker&qid=1559098604&s=home-garden&sr=1-10",
        "title": "coffee_maker_hamilton_beach",
        "name": "Ryan Murphy",
        "email": "ryan@noemail.com"
    }
    ]
}
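products.json parses as plain JSON once the stray backticks are off the first url; a quick sanity check might look like this (relative path used for illustration; the spider below uses the full path):

import json

with open('products.json') as f:
    data = json.load(f)

for entry in data['itemdata']:
    print(entry['title'], '->', entry['url'])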

Error traceback:

(price_monitor) C:\Users\hassy\Documents\python_venv\price_monitor\price_monitor>scrapy crawl price_monitor
2019-06-15 17:00:10 [scrapy.utils.log] INFO: Scrapy 1.6.0 started (bot: price_monitor)
2019-06-15 17:00:10 [scrapy.utils.log] INFO: Versions: lxml 4.3.3.0, libxml2 2.9.5, cssselect 1.0.3, parsel 1.5.1, w3lib 1.20.0, Twisted 19.2.0, Python 3.6.5 (v3.6.5:f59c0932b4, Mar 28 2018, 16:07:46) [MSC v.1900 32 bit (Intel)], pyOpenSSL 19.0.0 (OpenSSL 1.1.1b  26 Feb 2019), cryptography 2.6.1, Platform Windows-10-10.0.17134-SP0
2019-06-15 17:00:10 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'price_monitor', 'NEWSPIDER_MODULE': 'price_monitor.spiders', 'ROBOTSTXT_OBEY': True, 'SPIDER_MODULES': ['price_monitor.spiders'], 'USER_AGENT': 'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36'}
2019-06-15 17:00:10 [scrapy.extensions.telnet] INFO: Telnet Password: 3c0578dfed20521c
2019-06-15 17:00:10 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.logstats.LogStats']
2019-06-15 17:00:10 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.robotstxt.RobotsTxtMiddleware',
 'scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2019-06-15 17:00:10 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2019-06-15 17:00:10 [scrapy.middleware] INFO: Enabled item pipelines:
['price_monitor.pipelines.PriceMonitorPipeline']
2019-06-15 17:00:10 [scrapy.core.engine] INFO: Spider opened
2019-06-15 17:00:10 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2019-06-15 17:00:10 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2019-06-15 17:00:11 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.amazon.com/robots.txt> (referer: None)
2019-06-15 17:00:11 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (301) to <GET https://www.amazon.com/BLACK-DECKER-CM4202S-Programmable-Coffeemaker/dp/B07GWKT87L> from <GET https://www.amazon.com/dp/B07GWKT87L/?coliid=I36XKNB8MLE3&colid=KRASGH7290D0&psc=0&ref_=lv_ov_lig_dp_it#customerReview>
2019-06-15 17:00:11 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (301) to <GET https://www.amazon.com/Hamilton-Beach-46310-Programmable-Coffee/dp/B07684BPLB> from <GET https://www.amazon.com/Hamilton-Beach-46310-Programmable-Coffee/dp/B07684BPLB/ref=sr_1_10?keywords=coffee+maker&qid=1559098604&s=home-garden&sr=1-10>
2019-06-15 17:00:12 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.amazon.com/BLACK-DECKER-CM4202S-Programmable-Coffeemaker/dp/B07GWKT87L> (referer: None)
2019-06-15 17:00:12 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.amazon.com/Hamilton-Beach-46310-Programmable-Coffee/dp/B07684BPLB> (referer: None)
Printing rows
('https://www.amazon.com/Hamilton-Beach-46310-Programmable-Coffee/dp/B07684BPLB/ref=sr_1_10?keywords=coffee+maker&qid=1559098604&s=home-garden&sr=1-10', '$37.99')
calling func
2019-06-15 17:00:12 [scrapy.core.scraper] ERROR: Error processing {'email': 'ryan@noemail.com',
 'name': 'Ryan Murphy',
 'price': '$49.99',
 'title': 'BLACK+DECKER CM4202S Select-A-Size Easy Dial Programmable '
          'Coffeemaker, Extra Large 80 Ounce Capacity, Stainless Steel',
 'url': 'h'}
Traceback (most recent call last):
  File "c:\users\hassy\documents\python_venv\price_monitor\lib\site-packages\twisted\internet\defer.py", line 654, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "c:\users\hassy\documents\python_venv\price_monitor\price_monitor\pipelines.py", line 37, in process_item
    self.get_data(item)
  File "c:\users\hassy\documents\python_venv\price_monitor\price_monitor\pipelines.py", line 60, in get_data
    self.set_data_update(item, url, new_price)
  File "c:\users\hassy\documents\python_venv\price_monitor\price_monitor\pipelines.py", line 88, in set_data_update
    {'old_price': old_price, 'new_price': item['price']})
sqlite3.OperationalError: unrecognized token: ":"
Printing rows
('https://www.amazon.com/Hamilton-Beach-46310-Programmable-Coffee/dp/B07684BPLB/ref=sr_1_10?keywords=coffee+maker&qid=1559098604&s=home-garden&sr=1-10', '$37.99')
calling func
2019-06-15 17:00:12 [scrapy.core.scraper] ERROR: Error processing {'email': 'ryan@noemail.com',
 'name': 'Ryan Murphy',
 'price': '$34.99',
 'title': 'Hamilton Beach 46310 Programmable Coffee Maker, 12 Cups, Black',
 'url': 'h'}
Traceback (most recent call last):
  File "c:\users\hassy\documents\python_venv\price_monitor\lib\site-packages\twisted\internet\defer.py", line 654, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "c:\users\hassy\documents\python_venv\price_monitor\price_monitor\pipelines.py", line 37, in process_item
    self.get_data(item)
  File "c:\users\hassy\documents\python_venv\price_monitor\price_monitor\pipelines.py", line 60, in get_data
    self.set_data_update(item, url, new_price)
  File "c:\users\hassy\documents\python_venv\price_monitor\price_monitor\pipelines.py", line 88, in set_data_update
    {'old_price': old_price, 'new_price': item['price']})
sqlite3.OperationalError: unrecognized token: ":"
2019-06-15 17:00:12 [scrapy.core.engine] INFO: Closing spider (finished)
2019-06-15 17:00:12 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 1888,
 'downloader/request_count': 5,
 'downloader/request_method_count/GET': 5,
 'downloader/response_bytes': 261495,
 'downloader/response_count': 5,
 'downloader/response_status_count/200': 3,
 'downloader/response_status_count/301': 2,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2019, 6, 15, 21, 0, 12, 534906),
 'log_count/DEBUG': 5,
 'log_count/ERROR': 2,
 'log_count/INFO': 9,
 'response_received_count': 3,
 'robotstxt/request_count': 1,
 'robotstxt/response_count': 1,
 'robotstxt/response_status_count/200': 1,
 'scheduler/dequeued': 4,
 'scheduler/dequeued/memory': 4,
 'scheduler/enqueued': 4,
 'scheduler/enqueued/memory': 4,
 'start_time': datetime.datetime(2019, 6, 15, 21, 0, 10, 799145)}
2019-06-15 17:00:12 [scrapy.core.engine] INFO: Spider closed (finished)

(price_monitor) C:\Users\hassy\Documents\python_venv\price_monitor\price_monitor>

pipelines.py

import sqlite3


class PriceMonitorPipeline(object):

    def __init__(self):
        self.create_connection()
        self.create_table()

    def create_connection(self):
        self.conn = sqlite3.connect("price_monitor.db")
        self.curr = self.conn.cursor()

    def process_item(self, item, spider):
#        self.store_data(item)
        print("printing items")
        print(item['title'])
        print(item['price'])
        self.get_data(item)
        return item

    def get_data(self, item):
        """ Check if the row already exists for this url """
        rows = 0
        url = ''
        new_price = ''
        self.rows = rows
        self.url = url
        self.new_price = new_price


        self.curr.execute("""select url, new_price from price_monitor WHERE url =:url""",
                          {'url': item['url']})

        rows = self.curr.fetchone()
        print("Printing rows")
        print(rows)
        rows_url = rows[0]
        new_price = rows[1]

        if rows is not None:
            for item['url'] in rows_url:
                print("calling func")
                self.set_data_update(item, url, new_price)
        else:
            pass
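Worth noting about the loop above: for item['url'] in rows_url: iterates over the characters of the URL string and rebinds item['url'] on each pass, which is why the failed item in the traceback shows 'url': 'h'. A sketch of a more conventional guard (hypothetical rewrite, not the posted code):

        row = self.curr.fetchone()
        if row is not None:
            db_url, db_price = row  # unpack the matched row once
            print("calling func")
            self.set_data_update(item, db_url, db_price)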

    def set_data_update(self, item, url, new_price):

        url = 'https://www.amazon.com/Hamilton-Beach-46310-Programmable-Coffee/dp/B07684BPLB/ref=sr_1_10?keywords=coffee+maker&qid=1559098604&s=home-garden&sr=1-10'
        old_price = new_price
        price = item['price']
        print("printing old price")
        print(old_price)
        print("New Price".format(item['price']))
        self.curr.execute("""update price_monitor SET old_price=: old_price, new_price=: new_price
                              WHERE url=: url""",
                          {'old_price': old_price, 'new_price': price})

        self.conn.commit()
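For comparison, a sketch of set_data_update with valid placeholders: no whitespace after each colon, and url actually bound in the parameter dict (the posted SQL references :url but the dict never supplies it). Column names assumed from the rest of the post:

    def set_data_update(self, item, url, new_price):
        # The price currently stored as new_price becomes old_price;
        # the freshly scraped price becomes the new new_price.
        self.curr.execute(
            """UPDATE price_monitor
               SET old_price = :old_price, new_price = :new_price
               WHERE url = :url""",
            {'old_price': new_price, 'new_price': item['price'], 'url': url})
        self.conn.commit()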

items.py

import scrapy


class AmazonItem(scrapy.Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
    url = scrapy.Field()
    title = scrapy.Field()
    price = scrapy.Field()
    name = scrapy.Field()
    email = scrapy.Field()

Spider

import scrapy
import json
import sys

from ..items import AmazonItem


class MySpider(scrapy.Spider):
    name = 'price_monitor'
    newlist = []
    start_urls = []
    itemdatalist = []
    with open('C:\\Users\\hassy\\Documents\\python_venv\\price_monitor\\price_monitor\\products.json') as f:
        data = json.load(f)

        itemdatalist = data['itemdata']

        for item in itemdatalist:
            start_urls.append(item['url'])

    def start_requests(self):

        for item in MySpider.start_urls:

            yield scrapy.Request(url=item, callback=self.parse)

    def parse(self, response):
        for url in MySpider.start_urls:
            scrapeitem = AmazonItem()

            title = response.css('span#productTitle::text').extract_first()
            title = title.strip()
            price = response.css('span#priceblock_ourprice::text').extract_first()

            scrapeitem['title'] = title
            scrapeitem['price'] = price

        for item in MySpider.data['itemdata']:
            url = item['url']
            name = item['name']
            email = item['email']

            scrapeitem['url'] = url
            scrapeitem['name'] = name
            scrapeitem['email'] = email

        yield scrapeitem
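On the set_data_update call running twice: parse runs once per crawled response (two here), and each run loops over every entry in data['itemdata'], overwriting scrapeitem and finally yielding it with the last entry's url - so both yielded items carry the Hamilton Beach URL, and both match the single db row. A common Scrapy pattern is to carry each product's metadata along with its own request via meta, sketched here as a hypothetical rewrite (same field names as above):

    def start_requests(self):
        for entry in self.data['itemdata']:
            # Attach this product's metadata to its own request
            yield scrapy.Request(url=entry['url'], callback=self.parse,
                                 meta={'entry': entry})

    def parse(self, response):
        entry = response.meta['entry']
        item = AmazonItem()
        item['title'] = (response.css('span#productTitle::text')
                         .extract_first() or '').strip()
        item['price'] = response.css('span#priceblock_ourprice::text').extract_first()
        item['url'] = entry['url']
        item['name'] = entry['name']
        item['email'] = entry['email']
        yield item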

0 Answers:

There are no answers yet.