Can I use start_urls to scrape a list of URLs?

Time: 2020-09-01 11:29:00

Tags: scrapy

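I have a list of URLs that I want to scrape data from. The list comes from a database that I want to update, but I'm not sure how to proceed.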

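import scrapy
import sqlite3
from datetime import datetime, timedelta

class A1hrlaterSpider(scrapy.Spider):
    name = 'onehrlater'
    allowed_domains = ['donedeal.ie']

    # Time window: rows stamped within the last minute
    timenow = datetime.now()
    delta = timedelta(minutes=0)
    delta2 = timedelta(minutes=1)
    past_time = timenow - delta     # window end (now)
    past_time2 = timenow - delta2   # window start (one minute ago)

    # Read the ad URLs for that window from the SQLite database
    conn = sqlite3.connect('ddother.db')
    c = conn.cursor()
    c.execute("SELECT adUrl FROM database WHERE timestamp BETWEEN ? AND ?", (past_time2, past_time))
    all_urls = c.fetchall()
    # Each row is a 1-tuple, so unpack it into a flat list of URL strings
    urllist = [item[0] for item in all_urls]

    print(urllist)

    # A SELECT changes nothing, so no commit is needed before closing
    conn.close()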

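urllist is the list of URLs I want to scrape. But I'm not sure how to use start_urls to follow those links, or whether that is even the right way to do it. Can I just say start_urls = urllist, or something like that?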

Any help would be greatly appreciated. Thanks.

0 Answers:

No answers yet