Question

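Good evening. I'm having some trouble managing Scrapy requests.

My spider works as follows:

1. I use the parse function.
2. I connect to the database.
3. I start a loop over each item in the database list.
4. Each page has a captcha that I bypass; I use a POST and a GET for this.
5. I access the form and process the information.

The problem is that each item needs a new query, but my code keeps reusing the same captcha and the same page. How can I "refresh" the page for each new query?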

def parse(self, response):
    conn = None
    try:
        # read connection parameters
        # connect to the PostgreSQL server
        print('Connecting to the PostgreSQL database...')
        # create a cursor
        # execute a statement
        # display the PostgreSQL database server version
        print("--- Start crawl ---")
        print("Get ...")

        # Current Database ID
        cur.execute()
        print(cur.rowcount)
        row = cur.fetchone()
        count = 1
        while row is not None:
            
            print("Count: (%s/%s)" % (count, cur.rowcount))
         
            captcha

            if captcha:  # Captcha code
                yield FormRequest.from_response(
                    response,
                    formdata={},
                    callback=self.start_scraping
                )
            else:
                print("Captcha not Found")

            row = cur.fetchone()
            count = count + 1

        # close the communication with the PostgreSQL
        cur.close()
    except (Exception, psycopg2.DatabaseError) as error:
        print(error)
    finally:
        if conn is not None:
            conn.close()
            print('Database connection closed.')
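
One likely cause of the behaviour described above is that every FormRequest.from_response call in the loop is built from the same response object, so the same downloaded page (and captcha) is reused for every row. A minimal sketch of one way around this, assuming the form lives at a single URL: build one new request per database row, bypass Scrapy's duplicate filter, and give each row its own cookie session. FORM_URL and fresh_request_kwargs are hypothetical names for illustration, and this has not been tested against the actual site.

```python
from typing import Any, Dict, Iterable, Iterator, Tuple

# Hypothetical URL of the page that serves the captcha form; replace with the real one.
FORM_URL = "https://example.com/form"


def fresh_request_kwargs(
    rows: Iterable[Tuple[Any, ...]], form_url: str = FORM_URL
) -> Iterator[Dict[str, Any]]:
    """Yield keyword arguments for one new scrapy.Request per database row."""
    for index, row in enumerate(rows, start=1):
        yield {
            "url": form_url,
            # Every row requests the same URL, so the duplicate filter must be bypassed.
            "dont_filter": True,
            # A distinct cookiejar key gives each row its own cookie session.
            "meta": {"cookiejar": index},
            # Handed to the callback as a keyword argument.
            "cb_kwargs": {"row": row},
        }
```

Inside the spider, each dictionary could be used as `yield scrapy.Request(callback=self.solve_captcha, **kwargs)`, where `solve_captcha(self, response, row)` is a hypothetical callback that calls FormRequest.from_response on the freshly downloaded page. The cookiejar key is not carried over automatically, so that POST would need `meta={'cookiejar': response.meta['cookiejar']}` to stay in the same session. Reading all rows with cur.fetchall() and closing the connection before yielding any request would also avoid keeping the cursor open while requests are in flight.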

Scrapy new request URL

0 answers: