Question

I have successfully scraped image data from a website with Scrapy and saved the images in a folder. Now I want to save the image path names in a MySQL database.

As the spider output below shows, I want to pass the 'path' value on to the pipeline, but I don't know how to select it:

'images': [{'checksum': '75873dcc0944e29787525197648aa1a6',
             'path': 'full/91e6d13e3ad32def287f98199c8bbe1915c71773.jpg',
             'url': 'https://cdn.sindonews.net/dyn/620/content/2019/08/05/12/1426977/masa-kampanye-jadwal-pemilu-hingga-e-voting-jadi-isu-revisi-uu-pemilu-qoQ.jpg'}],

My pipelines.py:

import mysql.connector

class SkripsiPipeline(object):

    def __init__(self):
        self.create_connection()
        # dispatcher.connect(self.close_spider, signals.close_spider)
        # self.create_table()

    def create_connection(self):
        self.conn = mysql.connector.connect(
            host = '127.0.0.1',
            password = '',
            user = 'root',
            database = 'news'
        )
        self.curr = self.conn.cursor()

    def process_item(self, item, spider):
        self.store_db(item)
        return item

    def store_db(self,item):
        self.curr.execute("INSERT INTO news_tb (url, title, author, time, crawl_time, image_urls, images, content) values (%s,%s,%s,%s,%s,%s,%s,%s)",(
            item['url'][0],
            item['title'][0],
            item['author'][0],
            item['time'][0],
            item['crawl_time'][0],
            item['image_urls'][0],
            item['content'][0]
        ))
        self.conn.commit()

I want to save the image path name to the database. If anyone is familiar with this problem, please let me know. Thanks.

Answer 1

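If the images list is part of the item, you can select the path from its first entry and add it to the pipeline by changing the store_db method as follows:

def store_db(self, item):
    # 'full/<hash>.jpg' -> '<hash>.jpg': keeps only the file name
    path = item['images'][0]['path'].split('/')[1]
    # Assumes news_tb has a `path` column. The `images` column is left out
    # of the INSERT because no value is passed for it; the number of
    # placeholders must match the number of values.
    self.curr.execute(
        "INSERT INTO news_tb (url, title, author, time, crawl_time, image_urls, content, path) "
        "values (%s,%s,%s,%s,%s,%s,%s,%s)",
        (
            item['url'][0],
            item['title'][0],
            item['author'][0],
            item['time'][0],
            item['crawl_time'][0],
            item['image_urls'][0],
            item['content'][0],
            path,
        ),
    )
    self.conn.commit()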
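The selection step from this answer can be tried outside Scrapy. The following is only a minimal sketch under stated assumptions: it uses the standard-library `sqlite3` module as a stand-in for `mysql.connector` so that it runs without a database server, the table has just three columns, and the helper names `first_image_path` and `store_item` as well as the `example.com` URLs are hypothetical. It stores the whole relative path rather than only the file name.

```python
import sqlite3


def first_image_path(item):
    """Return the 'path' of the first downloaded image, or None if there is none."""
    images = item.get('images') or []
    return images[0]['path'] if images else None


def store_item(conn, item):
    """Insert one row; three placeholders are matched by three values."""
    conn.execute(
        "INSERT INTO news_tb (url, title, path) VALUES (?, ?, ?)",
        (item['url'][0], item['title'][0], first_image_path(item)),
    )
    conn.commit()


# Hypothetical item shaped like the spider output in the question.
sample_item = {
    'url': ['https://example.com/news/1'],
    'title': ['Example title'],
    'images': [{
        'checksum': '75873dcc0944e29787525197648aa1a6',
        'path': 'full/91e6d13e3ad32def287f98199c8bbe1915c71773.jpg',
        'url': 'https://example.com/image.jpg',
    }],
}

# In-memory SQLite database standing in for the MySQL `news` database.
conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE news_tb (url TEXT, title TEXT, path TEXT)")
store_item(conn, sample_item)
stored_path = conn.execute("SELECT path FROM news_tb").fetchone()[0]
print(stored_path)
```

Note that `sqlite3` uses `?` placeholders, whereas `mysql.connector` uses `%s` as in the pipeline above. The `images` field is filled in by Scrapy's ImagesPipeline, so the database pipeline needs a higher order number in `ITEM_PIPELINES` in order to run after it; otherwise `item['images']` may not be populated yet.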

