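I am trying to get my scraped data to show up in a MySQL database. I am following a course, but it is not working.
I made sure the data (title, rating, upc, product_type) is in the same order as in the CSV file.
Here is my code.
In the code editor: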
# -*- coding: utf-8 -*-
import os
import csv
import glob
import MySQLdb
from scrapy import Spider
from scrapy.http import Request


def product_info(response, value):
    return response.xpath('//th[text()="' + value + '"]/following-sibling::td/text()').extract()[0]


class BooksSpider(Spider):
    name = 'books'
    allowed_domains = ['books.toscrape.com']
    start_urls = ['http://books.toscrape.com']

    def parse(self, response):
        books = response.xpath('//h3/a/@href').extract()
        for book in books:
            absolute_url = response.urljoin(book)
            yield Request(absolute_url, callback=self.parse_book)

    def parse_book(self, response):
        title = response.css('h1::text').extract_first()
        rating = response.xpath('//*[contains(@class, "star-rating")]/@class').extract()[0]
        rating = rating.replace('star-rating ', '')
        # product information data points
        upc = product_info(response, 'UPC')
        product_type = product_info(response, 'Product Type')
        yield {
            'title': title,
            'rating': rating,
            'upc': upc,
            'product_type': product_type,
        }

    def close(self, reason):
        csv_file = max(glob.iglob('*.csv'), key=os.path.getctime)
        mydb = MySQLdb.connect(host='localhost',
                               user='shay',
                               passwd='foo',
                               db='books_db')
        cursor = mydb.cursor()
        csv_data = csv.reader(file(csv_file))
        row_count = 0
        for row in csv_data:
            if row_count != 0:
                cursor.execute('INSERT IGNORE INTO books_table(title, rating, upc, product_type) VALUES(%s, %s, %s, $s)', row)
            row_count += 1
        mydb.commit()
        cursor.close()
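
For reference, the CSV-loading step in close() can be exercised on its own, away from the spider. The following is a minimal sketch, not the course's code. It assumes Python 3 (open() rather than file()) and any DB-API cursor such as the one MySQLdb returns; read_rows and load_csv are hypothetical helper names introduced only for illustration. It marks all four values with %s, whereas the statement in close() ends with $s, which may be worth checking.

```python
import csv

# Same statement as in close(), but with %s for every value.
# MySQLdb uses the "format" paramstyle, so each parameter is written as %s.
INSERT_SQL = (
    'INSERT IGNORE INTO books_table(title, rating, upc, product_type) '
    'VALUES(%s, %s, %s, %s)'
)


def read_rows(csv_path):
    """Return the data rows of the CSV as tuples, skipping the header row."""
    with open(csv_path, newline='') as handle:
        reader = csv.reader(handle)
        next(reader, None)  # the first row holds the column names
        return [tuple(row) for row in reader if row]


def load_csv(cursor, csv_path):
    """Insert every data row through a DB-API cursor; return the row count."""
    rows = read_rows(csv_path)
    if rows:
        cursor.executemany(INSERT_SQL, rows)
    return len(rows)
```

With a real connection this would be called as load_csv(mydb.cursor(), csv_file), followed by mydb.commit(). Checking the returned count against the number of lines in the CSV is one way to see whether the rows are reaching the INSERT at all.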
In the terminal:
mysql -u shay -p
Enter password:
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 8
Server version: 5.7.26-0ubuntu0.18.04.1 (Ubuntu)
Copyright (c) 2000, 2019, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> CREATE DATABASE books_db;
Query OK, 1 row affected (0.00 sec)
mysql> USE books_db;
Database changed
mysql> CREATE TABLE books_table(
    -> title VARCHAR(20),
    -> rating VARCHAR(20),
    -> upc VARCHAR(20),
    -> product_type VARCHAR(20));
Query OK, 0 rows affected (0.30 sec)
mysql> SELECT * FROM books_table;
Empty set (0.00 sec)
mysql> SELECT * FROM books_table;
Empty set (0.00 sec)
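The expected result is a table of the scraped data in the terminal.
I run the code and then run the second statement (SELECT * FROM books_table;). The table should be displayed, but it is still an empty set.
Any help is much appreciated, thank you!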