How to import web-scraped data from Python / Beautiful Soup into a MySQL database

Date: 2018-01-10 14:56:04

Tags: python mysql beautifulsoup

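Although I get more than 10 items as results in Python, right now only the last product ends up in my MySQL database (the one with id 12, along with its price, picture, and so on). I need to fix this so that all of the products appear, not just one.

The Python code is below.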

import requests
from bs4 import BeautifulSoup
import mysql.connector

url = 'https://www.newegg.com/Video-Cards-Video-Devices/Category/ID-38?Tpk=graphics%20card'

source = requests.get(url).text

soup = BeautifulSoup(source, 'lxml')



conn = mysql.connector.connect(host='127.0.0.1', user='x', database='scrape', password='x')
cursor = conn.cursor()

item_container = soup.find_all('div', class_='item-container')


# Module-level list so that create_table() can read the scraped items.
lists = []


def get_data():
    # First pass: create one dict per product; later passes fill in the other fields.
    for index, item_name in enumerate(item_container):
        name = item_name.find_all('a', class_='item-title')[0].text
        lists.append({'name': name})
        lists[index]['index'] = index

    for index, item_price in enumerate(item_container):
        price = item_price.find('li', class_='price-current').find('strong')
        if price is None:
            # No price element on the page: store a placeholder instead.
            price = 'Not Available'
        else:
            price = '$' + price.text + '.99'
        lists[index]['price'] = price

    for index, item_picture in enumerate(item_container):
        picture = 'http:' + item_picture.find('img', class_='lazy-img')['data-src']
        lists[index]['picture'] = picture

    for index, item_shipping in enumerate(item_container):
        shipping = item_shipping.find('li', class_='price-ship').text.strip()
        lists[index]['shipping'] = shipping


def create_table():

    val_index = lists[index]['index']
    val_name = lists[index]['name']
    val_picture = lists[index]['picture']
    val_price = lists[index]['price']
    val_shipping = lists[index]['shipping']


    add_item = ("INSERT INTO newegg "
                "(id, itemname, itempic, itemprice, itemshipping) "
                "VALUES (%s, %s, %s, %s, %s)")

    data_item = (val_index, val_name, val_picture, val_price, val_shipping)


    cursor.execute("DELETE FROM newegg ")
    conn.commit()
    cursor.execute(add_item, data_item)  
    conn.commit()

    cursor.close() 
    conn.close()                                                                  

get_data()
create_table()

1 Answer:

Answer 0 (score: 1)

So the main thing that needs fixing is create_table(). We don't want it to delete the table contents right before inserting an item. We also need to loop over all of the items in lists. I would do it like this.

def create_table():
    cursor.execute("DELETE FROM newegg ")
    conn.commit()

    for product in lists:
        val_index = product['index']
        val_name = product['name']
        val_picture = product['picture']
        val_price = product['price']
        val_shipping = product['shipping']

        add_item = ("INSERT INTO newegg "
                    "(id, itemname, itempic, itemprice, itemshipping) "
                    "VALUES (%s, %s, %s, %s, %s)")

        data_item = (val_index, val_name, val_picture, val_price, val_shipping)

        cursor.execute(add_item, data_item)
        conn.commit()

Note that create_table() also no longer closes the connection for you. I suggest closing the connection in the same scope where it was opened (in this case, the global scope). The function create_table() does not "own" the connection resource, so it should not be allowed to destroy it. That said, opening and closing the connection inside a single function would be perfectly reasonable.
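
As a variation on the same idea, the rows can be built first and sent in one batch with cursor.executemany(), followed by a single commit, while the scope that opened the connection stays responsible for closing it. This is only a sketch, not the answerer's code: save_products and rows_from are hypothetical names, and conn is assumed to be the connection returned by mysql.connector.connect() in the question.

```python
INSERT_SQL = (
    "INSERT INTO newegg "
    "(id, itemname, itempic, itemprice, itemshipping) "
    "VALUES (%s, %s, %s, %s, %s)"
)


def rows_from(products):
    """Turn the scraped dicts into tuples in the column order of INSERT_SQL."""
    return [
        (p['index'], p['name'], p['picture'], p['price'], p['shipping'])
        for p in products
    ]


def save_products(conn, products):
    """Replace the table contents with `products` (hypothetical helper).

    The connection is only borrowed: whoever opened it also closes it.
    """
    cursor = conn.cursor()
    try:
        cursor.execute("DELETE FROM newegg")  # clear once, before inserting
        cursor.executemany(INSERT_SQL, rows_from(products))
        conn.commit()  # one commit for the whole batch
    finally:
        cursor.close()  # the cursor was created here; conn was not


# Possible usage in the scope that created `conn`:
# try:
#     get_data()
#     save_products(conn, lists)
# finally:
#     conn.close()
```

Passing conn in as an argument, rather than reading a global, makes it explicit that the function uses the connection without owning it.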

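Also note that this clears your table every time you scrape. That may be fine, but if you want to keep track of changes over time, don't delete at the beginning of create_table(), and make the id column auto-increment or something similar.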
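
To illustrate the auto-increment suggestion, here is a sketch in which MySQL assigns the id and each scrape appends rows instead of replacing them. The question never shows how the newegg table was created, so the schema and column types below are assumptions, and append_products is a hypothetical name. It also assumes the table is created fresh this way; an existing table whose id column is not auto-increment would need to be altered first.

```python
# Assumed schema: column names come from the question's INSERT statement,
# the column types are guesses.
CREATE_SQL = (
    "CREATE TABLE IF NOT EXISTS newegg ("
    " id INT AUTO_INCREMENT PRIMARY KEY,"
    " itemname TEXT,"
    " itempic TEXT,"
    " itemprice VARCHAR(32),"
    " itemshipping VARCHAR(64))"
)

# No id column here: MySQL fills it in for every new row.
APPEND_SQL = (
    "INSERT INTO newegg (itemname, itempic, itemprice, itemshipping) "
    "VALUES (%s, %s, %s, %s)"
)


def append_products(conn, products):
    """Append the scraped products without deleting earlier rows."""
    cursor = conn.cursor()
    try:
        cursor.execute(CREATE_SQL)
        rows = [
            (p['name'], p['picture'], p['price'], p['shipping'])
            for p in products
        ]
        cursor.executemany(APPEND_SQL, rows)
        conn.commit()
    finally:
        cursor.close()
```

With this layout every run adds a new set of rows, so the same product appears once per scrape and older prices remain available for comparison.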