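Although I get more than 10 items in my results in Python, only the last product currently shows up in my MySQL database (id 12, along with its price, picture, and so on). I need to fix this so that all of the products appear, not just one.
The Python code is below.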
    import requests
    from bs4 import BeautifulSoup
    import mysql.connector

    url = 'https://www.newegg.com/Video-Cards-Video-Devices/Category/ID-38?Tpk=graphics%20card'
    source = requests.get(url).text
    soup = BeautifulSoup(source, 'lxml')
    conn = mysql.connector.connect(host='127.0.0.1', user='x', database='scrape',password="x")
    cursor = conn.cursor()
    item_container = soup.find_all('div', class_='item-container')

    def get_data():
        lists = []
        for index, item_name in enumerate(item_container):
            name = item_name.find_all('a', class_='item-title')[0].text
            lists.append({'name': name})
            lists[index]['index'] = index
        for index, item_price in enumerate(item_container):
            price = item_price.find('li', class_='price-current').find('strong')
            if price == None:
                price == ('Not Available')
                lists[index]['price'] = price
            else:
                price = ('$' + price.text +'.99')
                prices = []
                lists[index]['price'] = price
        for index, item_picture in enumerate(item_container):
            picture = 'http:' + item_picture.find('img', class_='lazy-img')['data-src']
            lists[index]['picture'] = picture
        for index, item_shipping in enumerate(item_container):
            shipping = (item_shipping.find('li', class_='price-ship').text).strip()
            lists[index]['shipping'] = shipping

        def create_table():
            val_index = lists[index]['index']
            val_name = lists[index]['name']
            val_picture = lists[index]['picture']
            val_price = lists[index]['price']
            val_shipping = lists[index]['shipping']
            add_item = ("INSERT INTO newegg "
                        "(id, itemname, itempic, itemprice, itemshipping) "
                        "VALUES (%s, %s, %s, %s, %s)")
            data_item = (val_index, val_name, val_picture, val_price, val_shipping)
            cursor.execute("DELETE FROM newegg ")
            conn.commit()
            cursor.execute(add_item, data_item)
            conn.commit()
            cursor.close()
            conn.close()
        create_table();

    get_data()
Answer 0 (score: 1)
So the main thing that needs fixing is create_table(). We don't want it to delete the database contents before every insert. We also need to loop over all of the items in lists. Here is how I would do it.
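    def create_table():
        # Clear the table once, before any rows are inserted.
        cursor.execute("DELETE FROM newegg ")
        conn.commit()
        # Insert one row per scraped product instead of only the last one.
        for product in lists:
            val_index = product['index']
            val_name = product['name']
            val_picture = product['picture']
            val_price = product['price']
            val_shipping = product['shipping']
            add_item = ("INSERT INTO newegg "
                        "(id, itemname, itempic, itemprice, itemshipping) "
                        "VALUES (%s, %s, %s, %s, %s)")
            data_item = (val_index, val_name, val_picture, val_price, val_shipping)
            cursor.execute(add_item, data_item)
            conn.commit()

Note that create_table() also no longer closes the connection for you. I would recommend closing the connection in the same scope where it was initialized (in this case, the global scope). The function create_table() does not "own" the connection resource, so it should not be allowed to destroy it. That said, it would be perfectly reasonable to both initialize and destroy the connection inside the function.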
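The ownership point can be sketched roughly as follows. The helper name save_products() and the idea of passing the connection in as an argument are assumptions made for illustration, not part of the original code; the table and column names are taken from the question. The helper borrows the connection, inserts every row in one executemany() call, and leaves closing the connection to whoever opened it.

```python
ADD_ITEM = (
    "INSERT INTO newegg "
    "(id, itemname, itempic, itemprice, itemshipping) "
    "VALUES (%s, %s, %s, %s, %s)"
)


def save_products(conn, products):
    """Replace the table contents with one row per scraped product.

    Hypothetical helper: it borrows `conn` and never closes it.
    Returns the number of rows handed to the database.
    """
    rows = [
        (p['index'], p['name'], p['picture'], p['price'], p['shipping'])
        for p in products
    ]
    cursor = conn.cursor()
    try:
        cursor.execute("DELETE FROM newegg")  # clear once, not per row
        cursor.executemany(ADD_ITEM, rows)    # insert all products
        conn.commit()
    finally:
        cursor.close()  # the cursor was created here, so it is closed here
    return len(rows)


# Assumed usage in the scope that opened the connection:
#
#     conn = mysql.connector.connect(host='127.0.0.1', user='x',
#                                    database='scrape', password='x')
#     try:
#         save_products(conn, lists)
#     finally:
#         conn.close()
```

Passing the connection in also makes the function easy to exercise with a stand-in connection object, without touching a real database.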
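Also, note that this will wipe your table every time you run a scrape. That may be fine, but if you ever want to change that, don't delete at the beginning, and set the id column to auto-increment or something similar.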
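A minimal sketch of that append-only variant, assuming the id column has been declared AUTO_INCREMENT. The schema in the comment is an assumption (the column types are guesses, not taken from the question), and append_products() is a hypothetical helper name. It leaves id out of the INSERT so that MySQL assigns it, and it skips the DELETE so that each scrape adds rows instead of replacing them.

```python
# Assumed schema (column types are illustrative guesses):
#   CREATE TABLE newegg (
#       id INT AUTO_INCREMENT PRIMARY KEY,
#       itemname VARCHAR(255),
#       itempic VARCHAR(255),
#       itemprice VARCHAR(32),
#       itemshipping VARCHAR(64)
#   );

APPEND_ITEM = (
    "INSERT INTO newegg "
    "(itemname, itempic, itemprice, itemshipping) "
    "VALUES (%s, %s, %s, %s)"
)


def append_products(cursor, products):
    """Append one row per product; the database assigns each id.

    Hypothetical helper: the caller is expected to commit and to
    close the cursor and connection it owns.
    """
    rows = [
        (p['name'], p['picture'], p['price'], p['shipping'])
        for p in products
    ]
    cursor.executemany(APPEND_ITEM, rows)
    return rows
```

The caller would still call conn.commit() afterwards, since the helper does not own the connection.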