I'm not familiar with web scraping and I'm hoping to pull product data from Target's website.
I've been able to get the product name and price, but I can't find the rest of the information with BeautifulSoup. For example, when inspecting the page, the zip code appears in an element with a data-test attribute, but searching for that tag turns up nothing. Has anyone run into this, or does anyone know a way to get this information?
Using Python 3 and BeautifulSoup.
Not sure of the best way to phrase this question, so let me know if you need more information or if it needs rewording.
<a href="#" class="h-text-underline Link-sc-1khjl8b-0 jvxzGg" data-test="storeFinderZipToggle">35401</a>
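For what it's worth, the selector syntax itself is fine: when the markup is actually present in the HTML that requests receives, BeautifulSoup finds elements by their data-test attribute without trouble. A minimal offline check, using a static copy of the anchor tag above:

```python
from bs4 import BeautifulSoup

# Static copy of the zip-code anchor seen in the browser's inspector
html = '<a href="#" class="h-text-underline" data-test="storeFinderZipToggle">35401</a>'
soup = BeautifulSoup(html, "html.parser")

# The same attrs-dict lookup used in the question
tag = soup.find("a", {"data-test": "storeFinderZipToggle"})
print(tag.text)  # 35401
```

If this same find returns None against the downloaded page, the element is most likely inserted by JavaScript after page load, so it never appears in the raw response that requests sees.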
import requests
from bs4 import BeautifulSoup

f = open("demofile.txt", "w")
page_url = "https://www.target.com/p/nintendo-switch-with-neon-blue-and-neon-red-joy-con/-/A-52189185"
page = requests.get(page_url)
soup = BeautifulSoup(page.content, 'html.parser')

# Write all the HTML to a file to compare against the browser's page source
f.write(str(soup))

# Should contain the city location, but the secondary header can't be found
#location = soup.find("div", {'class': 'HeaderSecondary'})
# Inside the secondary header should be the store name, but it is not found
#store_location = location.find('div', {'data-test': 'storeId-store-name'})
#store_location = location.find('button', {'id': 'storeId-utility-NavBtn'})

# Contains the rest of the information I'm interested in
main_container = soup.find(id="mainContainer")
#complete_product_name = soup('span', attrs={'data-test': 'product-title'})[0].text
product_price = soup.find("span", {'data-test': 'product-price'})
product_title = soup.find("span", {'data-test': 'product-title'})
flexible_fulfillment = main_container.find('div', {'data-test': 'flexible_fulfillment'})
#test = product_zip.find_all('a')
#example = soup.find_all("div", {'data-test': 'storePickUpType'})
example = soup.find_all('div', attrs={'data-test': 'maxOrderQuantityTxt'})

print(product_title)
print(product_price)
print(flexible_fulfillment)
f.close()
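As a browser-free alternative worth checking: many retail product pages embed machine-readable product data in a script tag of type application/ld+json. Whether Target's page includes one, and what fields it carries, would need verifying against the saved demofile.txt, but if present it can be parsed directly. A sketch against a hypothetical static snippet (the field values here are made up for illustration):

```python
import json
from bs4 import BeautifulSoup

# Hypothetical stand-in for a product page; real pages may or may not include this tag
html = '''<script type="application/ld+json">
{"@type": "Product", "name": "Nintendo Switch", "offers": {"price": "299.99"}}
</script>'''
soup = BeautifulSoup(html, "html.parser")

tag = soup.find("script", {"type": "application/ld+json"})
if tag:
    # The tag's text is plain JSON, so it survives even when the
    # visible elements are rendered client-side
    data = json.loads(tag.string)
    print(data["name"], data["offers"]["price"])  # Nintendo Switch 299.99
```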
Answer 0 (score: 0)
Update: a useful approach using Selenium.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.common.exceptions import NoSuchElementException

# URL to launch
url = "https://www.target.com/p/nintendo-switch-with-neon-blue-and-neon-red-joy-con/-/A-52189185"

# Create a new Safari session
driver = webdriver.Safari()
driver.implicitly_wait(15)
driver.get(url)

try:
    store_name_element = driver.find_element(By.XPATH, '//*[@id="storeId-utilityNavBtn"]/div[2]')
    print(store_name_element.get_attribute('innerText'))
except NoSuchElementException:
    print("There's no store name available")
try:
    item_name_element = driver.find_element(By.XPATH, '//*[@id="mainContainer"]/div/div/div[1]/div[1]/div[1]/h1/span')
    print(item_name_element.get_attribute('innerText'))
except NoSuchElementException:
    print("There's no item name available")
try:
    price_element = driver.find_element(By.XPATH, '//*[@id="mainContainer"]/div/div/div[1]/div[2]/div/div[1]/span')
    print(price_element.get_attribute('innerText'))
except NoSuchElementException:
    print("There's no price available")
try:
    zip_code_element = driver.find_element(By.XPATH, '//*[@id="mainContainer"]/div/div/div[1]/div[2]/div/div[6]/div/div[1]/div[1]/div/div[1]/a')
    print(zip_code_element.get_attribute('innerText'))
except NoSuchElementException:
    print("There's no zip code available")
try:
    order_by_element = driver.find_element(By.XPATH, '//*[@id="mainContainer"]/div/div/div[1]/div[2]/div/div[6]/div/div[1]/div[2]/p')
    print(order_by_element.get_attribute('innerText'))
except NoSuchElementException:
    print("There's no order by time available")
try:
    arrival_date_element = driver.find_element(By.XPATH, '//*[@id="mainContainer"]/div/div/div[1]/div[2]/div/div[6]/div/div[1]/div[2]/div/div/span')
    print(arrival_date_element.get_attribute('innerText'))
except NoSuchElementException:
    print("There's no arrival date available")
try:
    shipping_cost_element = driver.find_element(By.XPATH, '//*[@id="mainContainer"]/div/div/div[1]/div[2]/div/div[6]/div/div[2]/div/div[1]/div[1]/div[1]/div[1]')
    print(shipping_cost_element.get_attribute('innerText'))
except NoSuchElementException:
    print("There's no shipping cost available")
try:
    current_inventory_element = driver.find_element(By.XPATH, '//*[@id="mainContainer"]/div/div/div[1]/div[2]/div/div[6]/div/div[2]/div/div[1]/div[1]/div[1]/div[2]')
    print(current_inventory_element.get_attribute('innerText'))
except NoSuchElementException:
    print("There's no current inventory available")
driver.quit()
One thing I've noticed about this code, though, is that its results are inconsistent. Sometimes I get an error saying the element was not found, and other times it finds it. Does anyone know why that happens? Is it because I'm requesting the site too often?
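The flakiness usually comes from timing rather than request volume: the element is sometimes queried before the page's asynchronous JavaScript has inserted it. Selenium's idiomatic fix is WebDriverWait with expected_conditions, which polls until a specific element appears. The same idea can be shown as a library-agnostic retry loop; in this sketch, flaky is a made-up stand-in for any lookup (such as a find_element call) that fails until the content is rendered:

```python
import time

def retry_find(find, attempts=5, delay=0.1):
    """Call `find` repeatedly until it returns a value or attempts run out."""
    last_error = None
    for _ in range(attempts):
        try:
            return find()
        except Exception as error:  # e.g. NoSuchElementException
            last_error = error
            time.sleep(delay)
    raise last_error

# Usage with a stand-in lookup that only succeeds on its third call,
# imitating an element that appears after a rendering delay
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise LookupError("not rendered yet")
    return "35401"

print(retry_find(flaky, delay=0))  # 35401
```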