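I'm trying to scrape several containers on a website to check whether a particular item exists. I want to compare a specific value, and if an item with that value is found, write that item's price and a link to where it can be bought to a CSV file.
I managed to write a for loop that iterates over the values I want to match, but I can't figure out how to use it to extract the other elements I need. It ends up returning the values from the first container on the page rather than from the matching one.
I tried putting the lookups both inside and outside the for loop. I realize it doesn't work because they only find one element and aren't told which container to pull it from, but I've done something similar in other scripts and it worked fine.
I also tried nesting loops inside each other, but for obvious reasons that didn't work either. What's the best way to handle this?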
values = WebDriverWait(driver, 2).until(EC.presence_of_all_elements_located((By.XPATH, "//*[contains(@class,'text-center') and contains(text(),'Wear:')]")))
price = driver.find_element_by_class_name("item-price-display").text
buy_link = driver.find_element_by_css_selector("a.btn-xs").get_attribute('href')
print(len(values))
for value in values:
    wear = value.text.replace("Wear: ", "")
    print(wear)
    if wear == condition:
        print(buy_link, price)
        f.write(buy_link + "," + price)
        break
Full code:
import time

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

profile = webdriver.FirefoxProfile()
profile.set_preference("permissions.default.image", 2)  # Block all images to load websites faster.
driver = webdriver.Firefox(firefox_profile=profile)
f = open("file.csv", 'r+')
url = "http://bitskins.com"
driver.get(url)
elem = driver.find_element_by_name("market_hash_name")
key = "Dragon Lore"
condition = "0.11940288"
elem.send_keys(key, Keys.RETURN)
time.sleep(3)
values = WebDriverWait(driver, 2).until(EC.presence_of_all_elements_located((By.XPATH, "//*[contains(@class,'text-center') and contains(text(),'Wear:')]")))
print(len(values))
for value in values:
    price = driver.find_element_by_class_name("item-price-display").text
    buy_link = driver.find_element_by_css_selector("a.btn-xs").get_attribute('href')
    wear = value.text.replace("Wear: ", "")
    print(wear)
    if wear == condition:
        print(buy_link, price)
        f.write(buy_link + "," + price)
        break
Expected result (also, I'm trying to figure out how to select the fourth button rather than the first one, which sits next to "add to cart"):
https://bitskins.com/view_item?app_id=730&item_id=14983017710 $ 1,355.23
Result I get:
https://steamcommunity.com/profiles/76561198380422063/inventory/#730_2_15685089707 $ 1,350.00
Answer 0 (score: 1)
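The problem is that price and buy_link are looked up on the whole page, so they are always the first matching elements and are not related to the Wear you get from values. See the comments in the code below.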
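The difference between a page-wide lookup and a container-scoped lookup can be illustrated without a browser. The following is only a minimal sketch using the standard library's ElementTree on made-up, simplified markup; the structure of PAGE and the first wear value are assumptions for illustration, not the site's real HTML.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup: two listing containers that reuse
# the same inner class names, as on a typical results page.
PAGE = """
<div>
  <div class="item-solo">
    <div class="item-price-display">$ 1,350.00</div>
    <div class="text-center">Wear: 0.10000000</div>
  </div>
  <div class="item-solo">
    <div class="item-price-display">$ 1,355.23</div>
    <div class="text-center">Wear: 0.11940288</div>
  </div>
</div>
"""

root = ET.fromstring(PAGE)
condition = "0.11940288"

# Page-wide lookup: always returns the first price in the document,
# no matter which listing matched.
first_price = root.find(".//div[@class='item-price-display']").text

# Container-scoped lookup: search inside each listing container only.
matched_price = None
for item in root.findall(".//div[@class='item-solo']"):
    wear = item.find("./div[@class='text-center']").text.replace("Wear: ", "")
    if wear == condition:
        matched_price = item.find("./div[@class='item-price-display']").text
        break

print(first_price)
print(matched_price)
```

In Selenium the same idea applies: calling a find method on a container element, rather than on driver, restricts the search to that container's descendants.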
To get the 4th button, you can use the CSS selector .item-solo a:nth-child(4). Inside the item loop, use the following code:
shareable_link = item.find_element_by_css_selector("a:nth-child(4)")
Full code:
import re

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

url = "http://bitskins.com"
key = "Dragon Lore"
condition = "0.11940288"

profile = webdriver.FirefoxProfile()
profile.set_preference("permissions.default.image", 2)  # Block all images to load websites faster.
driver = webdriver.Firefox(firefox_profile=profile)
wait = WebDriverWait(driver, 10)
f = open("file.csv", 'r+')

driver.get(url)
wait.until(EC.element_to_be_clickable((By.NAME, "market_hash_name"))).send_keys(key, Keys.RETURN)

# get all sale item container elements
items = wait.until(EC.visibility_of_all_elements_located((By.CLASS_NAME, "item-solo")))
print(len(items))

for item in items:
    # price, buy_link and wear are looked up on the item, so they are children of this sale item
    price = item.find_element_by_class_name("item-price-display").text
    buy_link = item.find_element_by_css_selector("a.btn-xs").get_attribute('href')
    shareable_link = item.find_element_by_css_selector("a:nth-child(4)").get_attribute('href')
    wear = item.find_element_by_xpath("descendant::div[contains(@class,'text-center') and contains(text(),'Wear:')]").text
    # extract the decimal number from the label; the dot is escaped so it matches only a literal "."
    wear = re.search(r"\d+\.\d+", wear)[0]
    print(wear)
    if wear == condition:
        print(buy_link, price)
        f.write(f"{buy_link},{price}")
        break
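One thing to watch when writing the row: a price such as "$ 1,355.23" contains a comma, so joining the fields with "," by hand produces three columns instead of two. The following is a minimal sketch of one way to handle that with the standard csv module. The helper name extract_wear is hypothetical, and an in-memory buffer stands in for the real file.

```python
import csv
import io
import re


def extract_wear(text):
    """Return the decimal number in a label such as 'Wear: 0.11940288', or None."""
    match = re.search(r"\d+\.\d+", text)
    return match.group(0) if match else None


buy_link = "https://bitskins.com/view_item?app_id=730&item_id=14983017710"
price = "$ 1,355.23"

# io.StringIO stands in for something like open("file.csv", "a", newline="").
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow([buy_link, price])  # the price field is quoted because it contains a comma

line = buffer.getvalue()
print(extract_wear("Wear: 0.11940288"))
print(line)
```

Reading the line back with csv.reader yields the original two fields, which would not be the case for the hand-joined string.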
For web scraping, requests with BeautifulSoup, or another scraping library, is an easier, faster, and less resource-intensive solution.
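As a rough sketch of what that could look like, the same container-scoped matching is shown below with BeautifulSoup. The HTML string is made up for illustration and mirrors the class names used above; it is an assumption, not the site's actual markup. In practice the HTML would come from something like requests.get(url, timeout=10).text, which only helps if the listings are present in the server response rather than rendered by JavaScript.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that mirrors the class names used in the Selenium code.
HTML = """
<div class="item-solo">
  <div class="item-price-display">$ 1,350.00</div>
  <div class="text-center">Wear: 0.10000000</div>
  <a class="btn-xs" href="https://steamcommunity.com/profiles/76561198380422063/inventory/#730_2_15685089707">view</a>
</div>
<div class="item-solo">
  <div class="item-price-display">$ 1,355.23</div>
  <div class="text-center">Wear: 0.11940288</div>
  <a class="btn-xs" href="https://bitskins.com/view_item?app_id=730&amp;item_id=14983017710">view</a>
</div>
"""


def find_listing(html, condition):
    """Return (link, price) of the first listing whose wear equals condition, else None."""
    soup = BeautifulSoup(html, "html.parser")
    for item in soup.select(".item-solo"):
        wear_el = item.select_one(".text-center")
        if wear_el is None:
            continue
        wear = wear_el.get_text(strip=True).replace("Wear:", "").strip()
        if wear == condition:
            # Both lookups are scoped to the matching container.
            price = item.select_one(".item-price-display").get_text(strip=True)
            link = item.select_one("a.btn-xs")["href"]
            return link, price
    return None


print(find_listing(HTML, "0.11940288"))
```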