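I am trying to get the department names from this Walmart link. As you can see, the Departments
section on the left initially shows 7 departments (Chocolate Cookies, Cookies, Butter Cookies, ...). When I click See All Departments,
9 more categories are added, so the total becomes 16. I am trying to get all 16 departments automatically. I wrote this code: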
from selenium import webdriver

n_links = []
driver = webdriver.Chrome(executable_path='D:/Desktop/demo/chromedriver.exe')
url = "https://www.walmart.com/browse/snacks-cookies-chips/cookies/976759_976787_1001391"
driver.get(url)
search = driver.find_element_by_xpath("//*[@id='Departments']/div/div/ul").text
driver.find_element_by_xpath("//*[@id='Departments']/div/div/button/span").click()
search2 = driver.find_element_by_xpath("//*[@id='Departments']/div/div/div/div").text
sep = search.split('\n')
sep2 = search2.split('\n')
lngth = len(sep)
lngth2 = len(sep2)
for i in range(1, lngth):
    path = "//*[@id='Departments']/div/div/ul/li" + "[" + str(i) + "]/a"
    nav_links = driver.find_element_by_xpath(path).get_attribute('href')
    n_links.append(nav_links)
for i in range(1, lngth2):
    path = "//*[@id='Departments']/div/div/div/div/ul/li" + "[" + str(i) + "]/a"
    nav_links2 = driver.find_element_by_xpath(path).get_attribute('href')
    n_links.append(nav_links2)
print(n_links)
print(len(n_links))
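When I run the code, I can see the links in the n_links list at the end. The problem is that sometimes it contains 13 links and sometimes 14. There should be 16, but I have never seen 16, only 13 or 14. I tried adding time.sleep(3) before the search2 line, but it did not help. Can you help me?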
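Answer 0 (score: 0)
I think you are making this more complicated than it needs to be. You are right that, after clicking the button, you may need to wait before the departments are available.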
# Get the departments shown before expanding the list
departments = driver.find_elements_by_xpath("//li[contains(@class,'department')]")
# Click the "See all Departments" button
driver.find_element_by_xpath("//button[@data-automation-id='button']//span[contains(text(),'all Departments')]").click()
# Get the departments again now that the full list is shown
departments = driver.find_elements_by_xpath("//li[contains(@class,'department')]")
# Iterate through the departments and print each name
for d in departments:
    print(d.text)
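A side note on the loops in the question, independent of any timing issue: XPath positions such as li[1] start at 1, while range(1, n) stops at n - 1, so each loop appears to skip its last link. With 7 and 9 entries that would give 6 + 8 = 14, which matches one of the counts reported. A minimal sketch of that arithmetic, with no browser involved; xpath_positions is a hypothetical helper name, not part of Selenium:

```python
def xpath_positions(count, inclusive=True):
    """Return the 1-based li[...] positions a loop would visit.

    Hypothetical helper for illustration only; it is not part of Selenium.
    """
    stop = count + 1 if inclusive else count
    return list(range(1, stop))


# The question's loops use range(1, count), which drops the last position.
visited_as_written = len(xpath_positions(7, inclusive=False)) + len(
    xpath_positions(9, inclusive=False)
)
# Using range(1, count + 1) visits every position.
visited_fixed = len(xpath_positions(7)) + len(xpath_positions(9))

print(visited_as_written)  # 14
print(visited_fixed)  # 16
```

Collecting the elements with find_elements, as above, avoids building indexed paths at all.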
Answer 1 (score: 0)
To print all 16 departments, you can try selecting them with a CSS selector: .collapsible-content > ul a, .sometimes-shown a
Applied to your example:
from selenium import webdriver

driver = webdriver.Chrome()
url = (
    "https://www.walmart.com/browse/snacks-cookies-chips/cookies/976759_976787_1001391"
)
driver.get(url)
search = driver.find_element_by_xpath("//*[@id='Departments']/div/div/ul").text
driver.find_element_by_xpath("//*[@id='Departments']/div/div/button/span").click()
all_departments = [
    link.get_attribute("href")
    for link in driver.find_elements_by_css_selector(
        ".collapsible-content > ul a, .sometimes-shown a"
    )
]
print(len(all_departments))
print(all_departments)
Output:
16
['https://www.walmart.com/browse/food/chocolate-cookies/976759_976787_1001391_4007138', 'https://www.walmart.com/browse/food/cookies/976759_976787_1001391_8331066', 'https://www.walmart.com/browse/food/butter-cookies/976759_976787_1001391_7803640', 'https://www.walmart.com/browse/food/shortbread-cookies/976759_976787_1001391_8026949', 'https://www.walmart.com/browse/food/coconut-cookies/976759_976787_1001391_6970757', 'https://www.walmart.com/browse/food/healthy-cookies/976759_976787_1001391_7466302', 'https://www.walmart.com/browse/food/keebler-cookies/976759_976787_1001391_3596825', 'https://www.walmart.com/browse/food/biscotti/976759_976787_1001391_2224095', 'https://www.walmart.com/browse/food/gluten-free-cookies/976759_976787_1001391_4362193', 'https://www.walmart.com/browse/food/molasses-cookies/976759_976787_1001391_3338971', 'https://www.walmart.com/browse/food/peanut-butter-cookies/976759_976787_1001391_6460174', 'https://www.walmart.com/browse/food/pepperidge-farm-cookies/976759_976787_1001391_2410932', 'https://www.walmart.com/browse/food/snickerdoodle-cookies/976759_976787_1001391_8926167', 'https://www.walmart.com/browse/food/sugar-free-cookies/976759_976787_1001391_5314659', 'https://www.walmart.com/browse/food/tate-s-cookies/976759_976787_1001391_9480535', 'https://www.walmart.com/browse/food/vegan-cookies/976759_976787_1001391_8007359']
Answer 2 (score: 0)
Using only beautifulsoup:
import json
import requests
from bs4 import BeautifulSoup

url = "https://www.walmart.com/browse/snacks-cookies-chips/cookies/976759_976787_1001391"
headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:88.0) Gecko/20100101 Firefox/88.0",
    "Accept-Language": "en-US,en;q=0.5",
}
soup = BeautifulSoup(requests.get(url, headers=headers).content, "html.parser")
data = json.loads(soup.select_one("#searchContent").contents[0])

# uncomment to see all data:
# print(json.dumps(data, indent=4))


def find_departments(data):
    # Recursively search nested dicts and lists for the "Departments" entry
    if isinstance(data, dict):
        if "name" in data and data["name"] == "Departments":
            yield data
        else:
            for v in data.values():
                yield from find_departments(v)
    elif isinstance(data, list):
        for v in data:
            yield from find_departments(v)


departments = next(find_departments(data), {})
for d in departments.get("values", []):
    print(
        "{:<30} {}".format(
            d["name"], "https://www.walmart.com" + d["baseSeoURL"]
        )
    )
Prints:
Chocolate Cookies              https://www.walmart.com/browse/food/chocolate-cookies/976759_976787_1001391_4007138
Cookies                        https://www.walmart.com/browse/food/cookies/976759_976787_1001391_8331066
Butter Cookies                 https://www.walmart.com/browse/food/butter-cookies/976759_976787_1001391_7803640
Shortbread Cookies             https://www.walmart.com/browse/food/shortbread-cookies/976759_976787_1001391_8026949
Coconut Cookies                https://www.walmart.com/browse/food/coconut-cookies/976759_976787_1001391_6970757
Healthy Cookies                https://www.walmart.com/browse/food/healthy-cookies/976759_976787_1001391_7466302
Keebler Cookies                https://www.walmart.com/browse/food/keebler-cookies/976759_976787_1001391_3596825
Biscotti                       https://www.walmart.com/browse/food/biscotti/976759_976787_1001391_2224095
Gluten-Free Cookies            https://www.walmart.com/browse/food/gluten-free-cookies/976759_976787_1001391_4362193
Molasses Cookies               https://www.walmart.com/browse/food/molasses-cookies/976759_976787_1001391_3338971
Peanut Butter Cookies          https://www.walmart.com/browse/food/peanut-butter-cookies/976759_976787_1001391_6460174
Pepperidge Farm Cookies        https://www.walmart.com/browse/food/pepperidge-farm-cookies/976759_976787_1001391_2410932
Snickerdoodle Cookies          https://www.walmart.com/browse/food/snickerdoodle-cookies/976759_976787_1001391_8926167
Sugar-Free Cookies             https://www.walmart.com/browse/food/sugar-free-cookies/976759_976787_1001391_5314659
Tate's Cookies                 https://www.walmart.com/browse/food/tate-s-cookies/976759_976787_1001391_9480535
Vegan Cookies                  https://www.walmart.com/browse/food/vegan-cookies/976759_976787_1001391_8007359
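To see how the recursive generator walks nested JSON, here is a small sketch that runs the same function on a made-up structure. The sample dictionary is invented for illustration (keys such as facets and wrapper are hypothetical) and does not reflect Walmart's actual payload; only name, values and baseSeoURL come from the answer above.

```python
def find_departments(data):
    # Recursively search nested dicts and lists for the "Departments" entry
    if isinstance(data, dict):
        if "name" in data and data["name"] == "Departments":
            yield data
        else:
            for v in data.values():
                yield from find_departments(v)
    elif isinstance(data, list):
        for v in data:
            yield from find_departments(v)


# Hypothetical nested data: the target dict sits two levels down.
sample = {
    "facets": [
        {"name": "Brand", "values": [{"name": "Keebler"}]},
        {
            "wrapper": {
                "name": "Departments",
                "values": [
                    {
                        "name": "Biscotti",
                        "baseSeoURL": "/browse/food/biscotti/976759_976787_1001391_2224095",
                    }
                ],
            }
        },
    ]
}

# next(..., {}) returns the first match, or an empty dict if none is found.
departments = next(find_departments(sample), {})
names = [d["name"] for d in departments.get("values", [])]
print(names)  # ['Biscotti']
```

Because the search does not depend on the exact path to the entry, it keeps working if the surrounding nesting changes, as long as a dict with name equal to "Departments" still exists somewhere in the data.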
Answer 3 (score: 0)
Why not use visibility_of_all_elements_located?
texts = []
links = []
driver.get('https://www.walmart.com/browse/snacks-cookies-chips/cookies/976759_976787_1001391')
wait = WebDriverWait(driver, 60)
wait.until(EC.element_to_be_clickable((By.XPATH, "//span[text()='See all Departments']/parent::button"))).click()
elements = wait.until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "li.department-single-level a")))
for element in elements:
    # Get the link text
    texts.append(element.text)
    # Get the link URL from the href attribute
    links.append(element.get_attribute('href'))
print(texts)
print(links)
Console output:
[u'Chocolate Cookies', u'Cookies', u'Butter Cookies', u'Shortbread Cookies', u'Coconut Cookies', u'Healthy Cookies', u'Keebler Cookies', u'Biscotti', u'Gluten-Free Cookies', u'Molasses Cookies', u'Peanut Butter Cookies', u'Pepperidge Farm Cookies', u'Snickerdoodle Cookies', u'Sugar-Free Cookies', u"Tate's Cookies", u'Vegan Cookies']
[u'https://www.walmart.com/browse/food/chocolate-cookies/976759_976787_1001391_4007138', u'https://www.walmart.com/browse/food/cookies/976759_976787_1001391_8331066', u'https://www.walmart.com/browse/food/butter-cookies/976759_976787_1001391_7803640', u'https://www.walmart.com/browse/food/shortbread-cookies/976759_976787_1001391_8026949', u'https://www.walmart.com/browse/food/coconut-cookies/976759_976787_1001391_6970757', u'https://www.walmart.com/browse/food/healthy-cookies/976759_976787_1001391_7466302', u'https://www.walmart.com/browse/food/keebler-cookies/976759_976787_1001391_3596825', u'https://www.walmart.com/browse/food/biscotti/976759_976787_1001391_2224095', u'https://www.walmart.com/browse/food/gluten-free-cookies/976759_976787_1001391_4362193', u'https://www.walmart.com/browse/food/molasses-cookies/976759_976787_1001391_3338971', u'https://www.walmart.com/browse/food/peanut-butter-cookies/976759_976787_1001391_6460174', u'https://www.walmart.com/browse/food/pepperidge-farm-cookies/976759_976787_1001391_2410932', u'https://www.walmart.com/browse/food/snickerdoodle-cookies/976759_976787_1001391_8926167', u'https://www.walmart.com/browse/food/sugar-free-cookies/976759_976787_1001391_5314659', u'https://www.walmart.com/browse/food/tate-s-cookies/976759_976787_1001391_9480535', u'https://www.walmart.com/browse/food/vegan-cookies/976759_976787_1001391_8007359']
The following imports are required:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
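For context on why an explicit wait tends to behave better than a fixed time.sleep(3): it keeps re-checking a condition until it holds or a timeout expires, instead of pausing once and hoping the page is ready. The sketch below imitates that idea in plain Python. wait_until and the simulated page state are hypothetical names, and this is not Selenium's actual implementation.

```python
import time


def wait_until(condition, timeout=5.0, poll_interval=0.1):
    """Call condition() repeatedly until it returns a truthy value.

    Hypothetical helper that mimics the idea behind an explicit wait.
    Raises TimeoutError if the condition never holds within the timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within timeout")
        time.sleep(poll_interval)


# Simulate a list that only becomes complete after a few polls.
polls = {"count": 0}


def all_links_rendered():
    polls["count"] += 1
    # Pretend the 16 links show up on the third check.
    return ["link"] * 16 if polls["count"] >= 3 else []


links = wait_until(all_links_rendered, timeout=2.0, poll_interval=0.01)
print(len(links))  # 16
```

The wait returns as soon as the condition is met, so a generous timeout such as the 60 seconds used above only costs time when something is actually slow.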