Question

Hi all,

I am trying to scrape a dynamic retail web page using Selenium. I want to get a list of all items that use the specific class name "product-name". The site's HTML looks like this:

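From the attached example, what I want is the product name/title: "COACH X KEITH HARING CHARLIE CARRYALL IN SIGNATURE PATCHWORK". I want this for every product on the page. To do that, I could search on the "title" field, or on the "content" field in the line with the meta tag. That said, I am new to Selenium and don't know how to pull this out. All I know are the find_elements_by... commands, but I think those only return the field I specify/search on. My code should return every product name on this web page, so I need some way to specify how to locate the title/product name, and then a way to extract those fields.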

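With BeautifulSoup I can search by one class name and retrieve the values of some other specified class, but I don't know how to use Selenium that way. I think I need Selenium rather than BeautifulSoup because the site is dynamic. Is there something built into Selenium, like BeautifulSoup's .findAll() command, that can retrieve one field from a line by using another specified field name?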

Thanks in advance!

Answer 1

You can get this using a nice, compact CSS selector. CSS selectors are generally considered at least as fast as XPath, and I find them much easier to read.

products = driver.find_elements_by_css_selector("meta[itemprop='name']")
for product in products:
    print(product.get_attribute("content"))

We're basically looking for this META tag

<meta itemprop="name" content="COACH X KEITH HARING ACADEMY BACKPACK">

using the itemprop attribute and then pulling the content attribute.
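
For anyone on a recent Selenium 4 release, where the find_elements_by_* helpers are no longer available, here is a minimal sketch of the same lookup using find_elements with a locator strategy. The helper name product_names is made up for illustration, and the try/except fallback is only there so the snippet can be read and exercised without Selenium installed; in a real script you would just import By.

```python
try:
    from selenium.webdriver.common.by import By
    CSS_SELECTOR = By.CSS_SELECTOR
except ImportError:
    # Assumption: fall back to the plain string that By.CSS_SELECTOR holds,
    # so the helper below can be tried without Selenium installed.
    CSS_SELECTOR = "css selector"


def product_names(driver):
    """Return the content attribute of every <meta itemprop="name"> tag.

    `driver` is expected to be a Selenium WebDriver that has already
    loaded the page (hypothetical helper, not part of Selenium itself).
    """
    metas = driver.find_elements(CSS_SELECTOR, "meta[itemprop='name']")
    return [meta.get_attribute("content") for meta in metas]
```

Calling product_names(driver) after the page has loaded should give a plain Python list of titles that you can print or store.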

Answer 2

This is a very simple, basic XPath:

elems = driver.find_elements_by_xpath("//div[@class='product-name']/meta[@itemprop='name']")
for elem in elems:
    print(elem.get_attribute("content"))
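
If you would rather mirror the BeautifulSoup habit of finding a container by one class and then reading something else inside it, the same idea can be sketched in two steps: locate each "product-name" element, then search within that element. This is only a sketch under assumptions: the function name names_by_container is hypothetical, it assumes the meta tag is a direct child of the container (as the XPath above does), and the import fallback exists only so the snippet can be tried without Selenium installed.

```python
try:
    from selenium.webdriver.common.by import By
    CLASS_NAME, XPATH = By.CLASS_NAME, By.XPATH
except ImportError:
    # Assumption: these are the plain strings behind By.CLASS_NAME and By.XPATH.
    CLASS_NAME, XPATH = "class name", "xpath"


def names_by_container(driver):
    """Find each product-name container, then read its child meta tag."""
    names = []
    for container in driver.find_elements(CLASS_NAME, "product-name"):
        # The leading "./" makes the XPath relative to this container only.
        # find_elements returns an empty list when nothing matches, so a
        # container without a meta tag is simply skipped.
        for meta in container.find_elements(XPATH, "./meta[@itemprop='name']"):
            names.append(meta.get_attribute("content"))
    return names
```

The single XPath in the answer is shorter; the two-step form is mainly useful when you also want other values from the same container.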

Selenium parsing - how to find an element by one class and return another class
