Question

I'm trying to extract a CPU's socket type, as shown in the following image. I've determined that the socket type sits under the <h4> "Socket" heading.

So far I've been able to grab .specs.block and find all the <h4> elements nested inside it, but I can't get the text under each heading.

Here is my code:

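from requests_html import HTMLSession

session = HTMLSession()

# The product ID is part of the URL path, so it has to be a string
product_id = 'jLF48d'
r = session.get('https://au.pcpartpicker.com/product/' + product_id)
about = r.html.find('.specs.block')[0]  # first matching specs block
about = about.find('h4')                # list of every <h4> inside it

print(about)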

This prints:

 [ <Element 'h4' >, <Element 'h4' >, <Element 'h4' >, <Element 'h4' >,
 <Element 'h4' >, <Element 'h4' >, <Element 'h4' >, <Element 'h4' >,
 <Element 'h4' >, <Element 'h4' >, <Element 'h4' >]

However, when I change the print statement to:

print(about.text)

I get the following error:

AttributeError: 'list' object has no attribute 'text'
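The error happens because requests_html's find() returns a list of Element objects unless you pass first=True, and a Python list has no .text attribute. The sketch below reproduces this without any network access; StandInElement is a hypothetical stand-in for requests_html.Element (it only carries a .text attribute), and the middle heading names are placeholders.

```python
class StandInElement:
    """Hypothetical stand-in for requests_html.Element: only has .text."""

    def __init__(self, text):
        self.text = text


# find() without first=True gives back a list shaped like this one
headings = [
    StandInElement("Manufacturer"),
    StandInElement("Heading 2"),  # placeholder name
    StandInElement("Heading 3"),  # placeholder name
    StandInElement("Socket"),
]

try:
    headings.text  # a list has no .text attribute
except AttributeError as err:
    print(err)  # 'list' object has no attribute 'text'

# Index into the list first (0-based, so 3 is the fourth heading) ...
print(headings[3].text)
# ... or read the text of every element
print([h.text for h in headings])
```

Indexing works when you know the position; looping is useful for inspecting all the headings at once.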

Update:

print(about[0].text)

This code displays:

Manufacturer AMD

That is the first heading and its text, but I need the fourth one.

Do you know what code I could use to get the result I want?

Let me know if you need more information.

Answer 1

Replacing print(about[0].text)

with

print(about[3].text)

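in the code shown in the question above solved the problem for me! List indices start at 0, so index 3 is the fourth <h4>.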
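Hard-coding index 3 only works while "Socket" stays the fourth heading. A minimal sketch of looking the value up by its heading label instead is below. It swaps in Python's standard-library html.parser so it runs offline, and it assumes the value text follows its <h4> before the next <h4>. The markup and the values (other than "Manufacturer"/"AMD" and "Socket") are hypothetical placeholders, not the real pcpartpicker page.

```python
from html.parser import HTMLParser


class SpecParser(HTMLParser):
    """Collect {heading: text} pairs: each <h4>'s text is a key, and the
    text after it, up to the next <h4>, is the value."""

    def __init__(self):
        super().__init__()
        self.specs = {}
        self._in_h4 = False
        self._heading = None  # None until the first <h4> is seen
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "h4":
            self._flush()  # store the previous heading's value
            self._in_h4 = True
            self._heading = ""

    def handle_endtag(self, tag):
        if tag == "h4":
            self._in_h4 = False

    def handle_data(self, data):
        if self._in_h4:
            self._heading += data
        elif self._heading is not None:
            self._parts.append(data)

    def _flush(self):
        if self._heading is not None:
            value = " ".join(p.strip() for p in self._parts if p.strip())
            self.specs[self._heading.strip()] = value
        self._heading = None
        self._parts = []

    def close(self):
        super().close()
        self._flush()  # store the last heading's value


# Hypothetical, simplified specs block with placeholder values
SAMPLE = """
<div class="specs block">
  <div class="group"><h4>Manufacturer</h4><div><p>AMD</p></div></div>
  <div class="group"><h4>Heading 2</h4><div><p>value 2</p></div></div>
  <div class="group"><h4>Heading 3</h4><div><p>value 3</p></div></div>
  <div class="group"><h4>Socket</h4><div><p>SOCKET-X</p></div></div>
</div>
"""

parser = SpecParser()
parser.feed(SAMPLE)
parser.close()
print(parser.specs.get("Socket"))  # SOCKET-X
```

Looking up by label keeps working if the page adds or reorders headings; dicts keep insertion order, so list(parser.specs)[3] would still give the fourth heading if you need the positional view.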
