Question

I am trying to scrape this website:

https://www.lanebryant.com/perfect-sleeve-swing-tunic-top/prd-356831#color/0000009320

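I want to get the type of the garment, i.e. its clothing category. There is a script on the page:

How can I collect this text and get the clothing category highlighted in the image? I tried the following code, but it returned nothing.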

type = d.find_element_by_xpath("//script[@type='text/javascript']").text
print("hiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiii"+type)

Here, d is the driver.

Answer 1

Here you go...

1. Get the script tag element.

2. Read its innerHTML.

3. Parse that content as JSON, then get the value of the parameter you need.

tops

Hope this helps.
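The three steps above could be sketched roughly as follows. This is only an illustration under assumptions: the script body shown here (`sample_script`, with a `pageData` variable) is hypothetical, and the real page may structure its data differently. In a Selenium session the script text would typically come from the element's `get_attribute("innerHTML")` rather than from a literal string.

```python
import json


def extract_json_from_script(script_text):
    """Decode the first JSON object embedded in a script body."""
    # Find where the object literal starts, then let the decoder stop at its end,
    # so trailing JavaScript such as ";" is ignored.
    start = script_text.index("{")
    obj, _end = json.JSONDecoder().raw_decode(script_text, start)
    return obj


# Hypothetical script body; with Selenium this would be something like
# script_element.get_attribute("innerHTML").
sample_script = (
    'var pageData = {"page": {"pageName": "Clothing : Tops", '
    '"category": "Tops", "CategoryID": "cat-0001"}};'
)

data = extract_json_from_script(sample_script)
category = data["page"]["category"]
print(category.lower())
```

This only works when the object literal in the script is valid JSON; a script that uses unquoted keys or single quotes would need a different approach, such as a regular expression.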

Answer 2

Try something like this:

type = d.find_element_by_xpath('//script[@type="text/javascript"]').text

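Also count the script tags in the page source: this XPath matches every one of them, and find_element returns only the first match.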
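One way to count the script tags and see which one holds the data is sketched below, using only the standard library. The HTML in `sample_page` is a hypothetical stand-in; with Selenium the input would be `d.page_source`. Note also that Selenium's `.text` returns only rendered, visible text, so it is empty for a script element, whereas `get_attribute("innerHTML")` returns the script body.

```python
from html.parser import HTMLParser


class ScriptCollector(HTMLParser):
    """Collect the body of every <script> tag in a document."""

    def __init__(self):
        super().__init__()
        self.scripts = []
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
            self.scripts.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Script content may arrive in several chunks, so append to the last entry.
        if self._in_script:
            self.scripts[-1] += data


def collect_scripts(page_source):
    parser = ScriptCollector()
    parser.feed(page_source)
    parser.close()
    return parser.scripts


# Hypothetical page source; with Selenium this would be d.page_source.
sample_page = """<html><head>
<script type="text/javascript" src="/static/app.js"></script>
<script type="text/javascript">var pageData = {"page": {"category": "Tops"}};</script>
</head><body><p>Perfect Sleeve Swing Tunic Top</p></body></html>"""

scripts = collect_scripts(sample_page)
print(len(scripts))
# Indexes of the scripts that mention the key we are looking for
matching = [i for i, body in enumerate(scripts) if '"category"' in body]
print(matching)
```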

Answer 3

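One problem with your current approach is that the XPath matches every script on the page; you need to narrow it down a bit.

This finds the right script and then extracts the category with the help of a regular expression: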

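import re

import requests
from lxml import html

# Capture the text between '"category": "' and '", "CategoryID"'
# (non-greedy, so the match stops at the first closing marker)
category_regex = re.compile(r'(?<="category": ").*?(?=", "CategoryID")')
# Fetch the product page
page = requests.get('https://www.lanebryant.com/perfect-sleeve-swing-tunic-top/prd-356831#color/0000009320')
tree = html.fromstring(page.content)
# Select only the script whose text contains the page data
information = tree.xpath("//script[contains(text(), '\"page\": {    \"pageName\": \"Clothing :')]/text()")
# xpath() returns a list of strings; str() flattens it so the regex can search it
print(category_regex.findall(str(information)))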

Output: ['Tops']
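The regular expression can also be tried without any network access. The sketch below runs a non-greedy variant of the pattern against a hypothetical fragment of script text (`sample_script_text` is made up for illustration and only mimics the key order the pattern relies on).

```python
import re

# Lookbehind and lookahead anchor the match between two fixed markers,
# so only the category value itself is captured.
category_regex = re.compile(r'(?<="category": ").*?(?=", "CategoryID")')

# Hypothetical script fragment, not copied from the real page
sample_script_text = (
    '"page": {    "pageName": "Clothing : Tops", '
    '"category": "Tops", "CategoryID": "cat-0001"}'
)

matches = category_regex.findall(sample_script_text)
print(matches)

# search() returns None when the markers are missing, which is worth handling
missing = category_regex.search('"page": {"pageName": "Home"}')
print(missing is None)
```

The pattern depends on "CategoryID" directly following "category" with exactly this spacing, so it would stop matching if the site changed the key order or whitespace.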

How to get the text inside a script tag in HTML

3 answers: