I have a string with the following contents:
var string =
'<div class="product-info-inner-content clearfix ">\
<a href="http://www.adidas.co.uk/ace-17_-purecontrol-firm-ground-boots/BB4314.html"\
class="link-BB4314 product-link clearfix "\
data-context="name:ACE 17+ Purecontrol Firm Ground Boots"\
data-track="BB4314"\
data-productname="ACE 17+ Purecontrol Firm Ground Boots" tabindex="-1">\
<span class="title">ACE 17+ Purecontrol Firm Ground Boots</span>\
<span class="subtitle">Men Football</span>\
</a>\
</div>';
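I am trying to write the JavaScript equivalent of the following Python code, which uses Beautiful Soup to get the URL of the link element for a given product code (BB4314 in this case).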
is_listing = len(soup.findAll(name="div", attrs={"class": "product-tile"})) > 1
if is_listing:
    # stuck from this part
    attrs = {"class": re.compile(r".*\bproduct-link\b.*"), "data-track": code}
    url = soup.find(name="a", attrs=attrs)
    url = url["href"]
How can I do this?
Answer 0 (score: 2)
Just use the DOM:
var string = '<div class="product-info-inner-content clearfix "><a href="http://www.adidas.co.uk/ace-17_-purecontrol-firm-ground-boots/BB4314.html" class="link-BB4314 product-link clearfix " data-context="name:ACE 17+ Purecontrol Firm Ground Boots" data-track="BB4314" data-productname="ACE 17+ Purecontrol Firm Ground Boots" tabindex="-1"><span class="title">ACE 17+ Purecontrol Firm Ground Boots</span> <span class="subtitle">Men Football</span></a></div>',
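    // Parse the markup by assigning it to a detached element
    div = document.createElement("div");
div.innerHTML = string;
// Read the product code from the last path segment of the link's URL
var href = div.querySelector("a.product-link").href,
    parts = href.split("/"),
    code = parts.pop().split(".")[0];
console.log(code); // "BB4314"
// Or read it directly from the data-track attribute
console.log(div.querySelector("a.product-link").getAttribute("data-track")); // "BB4314"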
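If you already know the product code and want the URL, as in the Python snippet, an attribute selector can do the lookup. The following is only a sketch: `isListing` and `findProductUrl` are hypothetical helper names, `root` is assumed to be any element or document that supports `querySelector`, and the class selector `.product-link` is only roughly equivalent to the regex in the Python code.

```javascript
// Hypothetical helpers, not part of any library.
// `root` is any element or document that supports querySelector/querySelectorAll.
function isListing(root) {
  // Mirrors: len(soup.findAll(name="div", attrs={"class": "product-tile"})) > 1
  return root.querySelectorAll("div.product-tile").length > 1;
}

function findProductUrl(root, code) {
  // Assumes `code` has no double quotes or backslashes (true for codes like "BB4314").
  var link = root.querySelector('a.product-link[data-track="' + code + '"]');
  // getAttribute returns the href exactly as written in the markup;
  // null is returned when no matching link exists.
  return link ? link.getAttribute("href") : null;
}

// Browser-only usage; skipped where there is no DOM (for example plain Node.js).
if (typeof document !== "undefined") {
  var container = document.createElement("div");
  container.innerHTML =
    '<a class="product-link" data-track="BB4314" href="/BB4314.html"></a>';
  console.log(findProductUrl(container, "BB4314"));
}
```

With the string from the question, you would assign it to `container.innerHTML` and call `findProductUrl(container, "BB4314")`. Note that `isListing` would be false for that string, since it contains no `div.product-tile` elements.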