Scraping nested elements on an e-commerce website

Time: 2021-02-10 21:14:54

Tags: python web-scraping data-science

I am trying to use Selenium to scrape product img URLs from Target's website, but when I visit a specific product page, nothing is returned.

Here is the relevant part of my code:

function isInt(value) {
    try {
        BigInt(value)
        return true
    } catch(e) {
        return false
    }
}

console.log('--- should be false')
console.log(isInt(undefined))
console.log(isInt(null))
console.log(isInt({}))
console.log(isInt(1.1e-1))
console.log(isInt(1.1))
console.log(isInt(NaN))
console.log(isInt(function(){}))
console.log(isInt(Infinity))

console.log('--- should be true')
console.log(isInt(10))
console.log(isInt(0x11))
console.log(isInt(0))
console.log(isInt(-10000))
console.log(isInt(100000000000000000000000000000000000000))
console.log(isInt(1n))
// gets converted to number
console.log(isInt(''))
console.log(isInt([]))
console.log(isInt(true))
console.log(isInt('1'))

HTML screenshot: [image]

Link to example product

1 answer:

Answer 0: (score: 0)

The urls list contains the URLs you are looking for:

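import requests as rq
from bs4 import BeautifulSoup as bs

url = "https://www.target.com/p/revolution-beauty-conceal-define-concealer-0-11-fl-oz/-/A-82003638?preselect=81551727#lnk=sametab"
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:86.0) Gecko/20100101 Firefox/86.0"}
resp = rq.get(url, headers=headers)
soup = bs(resp.content, "html.parser")  # name the parser explicitly

# Take the first container that holds the product images
divs_img = soup.find_all("div", attrs={"data-test": "product-image"})[0]
# Keep only https URLs; .get avoids a KeyError on <img> tags without src
urls = [i["src"] for i in divs_img.find_all("img") if i.get("src", "").startswith("https")]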
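
The filtering logic above can be tried offline. The following is only a minimal sketch that swaps BeautifulSoup for the standard library's html.parser so it runs without a network request or third-party packages. The SAMPLE_HTML markup, the ProductImageParser class, and the extract_image_urls helper are hypothetical names made up for illustration; the real Target page may be structured differently.

```python
from html.parser import HTMLParser

# Hypothetical sample markup; the real product page may differ.
SAMPLE_HTML = """
<div data-test="product-image">
  <img src="https://example.com/a.jpg">
  <img src="data:image/gif;base64,AAAA">
  <img alt="no src">
  <div><img src="https://example.com/b.jpg"></div>
</div>
<div data-test="other"><img src="https://example.com/c.jpg"></div>
"""


class ProductImageParser(HTMLParser):
    """Collect https img src values inside the first div[data-test=product-image]."""

    def __init__(self):
        super().__init__()
        self.depth = 0     # div nesting depth inside the target container
        self.done = False  # True once the first container has been closed
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            if self.depth > 0:
                self.depth += 1  # nested div inside the container
            elif not self.done and attrs.get("data-test") == "product-image":
                self.depth = 1   # entering the first matching container
        elif tag == "img" and self.depth > 0:
            src = attrs.get("src") or ""  # tolerate <img> without src
            if src.startswith("https"):
                self.urls.append(src)

    def handle_endtag(self, tag):
        if tag == "div" and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                self.done = True  # ignore any later matching containers


def extract_image_urls(html):
    parser = ProductImageParser()
    parser.feed(html)
    return parser.urls


print(extract_image_urls(SAMPLE_HTML))
```

Images outside the data-test="product-image" container, inline data: URIs, and img tags without a src are all skipped, which mirrors the startswith("https") filter in the answer.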