I want to scrape this page: https://www.off---white.com/en/GB/men/products/omia139f198000403020# (view-source:https://www.off---white.com/en/GB/men/products/omia139f198000403020#)
Specifically, I want the variants, for example:
<div class='product-variants'>
<form class="product-cart-form js-cart-form" action="/en/GB/orders/populate.json" accept-charset="UTF-8" method="post"><input name="utf8" type="hidden" value="✓" /><input type="hidden" name="authenticity_token" value="3VeMLZA3thbrl8EtNfA6rdNcAMXa/29u87AW7KbhyNQ=" /><div class='please-select-text'>
<p>Please select a size</p>
</div>
<div class='availability preorder-product'>
<p>
Pre-order will arrive by October 15
<sup>
th
</sup>
</p>
</div>
<ul class='styled-radio'>
<li>
<input type="radio" name="variant_id" id="variant_id_113207" value="113207" />
<label for="variant_id_113207">40</label>
</li>
<li>
<input type="radio" name="variant_id" id="variant_id_113208" value="113208" />
<label for="variant_id_113208">41</label>
</li>
<li>
<input type="radio" name="variant_id" id="variant_id_113209" value="113209" />
<label for="variant_id_113209">42</label>
</li>
<li>
<input type="radio" name="variant_id" id="variant_id_113210" value="113210" />
<label for="variant_id_113210">43</label>
</li>
<li>
<input type="radio" name="variant_id" id="variant_id_113211" value="113211" />
<label for="variant_id_113211">44</label>
</li>
<li>
<input type="radio" name="variant_id" id="variant_id_113212" value="113212" />
<label for="variant_id_113212">45</label>
</li>
</ul>
My current code is:
import requests
from bs4 import BeautifulSoup as bs

s = requests.session()

def loadproduct():
    product = 'https://www.off---white.com/en/GB/men/products/omia139f198000403020#'
    getproduct = s.get(product)
    bsproduct = bs(getproduct.text, 'html.parser')
    #print(bsproduct)
    # collect every <input> inside the size list
    allsizes = bsproduct.find('ul',{'class':'styled-radio'}).findAll('input')
    print(allsizes)

loadproduct()
x= input('d')
Answer 0 (score: -1)
The page is generated by JavaScript.
You have to use a package such as selenium
to scrape it.
Check this snippet:
Code:
from selenium import webdriver
from bs4 import BeautifulSoup
import time
driver = webdriver.Firefox()
driver.get('https://www.off---white.com/en/GB/men/products/omia139f198000403020#')
time.sleep(5)
html = driver.page_source
soup = BeautifulSoup(html, 'html.parser')
allsizes = soup.find('ul',{'class':'styled-radio'}).findAll('input')
for size in allsizes:
    print(size.get('value'))
Output:
113207
113208
113209
113210
113211
113212
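
The values printed above are only the variant IDs. If the size labels are needed as well, each input can be paired with its label through the matching id/for attributes. The following is a minimal sketch, not part of the original answer: the helper name extract_sizes and the shortened sample string are hypothetical, and the markup is assumed to match the excerpt in the question. It returns an empty dict when the size list is absent, which avoids the AttributeError that find(...).findAll(...) raises when find returns None.

```python
from bs4 import BeautifulSoup

def extract_sizes(html):
    """Map each variant_id value to its size label (hypothetical helper).

    Returns an empty dict when the 'styled-radio' list is not in the HTML.
    """
    soup = BeautifulSoup(html, 'html.parser')
    size_list = soup.find('ul', {'class': 'styled-radio'})
    if size_list is None:
        return {}
    sizes = {}
    for radio in size_list.find_all('input', {'name': 'variant_id'}):
        # The label's "for" attribute points at the input's id.
        label = size_list.find('label', {'for': radio.get('id')})
        sizes[radio.get('value')] = label.get_text(strip=True) if label else None
    return sizes

# Shortened copy of the markup from the question, used as sample input.
sample = """
<ul class='styled-radio'>
<li>
<input type="radio" name="variant_id" id="variant_id_113207" value="113207" />
<label for="variant_id_113207">40</label>
</li>
<li>
<input type="radio" name="variant_id" id="variant_id_113208" value="113208" />
<label for="variant_id_113208">41</label>
</li>
</ul>
"""

print(extract_sizes(sample))
```

The same function can be given driver.page_source from the selenium snippet or getproduct.text from the requests version; an empty result from the latter would suggest the list is not present in the raw response.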