Extracting text from the "value" attribute using BeautifulSoup

Time: 2019-04-18 16:28:55

Tags: python html web-scraping beautifulsoup attributes

HTML code:

<td id="ContentPlaceHolder1_ContentPlaceHolder1_Product_eBay1_tdBINPrice">
    <input id="ContentPlaceHolder1_ContentPlaceHolder1_Product_eBay1_txtBuyItNowPrice" name="ctl00$ctl00$ContentPlaceHolder1$ContentPlaceHolder1$Product_eBay1$txtBuyItNowPrice" 
    style="width:50px;" type="text" value="1435.97"/>                           
    <img id="ContentPlaceHolder1_ContentPlaceHolder1_Product_eBay1_imgCustomPriceCalculator" src="/images/ChannelPriceCustom.png" style="width:16px;"/>
</td>

I want to extract the text in the 'value' attribute ('1435.97').

I tried to do this with the following code, but had no luck.

driver.get(someURL)
page = driver.page_source
soup = BeautifulSoup(page, 'lxml')
price = soup.find('td', {'id' : re.compile('ContentPlaceHolder1_ContentPlaceHolder1_Product_eBay1_tdBINPrice')})

print(price)

Thanks!

2 answers:

Answer 0 (score: 2)

Try the following code.

from bs4 import BeautifulSoup

html='''<td id="ContentPlaceHolder1_ContentPlaceHolder1_Product_eBay1_tdBINPrice">
    <input id="ContentPlaceHolder1_ContentPlaceHolder1_Product_eBay1_txtBuyItNowPrice" name="ctl00$ctl00$ContentPlaceHolder1$ContentPlaceHolder1$Product_eBay1$txtBuyItNowPrice" 
    style="width:50px;" type="text" value="1435.97"/>                           
    <img id="ContentPlaceHolder1_ContentPlaceHolder1_Product_eBay1_imgCustomPriceCalculator" src="/images/ChannelPriceCustom.png" style="width:16px;"/>
</td>'''

soup = BeautifulSoup(html, 'html.parser')

textval=soup.select_one("input[name='ctl00$ctl00$ContentPlaceHolder1$ContentPlaceHolder1$Product_eBay1$txtBuyItNowPrice']")
print(textval['value'])

OR

from bs4 import BeautifulSoup

html='''<td id="ContentPlaceHolder1_ContentPlaceHolder1_Product_eBay1_tdBINPrice">
    <input id="ContentPlaceHolder1_ContentPlaceHolder1_Product_eBay1_txtBuyItNowPrice" name="ctl00$ctl00$ContentPlaceHolder1$ContentPlaceHolder1$Product_eBay1$txtBuyItNowPrice" 
    style="width:50px;" type="text" value="1435.97"/>                           
    <img id="ContentPlaceHolder1_ContentPlaceHolder1_Product_eBay1_imgCustomPriceCalculator" src="/images/ChannelPriceCustom.png" style="width:16px;"/>
</td>'''

soup = BeautifulSoup(html, 'html.parser')

textval=soup.find("input" ,attrs={"name" : "ctl00$ctl00$ContentPlaceHolder1$ContentPlaceHolder1$Product_eBay1$txtBuyItNowPrice"})
print(textval['value'])
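
Both variants above raise an exception if the input is not found (`select_one` and `find` return `None`) or if the tag has no `value` attribute. As a minimal sketch, assuming you would rather get a fallback than an exception, the lookup can be wrapped in a small helper. The helper name `attr_or_default` and the trimmed HTML are illustrative assumptions, not part of the original answer:

```python
from bs4 import BeautifulSoup

# Trimmed copy of the HTML from the question (only the attributes needed here).
html = '''<td id="ContentPlaceHolder1_ContentPlaceHolder1_Product_eBay1_tdBINPrice">
    <input id="ContentPlaceHolder1_ContentPlaceHolder1_Product_eBay1_txtBuyItNowPrice" type="text" value="1435.97"/>
</td>'''

soup = BeautifulSoup(html, 'html.parser')


def attr_or_default(soup, css, attr, default=None):
    """Hypothetical helper: return an attribute of the first match, or a default."""
    tag = soup.select_one(css)
    if tag is None:
        # No element matched the selector.
        return default
    # Tag.get returns the default when the attribute is absent.
    return tag.get(attr, default)


price = attr_or_default(
    soup,
    '#ContentPlaceHolder1_ContentPlaceHolder1_Product_eBay1_txtBuyItNowPrice',
    'value',
)
missing = attr_or_default(soup, '#no_such_id', 'value', default='n/a')
print(price, missing)
```

This is mostly useful with Selenium's `page_source`, where the element may not have rendered yet when the page is parsed.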

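Answer 1 (score: 2)

There is an id, and an id is the fastest selector. Use it to get the element, then read its value attribute. Your attempt failed partly because you were looking at the td tag rather than the input tag.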

from bs4 import BeautifulSoup as bs
html = '''
<td id="ContentPlaceHolder1_ContentPlaceHolder1_Product_eBay1_tdBINPrice">
    <input id="ContentPlaceHolder1_ContentPlaceHolder1_Product_eBay1_txtBuyItNowPrice" name="ctl00$ctl00$ContentPlaceHolder1$ContentPlaceHolder1$Product_eBay1$txtBuyItNowPrice" 
    style="width:50px;" type="text" value="1435.97"/>                           
    <img id="ContentPlaceHolder1_ContentPlaceHolder1_Product_eBay1_imgCustomPriceCalculator" src="/images/ChannelPriceCustom.png" style="width:16px;"/>
</td>
'''
soup = bs(html, 'lxml')
print(soup.select_one('#ContentPlaceHolder1_ContentPlaceHolder1_Product_eBay1_txtBuyItNowPrice')['value'])

Your version (targeting the input instead):

print(soup.find("input", {"id": "ContentPlaceHolder1_ContentPlaceHolder1_Product_eBay1_txtBuyItNowPrice"})['value'])
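
If you prefer to stay close to the original attempt and locate the td first, one possible approach is to descend from that cell to the input it contains and read the attribute there. The sketch below assumes the cell holds exactly one input and uses a trimmed copy of the HTML; the shortened regular expression `tdBINPrice$` is an illustrative choice, not something the original code used:

```python
import re

from bs4 import BeautifulSoup

# Trimmed copy of the HTML from the question (only the attributes needed here).
html = '''<td id="ContentPlaceHolder1_ContentPlaceHolder1_Product_eBay1_tdBINPrice">
    <input id="ContentPlaceHolder1_ContentPlaceHolder1_Product_eBay1_txtBuyItNowPrice" type="text" value="1435.97"/>
</td>'''

soup = BeautifulSoup(html, 'html.parser')

# Step 1: find the table cell, as in the question (matching on the id suffix).
cell = soup.find('td', id=re.compile(r'tdBINPrice$'))

# Step 2: move down to the input inside that cell.
field = cell.find('input')

# Step 3: attributes are read with dictionary-style access on the tag.
price = field['value']
print(price)

# The attribute is a string; convert it if a number is needed.
print(float(price))
```

Printing the td itself, as the question does, shows the whole cell markup, which is why the value never appeared on its own.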