Extracting values from HTML data

Time: 2016-01-25 19:48:34

Tags: python html regex

I am working with data in this HTML format in Python:

<input type="hidden" name="__EVENTTARGET" id="__EVENTTARGET" value="" >
<input type="hidden" name="__EVENTARGUMENT" id="__EVENTARGUMENT" value="" />
<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="ky6272M5yMyLqwLSiOD7282n7W/4c5S+PsBnbknDUX8d4iGsUDPboCpQG3F86cgBN3u3/nrEYLDN43eRdevxKrBv6MBnwC8l0l3WLxFOKGpqGUl5KzodoLbQB44LtcSYLudbO+lczSjwyEzsHOrw3IW4VT1HAT/OjPJI36AIf/BAXY/UoKT38X1yrDNE0sf0jk5WOPq+v+wh+Dsw9F6dojZXucY5dmGdNWaigKKn6VSG6tkzqsCFVjYEkzTjj1ItCdstnDZv2LVHRJpQ654Zvcf2IkQOR7p+V+TLRYdR9yOngXh2p/qt6UXYrR4DVUPkgxiCuIjFpSpYvGmHuw3+ocadeLklAtAQZbQF63c+xyogyV4Dm2fW2BT1+fhW+lqoo5aTFcWM+2v2SwfSsRKOMUH9MudewVDP0ro/3w9+OPq1q8hHGDzzbwDJh7nOvyW67DYY1AEp2NV1lCbDwazCX0DHpW/prlmuFMj1zt+mamjoGERWNujqr6FQNgSG1n62VrJMdBhEwYdHNYuWEQorD/EA3ze/5Pmxv7j6PngmoNv9uVtOwq4M3RhtgjS4OY5RsBO8l+Ij74Mqihh5xa0T3D2p5VIBZJW5M3nb6c1yuNqgcNgstqNU2BDwE/T1h+sF8wK7BG0YKQd6BrilABj1+AZZElrS9SdDtjuyKFGWEx2qLHUpWrkys4yy3Icq7xSsf/eDsg==" />

I want to extract the contents of the value attribute using a regular expression in Python.

2 answers:

Answer 0: (score: 2)

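HTML is much more complicated than a regular expression can reliably handle. An HTML parser such as BeautifulSoup is a safer choice: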

from bs4 import BeautifulSoup


html = '<input type="hidden" name="__EVENTTARGET" id="__EVENTTARGET" value="" >'
soup = BeautifulSoup(html, 'lxml')
input_tag = soup.find('input')
print(input_tag['value'])  # the value attribute is an empty string for this tag
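
If installing bs4 or lxml is not an option, the same idea can be sketched with the standard library's html.parser module. This is only a minimal sketch: the class name InputValueCollector and the helper extract_input_values are made up for illustration, and the __VIEWSTATE value in the sample is a shortened stand-in, not the real one from the question.

```python
from html.parser import HTMLParser


class InputValueCollector(HTMLParser):
    """Collect name -> value for every <input> tag that has a name (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.values = {}

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag and attribute names arrive lowercased.
        # Self-closing tags such as <input ... /> also end up here by default.
        if tag != "input":
            return
        attr_map = dict(attrs)
        name = attr_map.get("name")
        if name is not None:
            # A missing or valueless value attribute is stored as an empty string.
            self.values[name] = attr_map.get("value") or ""


def extract_input_values(html):
    """Return a dict mapping each input's name to its value."""
    parser = InputValueCollector()
    parser.feed(html)
    parser.close()
    return parser.values


# Shortened stand-in for the markup in the question.
sample = (
    '<input type="hidden" name="__EVENTTARGET" id="__EVENTTARGET" value="" >\n'
    '<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="ky6272M5/abc+d==" />'
)

values = extract_input_values(sample)
print(values["__VIEWSTATE"])
```

Collecting everything into a dict keyed by name makes it easy to pick out a single field such as __VIEWSTATE afterwards.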

Answer 1: (score: 1)

With BeautifulSoup, you can call the find method on the BeautifulSoup object and read the value attribute, like this:

from bs4 import BeautifulSoup
x = """<input type="hidden" name="__EVENTTARGET" id="__EVENTTARGET" value="" >"""
# Name the parser explicitly so the result does not depend on what is installed.
soup = BeautifulSoup(x, 'html.parser')
print(soup.find('input')['value'])
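
Since the question asks specifically for a regular expression, here is a rough sketch of what that could look like for markup as simple as the sample. It assumes every attribute is double-quoted and that no attribute value contains a ">" character, so it will break on less regular HTML; the names INPUT_TAG, ATTR and input_values are invented for this example, and the __VIEWSTATE value is a shortened stand-in.

```python
import re

# Matches a whole <input ...> tag (assumes no ">" inside attribute values).
INPUT_TAG = re.compile(r"<input\b[^>]*>", re.IGNORECASE)
# Matches attr="value" pairs (assumes double-quoted attribute values only).
ATTR = re.compile(r'([\w-]+)\s*=\s*"([^"]*)"')


def input_values(html):
    """Return a dict mapping each input's name to its value (hypothetical helper)."""
    result = {}
    for tag in INPUT_TAG.findall(html):
        attrs = dict(ATTR.findall(tag))
        if "name" in attrs:
            result[attrs["name"]] = attrs.get("value", "")
    return result


# Shortened stand-in for the markup in the question.
sample = (
    '<input type="hidden" name="__EVENTTARGET" id="__EVENTTARGET" value="" >\n'
    '<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="ky6272M5/abc+d==" />'
)

print(input_values(sample)["__VIEWSTATE"])
```

Parsing the attributes of each tag separately, instead of writing one big pattern, means the result does not depend on the order in which name and value appear.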