Question

I am writing a Python parser for a website to automate some work, but I am not very familiar with Python's "re" module (regular expressions) and cannot get it to work.

req = urllib2.Request(tl2)
req.add_unredirected_header('User-Agent', ua)
response = urllib2.urlopen(req)
try:
    html = response.read()
except urllib2.URLError, e:
    print "Error while reading data. Are you connected to the interwebz?!", e

soup = BeautifulSoup.BeautifulSoup(html)
form = soup.find('form', id='form_product_page')
pret = form.prettify()

print pret

Result:

<form id="form_product_page" name="form_1362737440" action="/download/791055/164084/" method="get">
<input id="nojssubmit" type="submit" value="Download" />
</form>

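The code works so far and is exactly the start I needed. Now I would like to know how I should extract the "action" attribute from the "form" tag. That is the only thing I need from the BeautifulSoup response.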

I have tried form = soup.find('form', id='form_product_page').parent.get('action'), but the result was 'None'. What I want to extract is, for example, "/download/791055/164084/". This value is different for every URL I follow.

Variables (example):
tl2 = http://example.com
ua = Mozilla Firefox / 14.04

Answer 1

You can do it in one step:

action = soup.find('form', id='form_product_page').get('action')
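The attempt in the question most likely returned None because .parent moves up to the element that encloses the form, and that element has no "action" attribute. The attribute lives on the form tag itself. Below is a minimal sketch of the difference. It assumes Python 3 and the bs4 package (rather than the Python 2 / BeautifulSoup 3 setup in the question), and the wrapping div and the sample HTML string are made up for illustration. The last step, joining the relative action with the page URL via urljoin, is an optional extra that the question did not ask for.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical page content; only the <form> is taken from the question.
html = """
<html><body>
<div class="product">
<form id="form_product_page" name="form_1362737440" action="/download/791055/164084/" method="get">
<input id="nojssubmit" type="submit" value="Download" />
</form>
</div>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")
form = soup.find("form", id="form_product_page")

# The attribute is on the <form> tag itself.
action = form.get("action")
print(action)

# .parent is the enclosing <div>, which has no "action" attribute,
# so .get() falls back to its default of None.
print(form.parent.name, form.parent.get("action"))

# Optional: turn the relative action into an absolute URL.
# "http://example.com" is the example value of tl2 from the question.
download_url = urljoin("http://example.com", action)
print(download_url)
```

Using .get('action') returns None when the attribute is missing, while form['action'] would raise a KeyError instead. Which one is preferable depends on whether a missing attribute should be treated as an error.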
