Question

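Apologies, this has very likely been asked before, but I can't seem to find an answer on Stack Overflow or through a search engine.

I'm trying to scrape some data from a table, and I need to get the href links. The HTML is as follows: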

<table class="featprop results">
<tr>
**1)**<td class="propname" colspan="2"><a href="/lettings-search-results?task=View&amp;itemid=136" rel="nofollow"> West Drayton</a></td>
</tr>
<tr><td class="propimg" colspan="2">

    <div class="imgcrop">
    **2)**<a href="/lettings-search-results?task=View&amp;itemid=136" rel="nofollow"><img src="content/images/1/1/641/w296/858.jpg" alt=" Ashford" width="148"/></a>


    <div class="let">&nbsp;</div>
    </div>
</td></tr>

<tr><td class="proprooms">

So far I have used the following:

for table in soup.findAll('table', {'class': 'featprop results'}):
    for tr in table.findAll('tr'):
        for a in tr.findAll('a'):
            print(a)

This returns the <a> tags marked 1) and 2) in the HTML above. Can someone help me extract just the href links?

Answer 1

for table in soup.findAll('table', {'class': 'featprop results'}):
    for tr in table.findAll('tr'):
        for a in tr.findAll('a'):
            print(a['href'])

Output:

/lettings-search-results?task=View&itemid=136
/lettings-search-results?task=View&itemid=136
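
For anyone who wants to try this end to end, here is a minimal self-contained sketch. It assumes the bs4 package is installed and uses a trimmed copy of the markup from the question, with closing tags added so it parses on its own. It uses a.get('href') rather than a['href'], which returns None instead of raising KeyError if an <a> tag has no href.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the question's markup; closing tags added for this sketch.
html = """
<table class="featprop results">
<tr>
<td class="propname" colspan="2"><a href="/lettings-search-results?task=View&amp;itemid=136" rel="nofollow"> West Drayton</a></td>
</tr>
<tr><td class="propimg" colspan="2">
<div class="imgcrop">
<a href="/lettings-search-results?task=View&amp;itemid=136" rel="nofollow"><img src="content/images/1/1/641/w296/858.jpg" alt=" Ashford" width="148"/></a>
</div>
</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

hrefs = []
for table in soup.find_all("table", {"class": "featprop results"}):
    for a in table.find_all("a"):
        href = a.get("href")  # None if the tag has no href attribute
        if href is not None:
            hrefs.append(href)

for href in hrefs:
    print(href)
```

Note that the parser decodes &amp; in the attribute value, so the printed links contain a plain &.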

Reference: Attributes (Beautiful Soup documentation)

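Edit:

import re

links = set()  # a set drops duplicate hrefs
# Escape the "?" so it matches a literal question mark instead of making the final "s" optional.
# Searching from soup here; call findAll on table or tr inside the loops above to narrow the scope.
for a in soup.findAll('a', href=re.compile(r'^/lettings-search-results\?')):
    links.add(a['href'])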

Reference: regular expression filters (Beautiful Soup documentation)
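
One caveat with the set in the edit above: a set does not keep the links in document order. If order matters, a possible alternative is dict.fromkeys, sketched below using only the standard library. The hrefs list is a stand-in for whatever the loop collected, and the itemid=137 entry is a made-up value added only to show that order is preserved.

```python
# Stand-in for the hrefs collected by the loop; the itemid=137 link is hypothetical.
hrefs = [
    "/lettings-search-results?task=View&itemid=136",
    "/lettings-search-results?task=View&itemid=136",
    "/lettings-search-results?task=View&itemid=137",
]

# dict keys are unique and keep insertion order (Python 3.7+),
# so this drops duplicates without reordering the links.
unique_links = list(dict.fromkeys(hrefs))
print(unique_links)
```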

Answer 2

This gives you a list of the <a> tags under the element with the selected class name.

result = soup.select(".featprop a")
for a in result:
    print(a['href'])

This gives the following result:

/lettings-search-results?task=View&itemid=136
/lettings-search-results?task=View&itemid=136
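
If the page could contain <a> tags without an href, or other elements that also use the featprop class, a slightly stricter selector may help. The sketch below assumes bs4 is installed; the markup is a shortened stand-in for the question's table, and the second <a> tag (with only a name attribute) is hypothetical, added to show that a[href] skips it.

```python
from bs4 import BeautifulSoup

# Shortened stand-in markup; the <a name="no-href"> tag is hypothetical.
html = (
    '<table class="featprop results"><tr><td>'
    '<a href="/lettings-search-results?task=View&amp;itemid=136">West Drayton</a>'
    '<a name="no-href">anchor</a>'
    '</td></tr></table>'
)

soup = BeautifulSoup(html, "html.parser")

# table.featprop.results requires both classes on the table;
# a[href] matches only <a> tags that actually have an href attribute.
links = [a["href"] for a in soup.select("table.featprop.results a[href]")]
print(links)
```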

Getting the href in a table
