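I want to get the link in the "S-1" row, not the one in the "S-1/A" row. I tried `.find_all(lambda tag: tag.name == 'td' and tag.get() == ['S-1'])` and `.select('td.s-1')`, but neither returned the link. I'd appreciate any help.
Here is the relevant page source: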
<tr>
<td>ADVANCE FINANCIAL BANCORP</td>
<td>S-1/A</td>
<td>10/31/1996</td>
<td><a id="two_column_main_content_rpt_filings_fil_view_0" href="/markets/ipos/filing.ashx?filingid=1567309" target="_blank">Filing</a>
</td>
</tr>
<tr>
<td>ADVANCE FINANCIAL BANCORP</td>
<td>S-1</td>
<td>9/27/1996</td>
<td><a id="two_column_main_content_rpt_filings_fil_view_1" href="/markets/ipos/filing.ashx?filingid=921318" target="_blank">Filing</a>
</td>
</tr>
Here is a link to the full page source:
https://www.nasdaq.com/markets/ipos/company/advance-financial-bancorp-5492-13046?tab=financials
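The two attempts in the question miss for different reasons. `.select('td.s-1')` matches `<td>` elements whose CSS class is `s-1`, but "S-1" is the cell's text, not a class. `tag.get()` reads an HTML attribute and needs the attribute name as its argument, so it never looks at the cell's text. Below is a minimal sketch of a text-based lookup, assuming the rows look like the snippet above (it embeds that snippet instead of fetching the live page). The helper names `find_filing_href` and `find_filing_href_lambda` are hypothetical. The sketch finds the cell whose text is exactly the form type, then takes the link from the same row.

```python
from bs4 import BeautifulSoup

# A copy of the two table rows from the question, wrapped in a <table>
HTML = """
<table>
<tr>
<td>ADVANCE FINANCIAL BANCORP</td>
<td>S-1/A</td>
<td>10/31/1996</td>
<td><a id="two_column_main_content_rpt_filings_fil_view_0" href="/markets/ipos/filing.ashx?filingid=1567309" target="_blank">Filing</a>
</td>
</tr>
<tr>
<td>ADVANCE FINANCIAL BANCORP</td>
<td>S-1</td>
<td>9/27/1996</td>
<td><a id="two_column_main_content_rpt_filings_fil_view_1" href="/markets/ipos/filing.ashx?filingid=921318" target="_blank">Filing</a>
</td>
</tr>
</table>
"""


def find_filing_href(html, form_type):
    """Return the href in the first row whose cell text is exactly form_type (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    # string= compares the cell's whole text, so "S-1" does not match "S-1/A"
    cell = soup.find("td", string=form_type)
    if cell is None:
        return None
    # The link lives in another cell of the same <tr>
    link = cell.find_parent("tr").find("a", href=True)
    return link["href"] if link else None


def find_filing_href_lambda(html, form_type):
    """Same lookup, written as a corrected version of the question's lambda filter."""
    soup = BeautifulSoup(html, "html.parser")
    # Compare the cell's text, not an attribute, against the form type
    cells = soup.find_all(
        lambda tag: tag.name == "td" and tag.get_text(strip=True) == form_type
    )
    if not cells:
        return None
    link = cells[0].find_parent("tr").find("a", href=True)
    return link["href"] if link else None


print(find_filing_href(HTML, "S-1"))         # /markets/ipos/filing.ashx?filingid=921318
print(find_filing_href_lambda(HTML, "S-1"))  # /markets/ipos/filing.ashx?filingid=921318
```

Because the match is on the cell's full text, the "S-1/A" row is skipped without depending on where the row sits in the table.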
Answer (score: 1)
Try this:
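```python
from bs4 import BeautifulSoup
import requests

def getlink(url):
    response = requests.get(url)
    # html5lib is a separate install (pip install html5lib)
    mainpage = BeautifulSoup(response.text, 'html5lib')
    # Take the second table with class "marginB10px" (index 1)
    table = mainpage.find_all('table', attrs={"class": "marginB10px"})
    links = table[1].find_all('a')
    # The second link in that table (index 1) is the S-1 filing
    return links[1].get('href')

link = getlink('https://www.nasdaq.com/markets/ipos/company/advance-financial-bancorp-5492-13046?tab=financials')
mainlink = 'https://www.nasdaq.com'
# The href is relative, so prepend the site root
link = mainlink + link
print(link)
```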
Output:
https://www.nasdaq.com/markets/ipos/filing.ashx?filingid=921318
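The answer builds the absolute URL by concatenating `mainlink` with the relative href. The standard library's `urllib.parse.urljoin` performs the same resolution and also copes with hrefs that are already absolute. A short sketch, using the relative href that `getlink()` returns for this page:

```python
from urllib.parse import urljoin

# Relative href as returned by getlink() for this page
relative = "/markets/ipos/filing.ashx?filingid=921318"

# Resolve the href against the page it was scraped from;
# a leading "/" replaces the page's path and query string
page_url = "https://www.nasdaq.com/markets/ipos/company/advance-financial-bancorp-5492-13046?tab=financials"
absolute = urljoin(page_url, relative)
print(absolute)  # https://www.nasdaq.com/markets/ipos/filing.ashx?filingid=921318
```

If the site ever switched to absolute hrefs, `urljoin` would return them unchanged, whereas plain concatenation would produce a doubled host.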