I am trying to get the link from each <a href> tag, along with the text in the <td scope="row"> cell that follows it.
This is what I tried:
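import requests
from bs4 import BeautifulSoup

url = "https://www.sec.gov/Archives/edgar/data/1491829/0001171520-19-000171-index.htm"
soup = BeautifulSoup(requests.get(url).text, 'html.parser')
records = []
for link in soup.find_all('a'):
    Name = link.text          # text inside the <a> tag itself
    Links = link.get('href')
    records.append((Name, Links))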
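But this gives me eps8453.htm as the text, because that is the text inside the <a href> tag. Is there a way to get the text of the <td scope="row"> cell next to the <a href> tag instead, i.e. "10-K"?
Please help!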
Answer 0 (score: 0)
Inside the table, call find_next('td') on each <a> tag to get the <td> cell that comes right after it.
import requests
from bs4 import BeautifulSoup

url = "https://www.sec.gov/Archives/edgar/data/1491829/0001171520-19-000171-index.htm"
html = requests.get(url).text
soup = BeautifulSoup(html, 'html.parser')

records = []
# Search only the document table so unrelated links on the page are skipped.
for link in soup.find('table', class_='tableFile').find_all('a'):
    Name = link.text
    Links = link.get('href')
    # The first <td> after the link is the neighbouring cell, e.g. "10-K".
    text = link.find_next('td').contents[0]
    print(Name, text)
    records.append((Name, Links, text))
Output:
eps8453.htm 10-K
ex31-1.htm EX-31.1
ex31-2.htm EX-31.2
ex32-1.htm EX-32.1
yu-logo.jpg GRAPHIC
yu_sig.jpg GRAPHIC
0001171520-19-000171.txt
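
To try the find_next approach without hitting the network, here is a minimal sketch that runs against an inline HTML string. The SAMPLE_HTML table, its href values, and the extract_records helper are hypothetical stand-ins made up for illustration; they are not the real SEC markup. The sketch also swaps .contents[0] for get_text(strip=True), on the assumption that some cells may be empty or hold only a non-breaking space.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified stand-in for the filing index table (not the real page).
SAMPLE_HTML = """
<table class="tableFile">
  <tr><th>Document</th><th>Type</th></tr>
  <tr>
    <td scope="row"><a href="/docs/eps8453.htm">eps8453.htm</a></td>
    <td scope="row">10-K</td>
  </tr>
  <tr>
    <td scope="row"><a href="/docs/ex31-1.htm">ex31-1.htm</a></td>
    <td scope="row">EX-31.1</td>
  </tr>
  <tr>
    <td scope="row"><a href="/docs/submission.txt">submission.txt</a></td>
    <td scope="row">&nbsp;</td>
  </tr>
</table>
"""


def extract_records(html):
    """Return (link text, href, text of the next <td>) for each link in the table."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", class_="tableFile")
    records = []
    for link in table.find_all("a"):
        # find_next walks forward in document order, so this is the cell
        # that follows the one containing the link.
        cell = link.find_next("td")
        # get_text(strip=True) yields "" for empty or whitespace-only cells
        # instead of raising IndexError the way .contents[0] can.
        doc_type = cell.get_text(strip=True) if cell is not None else ""
        records.append((link.get_text(strip=True), link.get("href"), doc_type))
    return records


records = extract_records(SAMPLE_HTML)
for name, href, doc_type in records:
    print(name, href, doc_type)
```

When fetching the live page instead, the same function can be given requests.get(url).text. The site may also expect a descriptive User-Agent header on the request; that is an assumption about the server, not something shown in the thread.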