I am trying to get the download URL from this td tag:
<a href="https://dibbs2.bsm.dla.mil/Downloads/Awards/18SEP19/GS07F5933RSPEFA519F0433.PDF" target="DIBBSDocuments" title="Link To Delivery Order Document"><img alt="PDF Document" border="0" height="16" hspace="2" src="https://www.dibbs.bsm.dla.mil/app_themes/images/icons/IconPdf.gif" width="16"/></a>, <a href="https://dibbs2.bsm.dla.mil/Downloads/Awards/18SEP19/GS07F5933RSPEFA519F0433.PDF" target="DIBBSDocuments" title="Link To Delivery Order Document">SPEFA519F0433</a>
The output above was produced by my code:
downloandurl=batch.select("a[href*=https://dibbs2.bsm.dla.mil/Downloads/Awards/]")
How do I get the href URL from this result? I am trying to retrieve:
https://dibbs2.bsm.dla.mil/Downloads/Awards/18SEP19/GS07F5933RSPEFA519F0433.PDF
Answer 0 (score: 0)
Please tag your question appropriately and share your code so we can see how far you have got, rather than expecting a complete answer. Thanks.
Try this:
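The snippet this answer pointed to is not preserved in the text. As a minimal sketch of one way to do it, assuming `batch` is the BeautifulSoup element for the td in the question (the `html` string and the `download_urls` name below are illustrative stand-ins, not the answerer's code): `select()` returns a list of tags, so the href has to be read from each tag in that list.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the markup from the question; `batch` stands in for the asker's td element.
html = '''<td><a href="https://dibbs2.bsm.dla.mil/Downloads/Awards/18SEP19/GS07F5933RSPEFA519F0433.PDF" target="DIBBSDocuments" title="Link To Delivery Order Document">SPEFA519F0433</a></td>'''
batch = BeautifulSoup(html, "html.parser")

# Quote the attribute value: it contains ':' and '/', which are not valid
# in an unquoted CSS attribute value.
links = batch.select('a[href*="https://dibbs2.bsm.dla.mil/Downloads/Awards/"]')

# select() returns a list of Tag objects; read the attribute from each one.
download_urls = [a["href"] for a in links]
print(download_urls)
```

Quoting the value inside the selector is the safer form; recent BeautifulSoup releases may reject the unquoted version used in the question.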
Answer 1 (score: 0)
To get the href value from an anchor tag, use item['href'] OR item.get('href') OR item.attrs.get('href'):
from bs4 import BeautifulSoup
data='''<a href="https://dibbs2.bsm.dla.mil/Downloads/Awards/18SEP19/GS07F5933RSPEFA519F0433.PDF" target="DIBBSDocuments" title="Link To Delivery Order Document"><img alt="PDF Document" border="0" height="16" hspace="2" src="https://www.dibbs.bsm.dla.mil/app_themes/images/icons/IconPdf.gif" width="16"/></a>, <a href="https://dibbs2.bsm.dla.mil/Downloads/Awards/18SEP19/GS07F5933RSPEFA519F0433.PDF" target="DIBBSDocuments" title="Link To Delivery Order Document">SPEFA519F0433</a>'''
soup=BeautifulSoup(data,'html.parser')
# Each of the three lines prints the same href value for every anchor.
for item in soup.select('a'):
    print(item['href'])
    print(item.get('href'))
    print(item.attrs.get('href'))
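The three forms differ only when an anchor has no href attribute. A small sketch of that difference, using a made-up two-anchor snippet rather than the markup from the question:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the first anchor has no href attribute.
soup = BeautifulSoup('<a name="top">no link</a><a href="/doc.pdf">doc</a>', "html.parser")
anchors = soup.select("a")

# .get() returns None when the attribute is absent.
hrefs = [a.get("href") for a in anchors]
print(hrefs)

# Indexing raises KeyError for the anchor without an href.
try:
    anchors[0]["href"]
    missing_raises = False
except KeyError:
    missing_raises = True
print(missing_raises)
```

If every matched anchor is known to carry an href, as in the question, indexing is fine; `.get()` is the more forgiving choice when scanning arbitrary pages.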
If you only want specific anchor tags, add more conditions to the selector, for example:
# Match only anchors whose target attribute is "DIBBSDocuments".
for item in soup.select('a[target="DIBBSDocuments"]'):
    print(item['href'])
    print(item.get('href'))
    print(item.attrs.get('href'))
OR match on the start of the href URL:
# ^= matches anchors whose href starts with the given prefix.
for item in soup.select('a[href^="https://dibbs2.bsm.dla.mil/Downloads/Awards"]'):
    print(item['href'])
    print(item.get('href'))
    print(item.attrs.get('href'))
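In the markup from the question both anchors point to the same PDF, so the loops above print the same URL more than once. A sketch of two ways to end up with a single value, assuming only one download link per cell is wanted (the shortened `data` string and the variable names are illustrative):

```python
from bs4 import BeautifulSoup

url = "https://dibbs2.bsm.dla.mil/Downloads/Awards/18SEP19/GS07F5933RSPEFA519F0433.PDF"
# Shortened version of the question's markup: two anchors, same href.
data = f'<a href="{url}"><img alt="PDF Document"/></a>, <a href="{url}">SPEFA519F0433</a>'
soup = BeautifulSoup(data, "html.parser")

selector = 'a[href^="https://dibbs2.bsm.dla.mil/Downloads/Awards"]'

# Option 1: select_one() returns the first matching tag, or None if nothing matches.
first = soup.select_one(selector)
first_url = first["href"] if first is not None else None
print(first_url)

# Option 2: collect every match and drop duplicates while keeping order.
unique_urls = list(dict.fromkeys(a["href"] for a in soup.select(selector)))
print(unique_urls)
```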