I have a list of hyperlinks in the format <a href="/linkaddress"></a>.
Unfortunately, the href does not contain the full address, so I would like to prepend the beginning of the web address by joining two strings together. My code looks like this:
import requests
from bs4 import BeautifulSoup
r_2 = requests.get('http://www.website.com/linkaddress/')
soup = BeautifulSoup(r_2.text, 'html.parser')
links = soup.find_all('a')
links_list = []
for link in links:
    links_list.append(link)
link_end = links_list[9:-4]
# select information between 9th position and 4th last position
link_start = 'http://www.website.com/'
master_links = link_start + link_end
print master_links
I ran into a problem when trying to select just the link address from each hyperlink: each entry is not actually a string, it is a bs4.element.Tag. Is there a way to select only the link address from each entry in links_list, or do I have to convert each entry to a string first?
Answer 0 (score: 1)
Actually, you don't need to specify attrs; simply use:
link['href']
attrs is useful when you aren't sure whether href is present among a tag's attributes:
if 'href' in link.attrs:
    print(link['href'])
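As a minimal sketch of the same check done in other ways: Tag.get('href') returns None instead of raising KeyError when the attribute is missing, and find_all('a', href=True) skips anchors without an href altogether. The sample markup and the names html, hrefs_or_none and hrefs are made up for illustration, and Python 3 is assumed.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: the middle anchor has no href attribute.
html = (
    '<a href="/page-one">One</a>'
    '<a name="anchor-only">Two</a>'
    '<a href="/page-two">Three</a>'
)
soup = BeautifulSoup(html, 'html.parser')

# Tag.get() returns None when the attribute is absent instead of raising KeyError.
hrefs_or_none = [link.get('href') for link in soup.find_all('a')]

# href=True keeps only the <a> tags that actually carry an href attribute.
hrefs = [link['href'] for link in soup.find_all('a', href=True)]

print(hrefs_or_none)
print(hrefs)
```

Which form to use probably depends on whether the anchors without an href still matter to you; if they don't, filtering in find_all keeps the loop body simpler.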
Answer 1 (score: 0)
Each node has an attribute attrs, which is a Python dictionary containing all the attributes defined on that node.
So, the address can be retrieved as:
link.attrs['href']
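To connect this back to the question, here is a rough sketch of building the full addresses once the href has been read from each tag. It parses an inline HTML string rather than calling requests.get, so the sample markup is an assumption; link_start and master_links are the names from the question. It also swaps plain string concatenation for urllib.parse.urljoin (Python 3 standard library), which avoids the doubled slash that 'http://www.website.com/' + '/linkaddress' would produce.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

link_start = 'http://www.website.com/'

# Hypothetical page content standing in for r_2.text from the question.
html = '<a href="/linkaddress">A link</a><a href="/other/page">Another</a>'
soup = BeautifulSoup(html, 'html.parser')

master_links = []
for link in soup.find_all('a'):
    # link is a bs4.element.Tag; attrs is its attribute dictionary.
    href = link.attrs['href']
    # urljoin resolves the relative address against the base address.
    master_links.append(urljoin(link_start, href))

print(master_links)
```

With real pages, some anchors may lack an href, in which case link.attrs['href'] raises KeyError, so a membership check on link.attrs before the lookup may be needed.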