I need to extract the links from a website, for example https://stackoverflow.com/questions/ask (just as an example).
I tried using urlparse to extract the URL info, and then BeautifulSoup.
from urllib.parse import urlparse
import requests
from bs4 import BeautifulSoup

domain_name = urlparse(url).netloc
soup = BeautifulSoup(requests.get(url).content, "html.parser")
I need to save all the links of every website in a list. I would like something like this:

URL                                        Links
https://stackoverflow.com/questions/ask    ['link1', 'link2', 'link3', ...]
https://anotherwebsite.com/sport           ['link1', 'link2', 'link3', 'link4']
https://last_example.es                    []

Can you explain how to get a result like that?
Answer 0 (score: 2)
Let's try:
import pandas as pd
import requests
from bs4 import BeautifulSoup

def get_all_links(url):
    # of course one needs to deal with the case when `requests` fails
    # but that's outside the scope here
    soup = BeautifulSoup(requests.get(url).content, "html.parser")
    # collect the href of every anchor tag, defaulting to '' when missing
    return [a.attrs.get('href', '') for a in soup.find_all('a')]

# sample data
df = pd.DataFrame({'URL': ['https://stackoverflow.com/questions/ask']})
df['Links'] = df['URL'].apply(get_all_links)
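
Note that the href values collected this way can be relative (e.g. /questions) or empty. If absolute URLs are needed, a minimal variant using urljoin could look like the sketch below (the helper name get_absolute_links is mine, not part of the original answer):

from urllib.parse import urljoin
import requests
from bs4 import BeautifulSoup

def get_absolute_links(url):
    # hypothetical variant: resolve each href against the page URL,
    # keeping only anchors that actually carry an href attribute
    soup = BeautifulSoup(requests.get(url).content, "html.parser")
    return [urljoin(url, a['href']) for a in soup.find_all('a', href=True)]

df['Links'] = df['URL'].apply(get_absolute_links)

From there, the urlparse(url).netloc from the question could be compared against each link's netloc to separate internal links from external ones, if that distinction matters for your use case.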