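I want to search Google using BeautifulSoup and open the first link. But when I open the link, it shows an error. I think the reason is that Google does not return the exact link to the website; it appends several parameters to the URL. How can I get the exact URL?
When I tried using the cite tag, it worked, but it causes problems for long URLs.
The first link I get using soup.h3.a['href'][7:] is: 'http://www.wikipedia.com/wiki/White_holes&sa=U&ved=0ahUKEwi_oYLLm_rUAhWJNI8KHa5SClsQFggbMAI&usg=AFQjCNGN-vlBvbJ9OPrnq40d0_b8M0KFJQ'
Here is my code: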
import requests
from bs4 import BeautifulSoup
r = requests.get('https://www.google.com/search?q=site:wikipedia.com+Black+hole&gbv=1&sei=YwHNVpHLOYiWmQHk3K24Cw')
soup = BeautifulSoup(r.text, "html.parser")
print(soup.h3.a['href'][7:])
Answer 0 (score: 1)
You can split the returned string on '&' and keep the first part:
url = soup.h3.a['href'][7:].split('&')
print(url[0])
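Slicing and splitting works for the example above, but it depends on the prefix being exactly 7 characters long. A possibly more robust variant, sketched below, parses the href as a URL and reads its q parameter with the standard library. It assumes the raw href has the form /url?q=<target>&sa=... (inferred from the [7:] slice in the question, not confirmed by it); extract_target_url is a hypothetical helper name.

```python
from urllib.parse import urlparse, parse_qs


def extract_target_url(href):
    """Return the destination URL from a Google redirect href.

    Assumes the href looks like '/url?q=<target>&sa=...'. If there is no
    'q' parameter, the href is returned unchanged.
    """
    query = parse_qs(urlparse(href).query)
    targets = query.get("q")
    return targets[0] if targets else href


# Sample href rebuilt from the question's output, with the assumed '/url?q=' prefix
href = ("/url?q=http://www.wikipedia.com/wiki/White_holes&sa=U"
        "&ved=0ahUKEwi_oYLLm_rUAhWJNI8KHa5SClsQFggbMAI"
        "&usg=AFQjCNGN-vlBvbJ9OPrnq40d0_b8M0KFJQ")
print(extract_target_url(href))  # prints http://www.wikipedia.com/wiki/White_holes
```

In the question's code this would be called as extract_target_url(soup.h3.a['href']). Because parse_qs also decodes percent-encoding, a target that itself contains an encoded '&' or '?' should come back intact rather than being cut at the first '&'.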
Answer 1 (score: 0)
from bs4 import BeautifulSoup
import requests
import csv

url = "https://www.google.co.in/search?q=site:wikipedia.com+Black+hole&dcr=0&gbv=2&sei=Nr3rWfLXMIuGvQT9xZOgCA"
r = requests.get(url)
data = r.text
soup = BeautifulSoup(data, "html.parser")

# Collect every result container (this answer looks for <div class="g">)
get_details = soup.find_all("div", attrs={"class": "g"})
final_data = []
for details in get_details:
    link = details.find_all("h3")
    for mdetails in link:
        links = mdetails.find_all("a")
        for lnk in links:
            # Skip the first 7 characters of the href, then cut at the first "&"
            lmk = lnk.get("href")[7:].split("&")
            # Store each link as a single-column row
            final_data.append([lmk[0]])

filename = "Google.csv"
# newline="" stops the csv module from adding blank lines between rows on Windows
with open("./" + filename, "w", newline="") as csvfile:
    writer = csv.writer(csvfile, delimiter=",")
    for row in final_data:
        writer.writerow(row)
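The CSV step can be tried on its own, without a network request. The sketch below writes to an in-memory buffer and shows why each link is wrapped in a list: writerow expects a sequence of fields, so a bare string would be split into one column per character. The write_links helper and the sample link list are made up for illustration.

```python
import csv
import io


def write_links(links, csvfile):
    """Write one link per row to an already opened text file object."""
    writer = csv.writer(csvfile, delimiter=",")
    # Wrap each link in a list so it becomes a single-column row
    writer.writerows([link] for link in links)


# Sample data standing in for the scraped links
links = ["http://www.wikipedia.com/wiki/White_holes",
         "http://www.wikipedia.com/wiki/Black_hole"]

buffer = io.StringIO()
write_links(links, buffer)
print(buffer.getvalue())
```

To write a real file instead, open it with open("Google.csv", "w", newline="") and pass the file object to write_links in place of the buffer.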