I'm trying to get a list of links from a Google search:
import requests
from lxml import html

def google_word(word):
    headers = {'User-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.140 Safari/537.36 Edge/18.17763'}
    url = 'https://google.com/search?q={}'.format(word)
    res = requests.get(url, headers=headers)
    tree = html.fromstring(res.text)
    li = tree.xpath("//a[@href]")  # all <a> elements that contain an href
    y = [link.get('href') for link in li if link.get('href').startswith("https://") if "google" not in link.get('href')]
    return y
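Right now this code correctly collects the links that start with "https://", and I'd also like to include the ones that start with "http://". What do I need to add to the list comprehension to make that work (I'd like to do it in one line)?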
Answer 0 (score: 6)
Pass a tuple of prefixes to startswith:
y = [link.get('href') for link in li if link.get('href').startswith(("https://", "http://")) if "google" not in link.get('href')]
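To see the tuple form in isolation, here is a minimal sketch that runs without any network access. The hrefs list and the external_links helper are made up for illustration; they stand in for the values returned by link.get('href') in the question.

```python
# Hypothetical sample hrefs, standing in for link.get('href') values.
hrefs = [
    "https://example.com/a",
    "http://example.org/b",
    "/search?q=python",
    "https://www.google.com/preferences",
    "ftp://example.net/c",
]

def external_links(hrefs):
    """Keep http(s) links whose URL does not contain "google"."""
    return [
        h for h in hrefs
        # str.startswith accepts a tuple and returns True if any prefix matches.
        if h.startswith(("https://", "http://")) and "google" not in h
    ]

print(external_links(hrefs))
```

Note that the prefixes must be passed as a tuple; passing a list raises a TypeError.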
Answer 1 (score: 2)
This line:
y = [link.get('href') for link in li if link.get('href').startswith("https://") if "google" not in link.get('href')]
should be changed to the following (note that this version also drops the "google" filter):
y = [link.get('href') for link in li if link.get('href').startswith(("https://", "http://"))]
Answer 2 (score: 1)
You can use a regular expression for this. Here's how:
y = [link.get('href') for link in li if re.match("https*://", link.get('href')) if "google" not in link.get('href')]
This matches zero or more occurrences of s (in practice, 0 or 1). Remember to import re first.
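If you want to allow at most one s, a slightly stricter pattern is "https?://". The sketch below compares the two; SCHEME and has_http_scheme are names invented for this example, not part of the code above.

```python
import re

# "https?://" allows zero or one "s"; "https*://" allows any number of them.
SCHEME = re.compile(r"https?://")

def has_http_scheme(href):
    """Return True if href starts with http:// or https://."""
    # re.match only matches at the beginning of the string.
    return SCHEME.match(href) is not None

for href in ("http://example.org", "https://example.com", "httpss://bad", "/search?q=x"):
    print(href, has_http_scheme(href))
```

For real-world links the difference rarely matters, but the stricter pattern states the intent more precisely.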
Answer 3 (score: 0)
If you are looking for a way to get search results from Google, I'd suggest using the googlesearch library itself. Getting the results is much easier that way: there is no need to scrape the whole query page and dig the results out of it. It gives you both https and http links. Here is an article that may be useful to you.
https://www.geeksforgeeks.org/performing-google-search-using-python-code/