I'm trying to get the top 5 URLs for each abbreviated journal title. I set the parameter num to 5, which I assumed would return the top 5 results, and stop = 1, which I interpreted to mean that no further HTTP requests would be sent after those 5 results were returned. For some reason, with num = 5 and stop = 1 I only get 3 results, and I get the same 3 results for different queries (they should obviously differ). On top of that, while testing a fix I keep hitting HTTP Error 503, despite the sleep loop that others on this site suggested would prevent it. My code is below...
import random
import time

import google  # the "google" search-scraper package from the traceback

count = 0
my_file = open('sometextfile.txt', 'r')
for aline in my_file:
    print("******************************")
    print(aline)
    count += 1
    record_list = aline.split("\t")
    if "." in record_list[1]:
        search_results = google.search(record_list[2], num=5, stop=1, pause=3.)
        for result in search_results:
            print(result)
        time.sleep(random.randrange(0, 3))  # pause between searches
并具有以下输出...
4 Environmental and Behaviour ['0143-005X']
******************************
4 Sustainable Cities and Society ['0143-005X']
******************************
4 Chicago to LA: Making sense of urban theory ['0272-4944']
******************************
4 As adopted by the International Health Conference ['0272-4944']
******************************
5 J. Wetl. ['1442-9985']
https://www.ncbi.nlm.nih.gov/nlmcatalog?term=1442-9985%5BISSN%5D
http://www.wiley.com/bw/journal.asp?ref=1442-9985
http://www.wiley.com/WileyCDA/WileyTitle/productCd-AEC.html
******************************
5 Curr. Opin. Environ. Sustain. ['1442-9985']
https://www.ncbi.nlm.nih.gov/nlmcatalog?term=1442-9985%5BISSN%5D
http://www.wiley.com/bw/journal.asp?ref=1442-9985
http://www.wiley.com/WileyCDA/WileyTitle/productCd-AEC.html
******************************
5 For. Policy Econ. ['1442-9985']
https://www.ncbi.nlm.nih.gov/nlmcatalog?term=1442-9985%5BISSN%5D
http://www.wiley.com/bw/journal.asp?ref=1442-9985
http://www.wiley.com/WileyCDA/WileyTitle/productCd-AEC.html
******************************
5 For. Policy Econ. ['1442-9985']
https://www.ncbi.nlm.nih.gov/nlmcatalog?term=1442-9985%5BISSN%5D
http://www.wiley.com/bw/journal.asp?ref=1442-9985
http://www.wiley.com/WileyCDA/WileyTitle/productCd-AEC.html
******************************
5 Asia. World Dev. ['1442-9985']
Traceback (most recent call last):
File "C:/Users/Peter/Desktop/Programming/Ibata Arens Project/google_search.py", line 27, in <module>
for result in search_results:
File "C:\Users\Peter\Anaconda3\lib\site-packages\google\__init__.py", line 304, in search
html = get_page(url)
File "C:\Users\Peter\Anaconda3\lib\site-packages\google\__init__.py", line 121, in get_page
response = urlopen(request)
File "C:\Users\Peter\Anaconda3\lib\urllib\request.py", line 163, in urlopen
return opener.open(url, data, timeout)
File "C:\Users\Peter\Anaconda3\lib\urllib\request.py", line 472, in open
response = meth(req, response)
File "C:\Users\Peter\Anaconda3\lib\urllib\request.py", line 582, in http_response
'http', request, response, code, msg, hdrs)
File "C:\Users\Peter\Anaconda3\lib\urllib\request.py", line 504, in error
result = self._call_chain(*args)
File "C:\Users\Peter\Anaconda3\lib\urllib\request.py", line 444, in _call_chain
result = func(*args)
File "C:\Users\Peter\Anaconda3\lib\urllib\request.py", line 696, in http_error_302
return self.parent.open(new, timeout=req.timeout)
File "C:\Users\Peter\Anaconda3\lib\urllib\request.py", line 472, in open
response = meth(req, response)
File "C:\Users\Peter\Anaconda3\lib\urllib\request.py", line 582, in http_response
'http', request, response, code, msg, hdrs)
File "C:\Users\Peter\Anaconda3\lib\urllib\request.py", line 510, in error
return self._call_chain(*args)
File "C:\Users\Peter\Anaconda3\lib\urllib\request.py", line 444, in _call_chain
result = func(*args)
File "C:\Users\Peter\Anaconda3\lib\urllib\request.py", line 590, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 503: Service Unavailable
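For reference, the rate-limiting I plan to try next looks roughly like this: exponential backoff with jitter around each query. This is my own sketch, not anything from the google package, and the delay schedule is a guess:

```python
import random
import time


def backoff_delay(attempt, base=5.0, cap=300.0):
    """Exponential backoff with jitter: ~5s, ~10s, ~20s, ... capped at 5 min."""
    return min(cap, base * (2 ** attempt)) * random.uniform(0.5, 1.5)


def with_retries(fn, max_attempts=5):
    """Call fn(); on any exception, sleep and retry up to max_attempts times."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # give up after the last attempt
            time.sleep(backoff_delay(attempt))
```

I don't know whether Google's 503 throttling backs off this quickly, so the base delay may need to be much larger.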
I'm also wondering whether it would be better to simply use urllib and parse the returned HTML instead, since my only goal is to retrieve the ISSN for each abbreviated journal title.
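In case it helps, the plain-urllib version I have in mind would look something like this. It's only a sketch: the ISSN regex and the idea of sending a browser-like User-Agent to reduce 503s are my assumptions:

```python
import re
import urllib.request

# ISSNs look like 0143-005X: four digits, a hyphen, three digits, then a
# digit or X check character.
ISSN_RE = re.compile(r'\b\d{4}-\d{3}[\dXx]\b')


def extract_issns(html):
    """Return the unique ISSN-shaped strings found in a page, in order."""
    seen = []
    for issn in ISSN_RE.findall(html):
        if issn not in seen:
            seen.append(issn)
    return seen


def fetch_page(url):
    """Fetch a URL with a browser-like User-Agent header."""
    request = urllib.request.Request(
        url, headers={'User-Agent': 'Mozilla/5.0'})
    with urllib.request.urlopen(request) as response:
        return response.read().decode('utf-8', errors='replace')


if __name__ == '__main__':
    # Hypothetical query against the NLM Catalog for one abbreviated title.
    html = fetch_page('https://www.ncbi.nlm.nih.gov/nlmcatalog?term=J.+Wetl.')
    print(extract_issns(html))
```

I haven't verified that the NLM Catalog pages actually contain the ISSN in a regex-friendly form, so that part may need real HTML parsing instead.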