值误差:在Google学术搜索中解析数据时读取已关闭文件

时间:2017-06-26 22:55:53

标签: python web-scraping jupyter google-scholar

我是一名学习生物学的非学生,但我正在研究python数据科学,以便进行网络学习Google学术搜索。我创建了一个最初工作的程序,但它以某种方式随机停止工作并给了我一个值Error。我认为这可能与谷歌严格搜索他们的网站的机器人有关。任何建议和补救措施都会有所帮助!我正在使用Jupyter Notebook ipython和Python3。

代码:

import pip    
def install(package):
    pip.main(['install', package])

install('BeautifulSoup4')

from bs4 import BeautifulSoup
import urllib.request
from urllib.request import FancyURLopener

class AppURLopener(urllib.request.FancyURLopener):
    version = "Mozilla/5.0"

def page_citations(x):
    #number of pages of google searches that you want to run

    query = input()
    query = str(query)
    opener = AppURLopener()
    m = 0
    q = 0
    l = make_array()
    while m < x:
        response = 
        opener.open('https://scholar.google.com/scholar?
        start='+str(q)+'&q=' + query + '&hl=en&as_sdt=0,5').read()
        soup = BeautifulSoup(response, 'html.parser')
        for word in str(soup.find_all(class_ = "gs_fl")).split():
            if word.endswith(''+ '</a>'): 
                l = np.append(l, word.strip('</a>'))
        q = q + 10
        m = m + 1
    n = make_array()

    for number in l:
        try:
            number = int(number)
            n = np.append(n, number)
        except: continue

    return n

错误: ValueError:读取已关闭的文件

0 个答案:

没有答案