Re: Extracting content from the API stops midway

Time: 2016-10-16 03:06:39

Tags: json python-2.7 wikipedia-api

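I'm having a problem with a script I wrote. A month ago it worked perfectly, but when I ran it last night it stopped after fetching 10-15 articles, and I don't know why.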

My code is as follows:

import json
import urllib2

def getContents(keyword):
    # NOTE: `filename` (the output file) is not defined in this snippet;
    # it must be defined elsewhere (e.g. at module level) for this to run.
    f1 = open('NoRepeatedWords.text', 'r')
    line1 = f1.read().splitlines()
    f3 = open(filename, 'w+')
    count = 0
    for line in line1:
        try:
            print('======' + line + '======')
            count = count + 1
            print('The count is: ' + str(count))

            # build the API URL that returns this article's plain-text extract
            url = ('https://en.wikipedia.org/w/api.php?action=query&format=json'
                   '&prop=extracts&titles=' + urllib2.quote(line) +
                   '&explaintext=1&exsectionformat=wiki')

            # write the article title as a heading
            f3.write('== ' + line.strip() + ' ==' + "\n" + "\n")

            # fetch and parse the JSON response
            opener = urllib2.build_opener()
            opener.addheaders = [('User-agent', 'Mozilla/5.0')]
            json_obj = opener.open(url)
            data1 = json.load(json_obj)

            # write the extract of every page returned for this title
            for i in data1['query']['pages']:
                f3.write(data1['query']['pages'][i]['extract'].encode('utf8') + "\n" + "\n" + "\n")

        except Exception:
            # any error (network failure, missing 'extract' key, ...) is silently skipped
            pass

    f1.close()
    f3.close()

All this code does is fetch content from the API for each keyword in NoRepeatedWords.text. The keywords are actual Wikipedia article titles.
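In the loop above, every exception is caught and discarded, so a response that has no 'extract' field (for example a title the API reports as missing) raises a KeyError that disappears without a trace. The sketch below, written for Python 3 rather than the question's Python 2.7, shows one way to split the default (formatversion 1) response into found and missing pages so the missing titles can be logged instead of skipped. The function name parse_extracts is hypothetical, not part of any library.

```python
def parse_extracts(data):
    """Split a prop=extracts JSON response into found pages and missing titles.

    Assumes the default (formatversion 1) layout: data["query"]["pages"]
    maps page-id strings to page objects; missing pages carry a "missing"
    key and no "extract".
    """
    if "error" in data:
        # the API reports request-level problems in a top-level "error" object
        raise RuntimeError("API error: %s" % data["error"].get("info", data["error"]))
    found, missing = [], []
    pages = data.get("query", {}).get("pages", {})
    for page_id, page in pages.items():
        if "extract" in page:
            found.append((page["title"], page["extract"]))
        else:
            # missing/invalid titles, or pages the API returned without an extract
            missing.append(page.get("title", page_id))
    return found, missing
```

Each missing title could then be printed or written to a log inside the loop, which would show whether the script really stops or simply stops producing output.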

For example, replace "KEYWORD HERE" in the URL below with one of the keywords and the API returns the article's content in JSON format:

https://en.wikipedia.org/w/api.php?action=query&format=json&prop=extracts&titles=KEYWORD HERE&explaintext=1&exsectionformat=wiki
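The same URL can be assembled with urlencode, which percent-encodes every parameter value, including titles containing spaces, slashes or non-ASCII characters such as "Außerparlamentarische Opposition". A minimal Python 3 sketch (in Python 2.7 the function is urllib.urlencode instead); API_ENDPOINT and build_extract_url are illustrative names, not part of any library.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://en.wikipedia.org/w/api.php"

def build_extract_url(title):
    """Build the prop=extracts query URL for a single article title."""
    # a list of pairs keeps the parameter order identical to the URL above
    params = [
        ("action", "query"),
        ("format", "json"),
        ("prop", "extracts"),
        ("titles", title),
        ("explaintext", "1"),
        ("exsectionformat", "wiki"),
    ]
    # urlencode encodes spaces as '+' and non-ASCII characters as UTF-8 %XX escapes
    return API_ENDPOINT + "?" + urlencode(params)
```

For instance, build_extract_url("Body politic") produces the example URL with "titles=Body+politic".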

Could this be caused by a server timeout, or by the connection being dropped abruptly?
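If timeouts or dropped connections are involved, giving each request an explicit timeout and retrying failures with a growing delay would at least make them visible instead of silent. A minimal Python 3 sketch under stated assumptions: the network call is passed in as the get parameter so the retry logic can be tested offline, and the User-Agent string, function names and retry counts are placeholders (Wikimedia's User-Agent policy asks for a descriptive agent with contact details). Note that urllib.error.HTTPError is a subclass of URLError, so HTTP error responses are retried as well.

```python
import json
import socket
import time
import urllib.error
import urllib.request

# placeholder agent string; replace with your own tool name and contact
USER_AGENT = "ExtractFetcher/0.1 (contact: you@example.com)"

def http_get(url, timeout=30):
    """Fetch url and return the body as text, giving up after `timeout` seconds."""
    req = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8")

def fetch_json_with_retry(url, get=http_get, retries=3, backoff=2.0, sleep=time.sleep):
    """Fetch and parse JSON, retrying network errors with exponential backoff."""
    for attempt in range(retries + 1):
        try:
            return json.loads(get(url))
        except (urllib.error.URLError, socket.timeout, ConnectionError) as exc:
            if attempt == retries:
                raise  # out of retries: let the caller see the real error
            delay = backoff * (2 ** attempt)
            print("attempt %d failed (%s); retrying in %.1fs" % (attempt + 1, exc, delay))
            sleep(delay)
```

Calling fetch_json_with_retry(url) in place of opener.open(url) plus json.load would print each failed attempt and raise the final error rather than hiding it.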

Here is a sample of my keywords:

Outline of political science
Politics
Body politic
Criminalization
Criticism of science
Crossing the floor
Death in office
Extrajudicial killing
Index of gun politics articles
Index of politics articles
Island Chain Strategy
Judaism and political radicalism
Legislation
Libertarian anarchism
Monarch
Political movement
New public management
North Atlantic or liberal model of media and politics
Outpost (civilian)
Pacification theory
Pandering (politics)
Parliamentary leader
Parliamentwatch
Participation inequality
Polarization (politics)
Political abuse
Political argument
Political climate
Political crisis
Political demography
Political faction
Political globalization
Politically exposed person
Politicized issue
Politics of outer space
Politics of the International Space Station
Postfeminism
Proto-fascism
Outline of public affairs
Public interest
Public opinion
Public speaking
Publics
Punk ideologies
Puppet ruler
Regional autonomy
Sexualization
Spin (propaganda)
Spin room
Technoculture
Term of office
Testing the waters
Transparency report
Tribal chief
Two-step flow of communication
Veto
World domination
Affair
Apoliticism
The arts and politics
Bisexual politics
Corporate welfare
Courtesy resolution
Political crime
Denialism
Haldane principle
Health policy
Modified Scheme of Elementary education 1953
Issue voting
National language
Policy
Policy studies
Politicization of science
Power vacuum
Redistribution of income and wealth
Science policy
Single-issue politics
Social issue
Politics and technology
Transparency (behavior)
Urban politics
Workplace politics
Nicholas Young (MTPD)
Political lists
Lists of active separatist movements
List of Chinese spy cases in the United States
List of countries without political parties
List of coups d'état and coup attempts
List of coups d'état and coup attempts by country
List of coups d'état and coup attempts since 2010
List of cults of personality
List of foreign ministry headquarters
List of genocides by death toll
List of political conspiracies
List of countries by consultation on rule-making
List of highest paid mayors
List of historical separatist movements
List of kingdoms and royal dynasties
List of micronations
List of overseas visits by Tenzin Gyatso the 14th Dalai Lama outside India
List of peasant revolts
List of political movements named after dates
Newspaper endorsements in the United States presidential election, 1900
Newspaper endorsements in the United States presidential election, 1904
Newspaper endorsements in the United States presidential election, 2012
Newspaper endorsements in the United States presidential election, 2016
List of Occupy movement protest locations
List of people declared persona non grata
List of political catchphrases
List of political dissidents
Political gaffe
Political ideas in science fiction
List of revolutions and rebellions
List of scandals with "-gate" suffix
List of political party symbols
Table of voting systems by country
Timeline of the South China Sea dispute
List of United States presidential electors, 1792
List of United States presidential electors, 1796
List of wars between democracies
List of wars by death toll
Activism
9/11 Truth movement
1984 Network Liberty Alliance
Ableism
An Act of Conscience
Acting Witan of Mercia
Activism at Ohio Wesleyan University
Activism industry
Activist ageing
Activist knowledge
AirportWatch
Anarcho-punk
Anatopia
Anti-Capitalist Convergence
Anti-schooling activism
Antimilitarism
Artivism
Artivist Film Festival & Awards
Asia Catalyst
Außerparlamentarische Opposition
Australian Young Greens
Autistic Self Advocacy Network
Avaaz
Babels
Back-story (production)
Blackout Day
Bolivarian Revolution
Brights movement
Buycott.com
Will Byrne
Cacerolazo
CAVE People
CEC European Managers
Center for Socialist Studies
Chengara struggle
Choice USA
The Cigarette Papers
Citizen's Charter and Grievance Redressal Bill 2011
Civil libertarianism
Civil society campaign
Clandestine Insurgent Rebel Clown Army
Cognitive activism
Community Front in Defense of Land
Community House (Salt River, Cape Town)
Consciousness raising
Constructive Program
Cordobazo
Corporate Watch
Counter-recruitment
Counterculture of the 1960s
CountyWatch
Craftivist Collective

The full keyword list (15,000 keywords in total, all valid Wikipedia article titles) is available here: https://drive.google.com/file/d/0B1OJNca33pJXRGF5b1V1amcwMmM/view?usp=sharing
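With about 15,000 titles, a run that dies partway through currently has to start again from the first title. One option is to record each finished title in a separate progress file and skip those titles on the next run. A minimal Python 3 sketch; the progress-file approach and the names load_done, pending_titles and mark_done are assumptions for illustration, not part of the original script.

```python
import os

def load_done(done_path):
    """Return the set of titles already recorded in done_path (one per line)."""
    if not os.path.exists(done_path):
        return set()
    with open(done_path, encoding="utf-8") as fh:
        return {line.strip() for line in fh if line.strip()}

def pending_titles(titles, done_path):
    """Yield titles not yet processed, preserving input order and skipping blanks."""
    done = load_done(done_path)
    for title in titles:
        title = title.strip()
        if title and title not in done:
            yield title

def mark_done(title, done_path):
    """Append a finished title so a restarted run can skip it."""
    with open(done_path, "a", encoding="utf-8") as fh:
        fh.write(title + "\n")
```

In the main loop, iterating over pending_titles(titles, "done.txt") and calling mark_done(title, "done.txt") after each extract is written would let a restarted run pick up where the last one stopped (the output file would then need to be opened in append mode rather than 'w+').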

Thanks for your help.

0 answers

No answers yet.