我编写了一个简单的脚本来访问JSON,以获取用于URL的关键字。
以下是我写的脚本:
import urllib2
import json
f1 = open('CatList.text', 'r')
f2 = open('SubList.text', 'w')
lines = f1.read().splitlines()
for line in lines:
url ='https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle='+line+'&cmlimit=100'
json_obj = urllib2.urlopen(url)
data = json.load(json_obj)
for item in data['query']:
for i in data['query']['categorymembers']:
print i['title']
print '-----------------------------------------'
f2.write((i['title']).encode('utf8')+"\n")
在此脚本中,程序将首先读取CatList,它提供用于URL的关键字列表。
以下是CatList.text包含的示例。
Category:Branches of geography
Category:Geography by place
Category:Geography awards and competitions
Category:Geography conferences
Category:Geography education
Category:Environmental studies
Category:Exploration
Category:Geocodes
Category:Geographers
Category:Geographical zones
Category:Geopolitical corridors
Category:History of geography
Category:Land systems
Category:Landscape
Category:Geography-related lists
Category:Lists of countries by geography
Category:Navigation
Category:Geography organizations
Category:Places
Category:Geographical regions
Category:Surveying
Category:Geographical technology
Category:Geography terminology
Category:Works about geography
Category:Geographic images
Category:Geography stubs
我的程序获取关键字并将其放在网址中。
但是我无法得到结果。我已通过打印URL检查了代码:
import urllib2
import json
f1 = open('CatList.text', 'r')
f2 = open('SubList2.text', 'w')
lines = f1.read().splitlines()
for line in lines:
url ='https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle='+line+'&cmlimit=100'
json_obj = urllib2.urlopen(url)
data = json.load(json_obj)
f2.write(url+'\n')
我得到的结果如下:sublist2:
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=Category:Branches of geography&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=Category:Geography by place&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=Category:Geography awards and competitions&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=Category:Geography conferences&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=Category:Geography education&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=Category:Environmental studies&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=Category:Exploration&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=Category:Geocodes&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=Category:Geographers&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=Category:Geographical zones&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=Category:Geopolitical corridors&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=Category:History of geography&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=Category:Land systems&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=Category:Landscape&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=Category:Geography-related lists&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=Category:Lists of countries by geography&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=Category:Navigation&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=Category:Geography organizations&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=Category:Places&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=Category:Geographical regions&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=Category:Surveying&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=Category:Geographical technology&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=Category:Geography terminology&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=Category:Works about geography&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=Category:Geographic images&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=Category:Geography stubs&cmlimit=100
它显示URL已正确放置。
但是当我运行完整的代码时,它无法获得正确的结果。
我注意到的一件事是当我放在地址栏的链接中时,例如:
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=Category:Branches of geography&cmlimit=100
它提供了正确的结果,因为地址栏会自动将其更正为:
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=Category:Branches%20of%20geography&cmlimit=100
我相信如果在“类别:地理分支”一词之间添加%20代替空格,我的脚本将能够获得正确的JSON项目。
问题: 但我不知道如何在上面的代码中修改此语句以获取用%20替换CatList中包含的空格。
请原谅我的格式错误和长篇文章,我还在尝试学习python。
感谢您帮助我。
修改
谢谢蒂姆。您的解决方案有效: url ='https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle='+urllib2.quote(line)+'&cmlimit=100'
能够打印出正确的结果:
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=Category%3ABranches%20of%20geography&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=Category%3AGeography%20by%20place&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=Category%3AGeography%20awards%20and%20competitions&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=Category%3AGeography%20conferences&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=Category%3AGeography%20education&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=Category%3AEnvironmental%20studies&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=Category%3AExploration&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=Category%3AGeocodes&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=Category%3AGeographers&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=Category%3AGeographical%20zones&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=Category%3AGeopolitical%20corridors&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=Category%3AHistory%20of%20geography&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=Category%3ALand%20systems&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=Category%3ALandscape&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=Category%3AGeography-related%20lists&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=Category%3ALists%20of%20countries%20by%20geography&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=Category%3ANavigation&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=Category%3AGeography%20organizations&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=Category%3APlaces&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=Category%3AGeographical%20regions&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=Category%3ASurveying&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=Category%3AGeographical%20technology&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=Category%3AGeography%20terminology&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=Category%3AWorks%20about%20geography&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=Category%3AGeographic%20images&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=Category%3AGeography%20stubs&cmlimit=100
答案 0 :(得分:1)
使用urllib.quote()
替换网址中的特殊字符:
Python 2:
import urllib
line = 'Category:Branches of geography'
url ='https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=' + urllib.quote(line) + '&cmlimit=100'
https://docs.python.org/2/library/urllib.html#urllib.quote
Python 3:
import urllib.parse
line = 'Category:Branches of geography'
url ='https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=' + urllib.parse.quote(line) + '&cmlimit=100'
https://docs.python.org/3.5/library/urllib.parse.html#urllib.parse.quote