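I've run into a problem, and I don't understand why the output is printed this way.
Here is my code. Please excuse any formatting mistakes, as I am new to programming. It opens a text file containing a large number of keywords.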
import urllib2
import json
f1 = open('CatList.text')
lines = f1.readlines()
for line in lines:
    url ='https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle='+line+'&cmlimit=100'
    print(url)
    json_obj = urllib2.urlopen(url)
    data = json.load(json_obj)
    #to write the result
    f2 = open('SubList.text', 'w')
    f2.write(url)
    for item in data['query']:
        for i in data['query']['categorymembers']:
            f2.write((i['title']).encode('utf8')+"\n")
I get this error:
Traceback (most recent call last):
  File "Test2.py", line 16, in <module>
    json_obj = urllib2.urlopen(url)
  File "/usr/lib/python2.7/urllib2.py", line 127, in urlopen
    return _opener.open(url, data, timeout)
  File "/usr/lib/python2.7/urllib2.py", line 402, in open
    req = meth(req)
  File "/usr/lib/python2.7/urllib2.py", line 1113, in do_request_
    raise URLError('no host given')
urllib2.URLError: <urlopen error no host given>
I'm not sure what this error means, so I tried printing the URLs instead:
import urllib2
import json
f1 = open('CatList.text')
f2 = open('SubList.text', 'w')
lines = f1.readlines()
for line in lines:
    url ='https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle='+line+'&cmlimit=100'
    print(url)
    f2.write(url+'\n')
The output I get is strange (here is part of it):
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=Category:Branches of geography
&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=Category:Geography by place
&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=Category:Geography awards and competitions
&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=Category:Geography conferences
&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=Category:Geography education
&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=Category:Environmental studies
&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=Category:Exploration
&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=Category:Geocodes
&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=Category:Geographers
&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=Category:Geographical zones
&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=Category:Geopolitical corridors
&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=Category:History of geography
&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=Category:Land systems
&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=Category:Landscape
&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=Category:Geography-related lists
&cmlimit=100
Note that each URL is split into two parts:
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=Category:Geography-related lists
&cmlimit=100
instead of
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=Category:Geography-related lists&cmlimit=100
My first question: how can I fix this?
Second, is this what is causing the error?
My CatList.text looks like this:
Category:Branches of geography
Category:Geography by place
Category:Geography awards and competitions
Category:Geography conferences
Category:Geography education
Category:Environmental studies
Category:Exploration
Category:Geocodes
Category:Geographers
Category:Geographical zones
Category:Geopolitical corridors
Category:History of geography
Category:Land systems
Category:Landscape
Category:Geography-related lists
Category:Lists of countries by geography
Category:Navigation
Category:Geography organizations
Category:Places
Category:Geographical regions
Category:Surveying
Category:Geographical technology
Category:Geography terminology
Category:Works about geography
Category:Geographic images
Category:Geography stubs
Sorry for the long post. I really appreciate your help. Thank you.
Answer 0 (score: 2)
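Friend, '\n' is the character generally used for a line break. In the same way, inside a file there is a hidden '\n' character at the end of each line.
So when you call lines = f1.readlines(), each line it returns still includes that trailing '\n' (the last line may lack one if the file does not end with a line break). That is the problem.
To avoid this, read the file with f1.read().splitlines() instead.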
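To make the difference concrete, here is a minimal sketch comparing readlines() with read().splitlines(). It uses an in-memory io.StringIO object as a hypothetical stand-in for CatList.text, so the sample contents and variable names are assumptions for illustration only.

```python
import io

# Hypothetical stand-in for CatList.text so the sketch runs without the real file.
sample = io.StringIO(u"Category:Exploration\nCategory:Geocodes\n")

# readlines() keeps the trailing '\n' on each line.
with_newlines = sample.readlines()

# Rewind, then read everything and split on line breaks; the '\n' is dropped.
sample.seek(0)
without_newlines = sample.read().splitlines()

print(with_newlines)
print(without_newlines)
```

With a real file, the same idea would be lines = f1.read().splitlines() in place of lines = f1.readlines().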
Answer 1 (score: 1)
Change the following line
url ='https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle='+line+'&cmlimit=100'
to
url ='https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle='+line.strip()+'&cmlimit=100'
Your line contains a newline character (\n), which .strip() removes, since it strips whitespace from both ends of the string.
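As a rough sketch of how the URL could be built per line, the helper below strips the newline as described. The name build_url and the constant API are hypothetical, not part of the original script. It also percent-encodes the title with quote, which goes beyond this answer: it is an assumption that the spaces in the category names should be encoded before the request is sent. The import fallback is there only so the sketch runs on both Python 2 (as in the question) and Python 3.

```python
try:
    from urllib.parse import quote  # Python 3
except ImportError:
    from urllib import quote        # Python 2, as used in the question

# Base query string copied from the question; API is a hypothetical name.
API = 'https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers'


def build_url(line):
    """Build the request URL for one line of CatList.text (hypothetical helper)."""
    # Remove the trailing '\n' and any surrounding whitespace.
    title = line.strip()
    # Percent-encode the title (an addition beyond the answer above).
    return API + '&cmtitle=' + quote(title) + '&cmlimit=100'


print(build_url('Category:Geography-related lists\n'))
```

Inside the loop from the question, this would be used as url = build_url(line), so the printed URL stays on a single line.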