I'm trying to write a program that gets URLs from Google,
but the problem is that the URLs I get back are encoded, like this:
[u'http://www.motorrad-live.de/test.php%3Fid%3D11',
 u'http://www.autogaleria.pl/auto_test/test.php%3Fid%3D37',
 u'http://oculus.ru/test.php%3Fid%3D2',
 u'http://oculus.ru/test.php%3Fid%3D1',
 u'http://www.kerrytaylorauctions.com/detail-test.php%3Fid%3D3432',
 u'http://radio.ghanaweb.com/live-radio.test.php?id=3D4',
 u'http://www.studygerman.ru/test/test.php%3Fid%3D261',
 u'http://www.mhealth.ru/tests/test.php%3Fid%3D300']
As you can see, everything after .php is encoded!
Here is my code. It even contains a part that is supposed to decode the URLs:
import json
import urllib

def print_results(results):
    mylist=[]
    n=[]
    for r in results:
        mylist.append(r['url'])
    for each in mylist:
        n.append(each.replace(u"%3FID%","?id="))
    print n

def query(qs):
    f = urllib.urlopen('http://ajax.googleapis.com/ajax/services/search/web?v=1.0&gl=de&q=%s&rsz=8&start=7'%qs)
    s = f.read()
    j = json.loads(s)
    return j['responseData']['results']

a=query('inurl:"test.php?id"')
print_results(a)
Answer 0 (score: 3)
You are looking for the function unquote:
urllib.unquote(url)
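In Python 3 the same function lives in `urllib.parse`. A minimal sketch, assuming Python 3 and using three of the URLs from the question as sample input:

```python
from urllib.parse import unquote

# Sample input: some of the percent-encoded URLs from the question
encoded = [
    'http://www.motorrad-live.de/test.php%3Fid%3D11',
    'http://oculus.ru/test.php%3Fid%3D2',
    'http://www.mhealth.ru/tests/test.php%3Fid%3D300',
]

# unquote turns %3F back into '?' and %3D back into '='
decoded = [unquote(u) for u in encoded]

for u in decoded:
    print(u)
# http://www.motorrad-live.de/test.php?id=11
# http://oculus.ru/test.php?id=2
# http://www.mhealth.ru/tests/test.php?id=300
```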
Answer 1 (score: 0)
First, you need to quote the query string before inserting it into the URL:
>>> urllib.quote("inurl:\"test.php?id\"")
'inurl%3A%22test.php%3Fid%22'
>>> "http://ajax.googleapis.com/ajax/services/search/web?v=1.0&gl=de&q=%(q)s&rsz=8&start=0" % dict(q=urllib.quote("inurl:\"test.php?id\""))
'http://ajax.googleapis.com/ajax/services/search/web?v=1.0&gl=de&q=inurl%3A%22test.php%3Fid%22&rsz=8&start=0'
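A Python 3 sketch of the same step, assuming `urllib.parse`. Here `urlencode` quotes every parameter value, so the query does not have to be spliced in with `%` formatting. It only builds the URL string and makes no request:

```python
from urllib.parse import quote, urlencode

query = 'inurl:"test.php?id"'

# quote() escapes ':', '"' and '?' (its default safe set is just '/')
print(quote(query))  # inurl%3A%22test.php%3Fid%22

# urlencode() quotes each key and value and joins the pairs with '&'
params = {'v': '1.0', 'gl': 'de', 'q': query, 'rsz': 8, 'start': 0}
url = 'http://ajax.googleapis.com/ajax/services/search/web?' + urlencode(params)
print(url)
# http://ajax.googleapis.com/ajax/services/search/web?v=1.0&gl=de&q=inurl%3A%22test.php%3Fid%22&rsz=8&start=0
```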
Second, I looked at the returned JSON and found that the unencoded URL is stored under the key unescapedUrl, so you can replace print_results(results) with:
def print_results(results):
    L=list(r['unescapedUrl'] for r in results)
    print L
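To try this version without calling the API, here is a Python 3 sketch run against a hand-written `results` list. The entries are hypothetical sample data that only mimic the `url` and `unescapedUrl` keys mentioned above; the `return` is an addition so the result can be reused:

```python
def print_results(results):
    # Take the already-decoded URL stored under 'unescapedUrl'
    urls = [r['unescapedUrl'] for r in results]
    print(urls)
    return urls

# Hypothetical sample results mimicking the JSON keys described in the answer
sample = [
    {'url': 'http://oculus.ru/test.php%3Fid%3D1',
     'unescapedUrl': 'http://oculus.ru/test.php?id=1'},
    {'url': 'http://oculus.ru/test.php%3Fid%3D2',
     'unescapedUrl': 'http://oculus.ru/test.php?id=2'},
]

print_results(sample)
# ['http://oculus.ru/test.php?id=1', 'http://oculus.ru/test.php?id=2']
```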
If you really need to read it from the url key, use:
def print_results(results):
    L=list(urllib.unquote(r['url']) for r in results)
    print L
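A Python 3 sketch of this second version, where `unquote` comes from `urllib.parse`. The sample entries are hypothetical and carry both keys, so the last line can check that decoding `url` gives the same strings as `unescapedUrl` for these particular inputs:

```python
from urllib.parse import unquote

def print_results(results):
    # Decode the percent-encoded 'url' field (%3F -> '?', %3D -> '=')
    urls = [unquote(r['url']) for r in results]
    print(urls)
    return urls

# Hypothetical sample entries carrying both keys
sample = [
    {'url': 'http://www.studygerman.ru/test/test.php%3Fid%3D261',
     'unescapedUrl': 'http://www.studygerman.ru/test/test.php?id=261'},
    {'url': 'http://www.mhealth.ru/tests/test.php%3Fid%3D300',
     'unescapedUrl': 'http://www.mhealth.ru/tests/test.php?id=300'},
]

decoded = print_results(sample)
# For these inputs both approaches yield the same URLs
print(decoded == [r['unescapedUrl'] for r in sample])  # True
```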