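I have the following program, in which I try to pass each element of a list to a successive Google search:
search_terms = ['Telejob (ETH)', 'Luisa da Silva', 'The CERN Recruitment Services']
for el in search_terms:
    webpage = 'http://google.com/search?q=' + el
    print('xxxxxxxxxxxxxxxxxxx')
    print(webpage)
Unfortunately, my program does not pick up all the words in each list item; it takes only the first word. It gives me this output: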
http://google.com/search?q=Telejob (ETH)
xxxxxxxxxxxxxxxxxxx
http://google.com/search?q=Luisa da Silva
xxxxxxxxxxxxxxxxxxx
http://google.com/search?q=The CERN Recruitment Services
xxxxxxxxxxxxxxxxxxx
http://google.com/search?q=The Swiss National Science Foundation
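Although the printed output shows the whole item, with every word added to the search, when I check the links only the first word of each item is attached to the URL, like this: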
http://google.com/search?q=Telejob
xxxxxxxxxxxxxxxxxxx
http://google.com/search?q=Luisa
xxxxxxxxxxxxxxxxxxx
http://google.com/search?q=The
xxxxxxxxxxxxxxxxxxx
http://google.com/search?q=The
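What am I doing wrong, and how can I attach all the words of each list item to the Google search?
Thanks.
Answer 0 (score: 0)
I believe your problem is URL encoding.
A space in a URL has to be written as '%20'.
Try changing your link to: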
https://www.google.com/search?q=The%20CERN%20Recruitment%20Services
Answer 1 (score: 0)
This line:
webpage = 'http://google.com/search?q=' + el
should split the term and rejoin it with '%20' as the joiner:
webpage = 'http://google.com/search?q=' + '%20'.join(el.split())
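Joining on whitespace only takes care of the spaces. As a rough sketch of what it leaves untouched, the snippet below compares it with urllib.parse.quote from the Python 3 standard library; the term 'Research & Development' is a made-up example, not one of the original search terms.

```python
from urllib.parse import quote

# Hypothetical search term containing '&', which separates query parameters.
term = 'Research & Development'

# Manual approach: only whitespace is replaced, '&' is left as-is.
manual = '%20'.join(term.split())

# Standard-library percent-encoding: '&' is encoded as well.
encoded = quote(term)

print(manual)
print(encoded)
```

With the manual version the '&' would end the q parameter early, so the search would most likely be truncated again.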
Answer 2 (score: 0)
You can use urllib.parse.urlencode in Python 3. For Python 2, you can use urllib.urlencode.
import urllib.parse  # a plain "import urllib" does not reliably load urllib.parse
search_terms = ['Telejob (ETH)', 'Luisa da Silva', 'The CERN Recruitment Services']
for el in search_terms:
    # Python 2: urllib.urlencode({'q': el})
    query = urllib.parse.urlencode({'q': el})
    webpage = 'http://google.com/search?{}'.format(query)
    print('xxxxxxxxxxxxxxxxxxx')
    print(webpage)
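For reference, here is a small sketch of what urlencode produces for the question's terms in Python 3. It builds the whole key=value pair and encodes spaces as '+' rather than '%20'; both forms are accepted in a query string.

```python
from urllib.parse import urlencode

search_terms = ['Telejob (ETH)', 'Luisa da Silva', 'The CERN Recruitment Services']

# urlencode builds "q=..." and escapes the value in one step.
urls = ['http://google.com/search?' + urlencode({'q': term})
        for term in search_terms]

for url in urls:
    print(url)
```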
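Answer 3 (score: 0)
None of these answers addresses the fundamental problem: you need to URL-encode the entire string.
I would go with urllib.quote() (Python 2):
>>> import urllib
>>> for term in search_terms:
...     print urllib.quote(term)
...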
Telejob%20%28ETH%29
Luisa%20da%20Silva
The%20CERN%20Recruitment%20Services
Note that the parentheses () are encoded too, as would be any other unusual characters.
In your case, that would be:
webpage = 'http://google.com/search?q=' + urllib.quote(el)
The equivalent in Python 3:
from urllib import parse
for term in search_terms:
    print(parse.quote(term))
So:
webpage = 'http://google.com/search?q=' + parse.quote(el)
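If the same script has to run under both Python 2 and Python 3, one possible sketch is to pick the import at load time. The helper name build_search_url is made up for this illustration.

```python
try:
    from urllib.parse import quote  # Python 3
except ImportError:
    from urllib import quote  # Python 2

def build_search_url(term):
    """Return a Google search URL with the term percent-encoded."""
    return 'http://google.com/search?q=' + quote(term)

print(build_search_url('Luisa da Silva'))
```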
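Answer 4 (score: 0)
The problem is that URLs need to be percent-encoded. Some characters have a special meaning in a URL, for example:
# : jumps to a location within the page
/ : I think you know what this one does...
You should use quote() to solve this, and remember:
urllib.quote() is for Python 2
urllib.parse.quote() is for Python 3
Here are some examples for Python 3: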
from urllib.parse import quote
quote('/bars/will/stay/intact')
#'/bars/will/stay/intact'
quote('/bars/wont/stay/intact', safe='')
#'%2Fbars%2Fwont%2Fstay%2Fintact'  # with safe='', the slashes are encoded too
quote('()ñ´ ç')
#'%28%29%C3%B1%C2%B4%20%C3%A7'
So your code now becomes:
search_terms = ['Telejob (ETH)', 'Luisa da Silva', 'The CERN Recruitment Services']
for el in search_terms:
    webpage = 'http://google.com/search?q=' + quote(el)
    print('xxxxxxxxxxxxxxxxxxx')
    print(webpage)
Since search_terms may contain characters that quote('something') does not escape by default (it leaves '/' alone), you should use its safe parameter:
search_terms = ['Telejob (ETH)', 'Luisa da Silva', 'The CERN Recruitment Services']
for el in search_terms:
    webpage = 'http://google.com/search?q=' + quote(el, safe='')
    print('xxxxxxxxxxxxxxxxxxx')
    print(webpage)
The last version outputs:
xxxxxxxxxxxxxxxxxxx
http://google.com/search?q=Telejob%20%28ETH%29
xxxxxxxxxxxxxxxxxxx
http://google.com/search?q=Luisa%20da%20Silva
xxxxxxxxxxxxxxxxxxx
http://google.com/search?q=The%20CERN%20Recruitment%20Services
I suggest you take a look at https://docs.python.org/3/library/urllib.parse.html#url-quoting for more information (pay attention to the ? and # characters!)
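To illustrate why '#' and '/' matter, here is a minimal sketch that parses the URL back with urllib.parse.urlsplit. The term 'C#/.NET' is a made-up example, not one of the original search terms.

```python
from urllib.parse import quote, urlsplit

# Hypothetical search term containing '#' and '/'.
term = 'C#/.NET'

# Unencoded: everything after '#' is parsed as a fragment, not as the query.
raw = urlsplit('http://google.com/search?q=' + term)
print(raw.query, raw.fragment)

# Default quote() encodes '#' but keeps '/' because safe defaults to '/'.
print(quote(term))

# With safe='' the '/' is encoded too and the whole term stays in the query.
fixed = urlsplit('http://google.com/search?q=' + quote(term, safe=''))
print(fixed.query, fixed.fragment)
```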
Answer 5 (score: 0)
Google queries have the form https://www.google.com/search?q=keyword_1+...+keyword_N , so you should format your query like this:
search_terms = ["Telejob (ETH)", "Luisa da Silva", "The CERN Recruitment Services"]
for search_term in search_terms:
    query = "+".join(search_term.split())
    url = "http://google.com/search?q=" + query
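Joining with '+' by hand does not escape characters that are already special, such as a literal '+' in the term. As a hedged alternative, urllib.parse.quote_plus in Python 3 produces the same '+' separated form while escaping the rest; 'C++ developer' below is a made-up term added for illustration.

```python
from urllib.parse import quote_plus

# The last term is hypothetical; it contains literal '+' characters.
search_terms = ['Telejob (ETH)', 'Luisa da Silva', 'C++ developer']

# quote_plus turns spaces into '+' and percent-encodes everything else unsafe.
urls = ['http://google.com/search?q=' + quote_plus(term)
        for term in search_terms]

for url in urls:
    print(url)
```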