Question

I am trying to build a URL so that I can send a GET request to it using the urllib module.

Let's suppose my final_url should be

url = "www.example.com/find.php?data=http%3A%2F%2Fwww.stackoverflow.com&search=Generate+value"

To achieve this, I tried the following:

>>> initial_url = "http://www.stackoverflow.com"
>>> search = "Generate+value"
>>> params = {"data":initial_url,"search":search}
>>> query_string = urllib.urlencode(params)
>>> query_string
'search=Generate%2Bvalue&data=http%3A%2F%2Fwww.stackoverflow.com'

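Now, if you compare my query_string with the format of final_url, you can observe two things:

1) The order of the params is reversed: instead of data=()&search=, I get search=()&data=

2) urlencode also encoded the + in Generate+value

I believe the first change is due to the arbitrary ordering of dictionaries. So I thought of using OrderedDict to reverse the dictionary. Since I am using Python 2.6.5, I did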

pip install ordereddict

But I am unable to use it in my code when I try

>>> od = OrderedDict((('a', 'first'), ('b', 'second')))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'OrderedDict' is not defined

So my question is: what is the correct way to use OrderedDict in Python 2.6.5, and how do I make urlencode ignore the + in Generate+value?

Also, is this the right approach to building a URL?

Answer 1

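You shouldn't worry about the + being encoded; the server should restore it when it unescapes the URL. The order of named parameters doesn't matter either.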
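To illustrate the round trip, here is a minimal sketch that plays the role of the server by decoding the query string with parse_qs. It assumes Python 3, where these functions live in urllib.parse (in Python 2 they are in urllib and urlparse); the variable names are only for illustration.

```python
from urllib.parse import urlencode, parse_qs

# Encode the parameters the way the question does.
query_string = urlencode([("data", "http://www.stackoverflow.com"),
                          ("search", "Generate+value")])

# Decode them again, as a server-side parser typically would.
decoded = parse_qs(query_string)

print(query_string)
print(decoded["search"])  # the literal + survives the round trip
```

The encoded form contains %2B, but the decoded value is the original string with its literal +.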

As for OrderedDict, it is not a Python builtin. You should import it from collections:

from urllib import urlencode, quote  # Python 2
# from urllib.parse import urlencode, quote  # Python 3
from collections import OrderedDict

initial_url = "http://www.stackoverflow.com"
search = "Generate+value"
# Build from a list of pairs: keyword arguments do not keep their order on older Pythons
query_string = urlencode(OrderedDict([("data", initial_url), ("search", search)]))
url = 'www.example.com/find.php?' + query_string
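Since the question is about Python 2.6, where collections has no OrderedDict, a hedged sketch of an import fallback may help. It assumes the ordereddict backport from the question's pip install exposes OrderedDict as `from ordereddict import OrderedDict`; the NameError in the question is what you get when no import is done at all. The search value here is the unencoded "Generate value", which is an assumption about what the asker wants.

```python
try:
    from collections import OrderedDict  # Python 2.7+ and 3.x
except ImportError:
    # Assumption: the "ordereddict" backport package is installed (Python 2.6).
    from ordereddict import OrderedDict

try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode  # Python 2

# A list of pairs keeps the insertion order on every Python version.
params = OrderedDict([("data", "http://www.stackoverflow.com"),
                      ("search", "Generate value")])
query_string = urlencode(params)
url = "www.example.com/find.php?" + query_string
print(url)
```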

If your Python is too old and the collections module has no OrderedDict, use:

# parameters is a plain dict of unencoded values; sorted() gives a stable key order
encoded = "&".join("%s=%s" % (key, quote(parameters[key], safe="+"))
    for key in sorted(parameters.keys()))

In any case, the order of the parameters doesn't matter.

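Note the safe parameter of quote. It prevents + from being escaped, but this means the server will interpret Generate+value as Generate value. You can manually escape + by writing %2B and marking % as a safe character.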
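The example that originally followed is missing, so here is a rough sketch of the idea rather than the answerer's own code. It assumes Python 3's urllib.parse.quote (urllib.quote in Python 2), and the variable names are hypothetical.

```python
from urllib.parse import quote

# With + marked safe, quote leaves it alone, so a server reads it as a space.
plus_kept = quote("Generate+value", safe="+")

# Escape + by hand as %2B and mark % as safe so it is not encoded again.
pre_escaped = quote("Generate%2Bvalue", safe="%")

# Without safe="%", the hand-written escape gets double-encoded.
double_encoded = quote("Generate%2Bvalue")

print(plus_kept)       # Generate+value
print(pre_escaped)     # Generate%2Bvalue
print(double_encoded)  # Generate%252Bvalue
```

Marking % as safe has a cost: any literal percent sign in the data will no longer be escaped, so this only works when you control the input.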

Answer 2

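First, the order of parameters in an HTTP request should be completely irrelevant. If it isn't, the parsing library on the other side is doing something wrong.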

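Second, of course the + is encoded. + is used as a placeholder for a space in an encoded URL, so if your original string contains a +, it has to be escaped. urlencode expects an unencoded string; you can't pass it a string that is already encoded.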
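A small sketch of what that means in practice, assuming Python 3's urllib.parse.urlencode (urllib.urlencode in Python 2). The three inputs are illustrative only.

```python
from urllib.parse import urlencode

# Unencoded input: the space becomes +, which is what the question wants.
from_plain = urlencode({"search": "Generate value"})

# Input that already uses + for a space: the + is treated as a literal plus.
from_plus = urlencode({"search": "Generate+value"})

# Input that is already percent-encoded: the % itself gets encoded again.
from_encoded = urlencode({"search": "Generate%20value"})

print(from_plain)    # search=Generate+value
print(from_plus)     # search=Generate%2Bvalue
print(from_encoded)  # search=Generate%2520value
```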

Answer 3

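A few comments on the question and the other answers:

If you want to preserve order with urllib.urlencode, submit an ordered sequence of k/v pairs instead of a mapping (dict). When you pass in a dict, urlencode simply calls foo.items() to get an iterable sequence.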

# urllib.urlencode accepts a mapping or sequence
# the output of this can vary, because `items()` is called on the dict
urllib.urlencode({"data": initial_url,"search": search})
# the output of this will not vary
urllib.urlencode((("data", initial_url), ("search", search)))

You can also pass in a secondary doseq argument to adjust how iterable values are handled.
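As a quick sketch of what doseq changes, assuming Python 3's urllib.parse.urlencode (the Python 2 urllib.urlencode takes the same argument); the keys and values are made up for illustration:

```python
from urllib.parse import urlencode

pairs = [("foo", [3, 2, 1]), ("bar", "x")]

# Default: the whole list is converted with str() and encoded as one value.
as_single_value = urlencode(pairs)

# doseq=True: each element of the list becomes its own key=value pair.
as_repeated_keys = urlencode(pairs, doseq=True)

print(as_single_value)
print(as_repeated_keys)  # foo=3&foo=2&foo=1&bar=x
```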

The order of parameters is not entirely irrelevant. Take these two URLs, for example:

https://example.com?foo=bar&bar=foo
https://example.com?bar=foo&foo=bar

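An HTTP server should treat the order of these parameters as irrelevant, but a function designed to compare URLs would not. To compare URLs safely, these parameters need to be sorted.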
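One possible way to do such a comparison, as a sketch only: parse each query string into pairs and sort them before comparing. It assumes Python 3's urllib.parse, and same_url is a hypothetical helper name, not a standard library function.

```python
from urllib.parse import urlsplit, parse_qsl


def same_url(a, b):
    """Compare two URLs while ignoring the order of query parameters."""
    pa, pb = urlsplit(a), urlsplit(b)
    if (pa.scheme, pa.netloc, pa.path) != (pb.scheme, pb.netloc, pb.path):
        return False
    # Sort the decoded (key, value) pairs so that ordering does not matter.
    qa = sorted(parse_qsl(pa.query, keep_blank_values=True))
    qb = sorted(parse_qsl(pb.query, keep_blank_values=True))
    return qa == qb


print(same_url("https://example.com?foo=bar&bar=foo",
               "https://example.com?bar=foo&foo=bar"))
```

Note that this sketch also ignores the order of repeated keys, which may or may not be what a given application wants.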

However, consider duplicate keys:

https://example.com?foo=3&foo=2&foo=1

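The URI specification supports duplicate keys, but does not address precedence or ordering.

In a given application, each of these could trigger a different result and still be valid: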

https://example.com?foo=1&foo=2&foo=3
https://example.com?foo=1&foo=3&foo=2
https://example.com?foo=2&foo=3&foo=1
https://example.com?foo=2&foo=1&foo=3
https://example.com?foo=3&foo=1&foo=2
https://example.com?foo=3&foo=2&foo=1
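For what it's worth, here is a sketch of how Python's own parser exposes duplicate keys, assuming Python 3's urllib.parse. Other servers and frameworks may keep only the first or last value, so treat this as one behaviour among several.

```python
from urllib.parse import urlsplit, parse_qs, parse_qsl

query = urlsplit("https://example.com?foo=3&foo=2&foo=1").query

# parse_qsl returns the pairs in the order they appear in the URL.
pairs = parse_qsl(query)

# parse_qs groups the values of a repeated key into one list, in order.
grouped = parse_qs(query)

print(pairs)    # [('foo', '3'), ('foo', '2'), ('foo', '1')]
print(grouped)  # {'foo': ['3', '2', '1']}
```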

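+ is a reserved character that represents a space in the urlencoded form (vs. %20 for part of the path). urllib.urlencode escapes with urllib.quote_plus(), not urllib.quote(). The OP most likely just wanted to do this: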

initial_url = "http://www.stackoverflow.com"
search = "Generate value"
urllib.urlencode((("data", initial_url), ("search", search)))

which produces:

data=http%3A%2F%2Fwww.stackoverflow.com&search=Generate+value

as the output.
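The difference between the two quoting functions can be seen in a short sketch, assuming Python 3's urllib.parse (urllib.quote and urllib.quote_plus in Python 2):

```python
from urllib.parse import quote, quote_plus

# quote encodes a space as %20, the form used in the path part of a URL.
path_style = quote("Generate value")

# quote_plus encodes a space as +, the form used in query strings.
query_style = quote_plus("Generate value")

# A literal + is percent-encoded so it cannot be mistaken for a space.
literal_plus = quote_plus("Generate+value")

print(path_style)    # Generate%20value
print(query_style)   # Generate+value
print(literal_plus)  # Generate%2Bvalue
```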

Building a query string with urlencode in Python
