URL encoding with \x characters

Time: 2018-09-15 14:51:09

Tags: url character-encoding

I am doing some parsing work with Python and Selenium. This is the link as it appears in my HTML source:

  

<a class="NQWMenuItem" name="SectionElements" href="javascript:void(null);" onclick="NQWClearActiveMenu();Download('saw.dll?Go&_scid=RQqdowdFKUY&ViewID=d\x253adashboard\x257ep\x253a6umggrpo8urqvbmv\x257er\x253a67dmsf5fpr8csc50&Action=Download&SearchID=hmd09g8fe17dagu1l8l463e856&PortalPath=/shared/\x25d0\x25a1\x25d0\x25b5\x25d1\x2580\x25d0\x25b2\x25d0\x25b8\x25d1\x2581/_portal/\x25d0\x25a1\x25d0\x25b5\x25d1\x2580\x25d0\x25b2\x25d0\x25b8\x25d1\x2581\x2520-\x2520\x25d0\x2597\x25d0\x259e\x2520\x25d0\x25b8\x2520\x25d0\x2597\x25d0\x25bd\x25d0\x25a0&Page=\x25d0\x2597\x25d0\x259e\x2520\x25d0\x25b7\x25d0\x25b0\x25d0\x25b4\x25d0\x25b0\x25d1\x2587\x25d0\x25b0\x2520\x25d0\x2597\x25d0\x25bd\x25d0\x25a0&ViewState=4e0eaq3qdoiuvg7v7e2ke0u78i&ItemName=\x25d0\x25bf\x25d1\x2580\x25d0\x25b5\x25d0\x25b4\x25d1\x2581\x25d1\x2582\x25d0\x25b0\x25d0\x25b2\x25d0\x25bb\x25d0\x25b5\x25d0\x25bd\x25d0\x25b8\x25d0\x25b5\x253a\x2520\x25d0\x2597\x25d0\x259e\x2520\x25d0\x25b7\x25d0\x25b0\x25d0\x25b4\x25d0\x25b0\x25d1\x2587\x25d0\x25b0\x2520\x25d0\x2597\x25d0\x25bd\x25d0\x25a0&Format=excel2000&Extension=.xls'); return false" style="">Comments in Excel 2000

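I extracted the onclick string (it contains the URL of the document I need), but the Russian characters in it are encoded: \x25b0, \x25d0, \x25b5, and so on.

When I click this link in a browser, the URL becomes: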

  

http://ld3ap03.htsk.ru:7777/analytics/saw.dll?Go&_scid=RQqdowdFKUY&ViewID=d:dashboard~p:6umggrpo8urqvbmv~r:67dmsf5fpr8csc50&Action=Download&SearchID=hmd09g8fe17dagu1l8l463e856&PortalPath=/shared/Сервис/_portal/Сервис - ЗО и ЗнР&Page=ЗО задача ЗнР&ViewState=4e0eaq3qdoiuvg7v7e2ke0u78i&ItemName=представление: ЗО задача ЗнР&Format=excel2000&Extension=.xls

As you can see, it contains no \x-encoded characters.

What is this \x encoding, and how can I get the correct URL? I am using Python.

1 answer:

Answer 0 (score: 0)

These appear to be HTML entities. The solution is:

urllib.parse.unquote(html.unescape(my_url))

It is described here: Decode HTML entities in Python string?
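
One possible reading of the `\x25` sequences, offered here as an assumption rather than something the answer states: `\x25` is how a JavaScript string literal spells `%`, so `\x25d0\x25a1` stands for `%d0%a1`, which is ordinary percent-encoded UTF-8. If the string returned by Selenium still contains those literal backslash sequences, `unquote` has no `%` to work on, and the escapes may need to be replaced first. A minimal sketch of that idea follows; `decode_onclick_url` and `sample` are hypothetical names, and the `&amp;` in the sample is added only to exercise the entity step:

```python
import html
import re
from urllib.parse import unquote

# Matches a JavaScript-style hex escape such as \x25
# (a literal backslash, 'x', then exactly two hex digits).
_JS_HEX_ESCAPE = re.compile(r"\\x([0-9a-fA-F]{2})")


def decode_onclick_url(raw: str) -> str:
    """Hypothetical helper: make an escaped onclick URL fragment readable."""
    # 1. Resolve HTML entities such as &amp; (a no-op if there are none).
    text = html.unescape(raw)
    # 2. Replace each \xNN escape with the character it stands for (\x25 -> %).
    text = _JS_HEX_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)
    # 3. Percent-decode the result; unquote assumes UTF-8 by default.
    return unquote(text)


# Short excerpt modelled on the question's onclick string. It is a raw string,
# so the backslashes stay literal, as they would in an attribute value read
# from the page. The "&amp;" is an assumption added for illustration.
sample = (
    r"ViewID=d\x253adashboard\x257ep\x253a6umggrpo8urqvbmv"
    r"&amp;PortalPath=/shared/"
    r"\x25d0\x25a1\x25d0\x25b5\x25d1\x2580\x25d0\x25b2\x25d0\x25b8\x25d1\x2581"
)

print(decode_onclick_url(sample))
# ViewID=d:dashboard~p:6umggrpo8urqvbmv&PortalPath=/shared/Сервис
```

The order mirrors what a browser would do: the attribute value is entity-decoded first, then the JavaScript string literal is interpreted, and only then is the resulting URL percent-decoded for display. If the string has already been through a JavaScript engine and contains `%d0%a1` directly, step 2 simply finds nothing to replace.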