I am doing some scraping with Python and Selenium. This is the link as it appears in the HTML source:
<a class="NQWMenuItem" name="SectionElements" href="javascript:void(null);" onclick="NQWClearActiveMenu();Download('saw.dll?Go&_scid=RQqdowdFKUY&ViewID=d\x253adashboard\x257ep\x253a6umggrpo8urqvbmv\x257er\x253a67dmsf5fpr8csc50&Action=Download&SearchID=hmd09g8fe17dagu1l8l463e856&PortalPath=/shared/\x25d0\x25a1\x25d0\x25b5\x25d1\x2580\x25d0\x25b2\x25d0\x25b8\x25d1\x2581/_portal/\x25d0\x25a1\x25d0\x25b5\x25d1\x2580\x25d0\x25b2\x25d0\x25b8\x25d1\x2581\x2520-\x2520\x25d0\x2597\x25d0\x259e\x2520\x25d0\x25b8\x2520\x25d0\x2597\x25d0\x25bd\x25d0\x25a0&Page=\x25d0\x2597\x25d0\x259e\x2520\x25d0\x25b7\x25d0\x25b0\x25d0\x25b4\x25d0\x25b0\x25d1\x2587\x25d0\x25b0\x2520\x25d0\x2597\x25d0\x25bd\x25d0\x25a0&ViewState=4e0eaq3qdoiuvg7v7e2ke0u78i&ItemName=\x25d0\x25bf\x25d1\x2580\x25d0\x25b5\x25d0\x25b4\x25d1\x2581\x25d1\x2582\x25d0\x25b0\x25d0\x25b2\x25d0\x25bb\x25d0\x25b5\x25d0\x25bd\x25d0\x25b8\x25d0\x25b5\x253a\x2520\x25d0\x2597\x25d0\x259e\x2520\x25d0\x25b7\x25d0\x25b0\x25d0\x25b4\x25d0\x25b0\x25d1\x2587\x25d0\x25b0\x2520\x25d0\x2597\x25d0\x25bd\x25d0\x25a0&Format=excel2000&Extension=.xls'); return false" style="">Excel 2000</a>
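I can get the onclick string (it contains the URL of the document I need), but the Russian characters in it are encoded: \x25b0, \x25d0, \x25b5 and so on.
When I click this link in the browser, the URL becomes: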
http://ld3ap03.htsk.ru:7777/analytics/saw.dll?Go&_scid=RQqdowdFKUY&ViewID=d:dashboard~p:6umggrpo8urqvbmv~r:67dmsf5fpr8csc50&Action=Download&SearchID=hmd09g8fe17dagu1l8l463e856&PortalPath=/shared/Сервис/_portal/Сервис - ЗО и ЗнР&Page=ЗО задача ЗнР&ViewState=4e0eaq3qdoiuvg7v7e2ke0u78i&ItemName=представление: ЗО задача ЗнР&Format=excel2000&Extension=.xls
As you can see, there are no \x-encoded characters in it.
What is this \x encoding, and how can I get the correct URL? I am using Python.
Answer 0 (score: 0)
It looks like the string contains HTML entities. The solution is to unescape it and then URL-decode it:
urllib.parse.unquote(html.unescape(my_url))
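
For context, a slightly fuller sketch of the same idea follows. In JavaScript string literals, \xNN is a hex escape, and \x25 stands for the percent sign, so \x25d0\x25a1 is really %d0%a1, the percent-encoded UTF-8 bytes of a Cyrillic letter. If the string you get from Selenium still contains the literal \x25 sequences, unquote alone may leave them untouched, so this sketch resolves them first. The function name decode_onclick_url is hypothetical, and the sample is only a short fragment of the PortalPath parameter from the question; treat it as an illustration under those assumptions rather than a definitive fix.

```python
import html
import re
import urllib.parse


def decode_onclick_url(raw: str) -> str:
    """Decode a URL fragment copied from an onclick attribute (hypothetical helper)."""
    # Resolve HTML entities such as &amp; (does nothing if there are none).
    text = html.unescape(raw)
    # Replace JavaScript hex escapes (\xNN) with the character they denote,
    # so \x25 becomes "%" and \x25d0 becomes "%d0".
    text = re.sub(
        r"\\x([0-9a-fA-F]{2})",
        lambda m: chr(int(m.group(1), 16)),
        text,
    )
    # Percent-decode; urllib.parse.unquote assumes UTF-8 by default.
    return urllib.parse.unquote(text)


# Assumption: the attribute value contains literal backslash-x sequences.
sample = (
    r"PortalPath=/shared/"
    r"\x25d0\x25a1\x25d0\x25b5\x25d1\x2580\x25d0\x25b2\x25d0\x25b8\x25d1\x2581"
    r"/_portal"
)
print(decode_onclick_url(sample))
```

The decoded query string can then be appended to the site's base URL (http://ld3ap03.htsk.ru:7777/analytics/ in the question) to build the download link.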