如何使用python解码html代码

时间:2017-05-29 14:23:14

标签: python html decode

这是来自服务器的get请求提供的代码。我可以看到它是使用gzip编码的,现在我需要一个干净的HTML代码。

示例:

{"Html":"\u003cdiv class=\"panel-body\"\u003e\r\n        \u003cdiv class=\"printContainer\"\u003e\r\n            \u003ca href=\"#\" data-itemid=\"237994\" data-toggle=\"tooltip\" data-placement=\"top\" data-action=\"printProductInfo\" accesskey=\"p\" rel=\"nofollow\"\u003e\r\n                \u003cspan class=\"nonIcon printIcon\"\u003e\u003c/span\u003e\u0026nbsp;Skriv ut\r\n            \u003c/a\u003e\r\n        \u003c/div\u003e\r\n    \u003ctable class=\"table table-striped\"\u003e\r\n                        \u003ctr class=\"info\"\u003e\r\n                    \u003cth colspan=\"2\" class=\"plCategoryHeader\" style=\"margin-top: 0\"\u003e\r\n                        Allm\u0026#228;nt\r\n                    \u003c/th\u003e\r\n                \u003c/tr\u003e\r\n                \u003ctr\u003e\r\n                    \u003ctd class=\"plHeader\"\u003e\r\n                        Tillv. art. nr.\r\n                    \u003c/td\u003e\r\n                    \u003ctd class=\"plValue\"\u003e\r\n                        T3M78AA#ABB\r\n                    \u003c/td\u003e\r\n                \u003c/tr\u003e\r\n                \u003ctr\u003e\r\n                    \u003ctd class=\"plHeader\"\u003e\r\n

1 个答案:

答案 0 :(得分:-2)

response = '{"Html":"\u003cdiv class=\"panel-body\"\u003e\r\n        \u003cdiv class=\"printContainer\"\u003e\r\n            \u003ca href=\"#\" data-itemid=\"237994\" data-toggle=\"tooltip\" data-placement=\"top\" data-action=\"printProductInfo\" accesskey=\"p\" rel=\"nofollow\"\u003e\r\n                \u003cspan class=\"nonIcon printIcon\"\u003e\u003c/span\u003e\u0026nbsp;Skriv ut\r\n            \u003c/a\u003e\r\n        \u003c/div\u003e\r\n    \u003ctable class=\"table table-striped\"\u003e\r\n                        \u003ctr class=\"info\"\u003e\r\n                    \u003cth colspan=\"2\" class=\"plCategoryHeader\" style=\"margin-top: 0\"\u003e\r\n                        Allm\u0026#228;nt\r\n                    \u003c/th\u003e\r\n                \u003c/tr\u003e\r\n                \u003ctr\u003e\r\n                    \u003ctd class=\"plHeader\"\u003e\r\n                        Tillv. art. nr.\r\n                    \u003c/td\u003e\r\n                    \u003ctd class=\"plValue\"\u003e\r\n                        T3M78AA#ABB\r\n                    \u003c/td\u003e\r\n                \u003c/tr\u003e\r\n                \u003ctr\u003e\r\n                    \u003ctd class=\"plHeader\"\u003e\r\n"}'
print(response.encode('utf-8'))
# prints: b'{"Html":"<div class="panel-body">\r\n        <div class="printContainer">\r\n            <a href="#" data-itemid="237994" data-toggle="tooltip" data-placement="top" data-action="printProductInfo" accesskey="p" rel="nofollow">\r\n                <span class="nonIcon printIcon"></span>&nbsp;Skriv ut\r\n            </a>\r\n        </div>\r\n    <table class="table table-striped">\r\n                        <tr class="info">\r\n                    <th colspan="2" class="plCategoryHeader" style="margin-top: 0">\r\n                        Allm&#228;nt\r\n                    </th>\r\n                </tr>\r\n                <tr>\r\n                    <td class="plHeader">\r\n                        Tillv. art. nr.\r\n                    </td>\r\n                    <td class="plValue">\r\n                        T3M78AA#ABB\r\n                    </td>\r\n                </tr>\r\n                <tr>\r\n                    <td class="plHeader">\r\n"}'

甚至更简单,对于这个具体的回应:

response = '{"Html":"\u003cdiv class=\"panel-body\"\u003e\r\n        \u003cdiv class=\"printContainer\"\u003e\r\n            \u003ca href=\"#\" data-itemid=\"237994\" data-toggle=\"tooltip\" data-placement=\"top\" data-action=\"printProductInfo\" accesskey=\"p\" rel=\"nofollow\"\u003e\r\n                \u003cspan class=\"nonIcon printIcon\"\u003e\u003c/span\u003e\u0026nbsp;Skriv ut\r\n            \u003c/a\u003e\r\n        \u003c/div\u003e\r\n    \u003ctable class=\"table table-striped\"\u003e\r\n                        \u003ctr class=\"info\"\u003e\r\n                    \u003cth colspan=\"2\" class=\"plCategoryHeader\" style=\"margin-top: 0\"\u003e\r\n                        Allm\u0026#228;nt\r\n                    \u003c/th\u003e\r\n                \u003c/tr\u003e\r\n                \u003ctr\u003e\r\n                    \u003ctd class=\"plHeader\"\u003e\r\n                        Tillv. art. nr.\r\n                    \u003c/td\u003e\r\n                    \u003ctd class=\"plValue\"\u003e\r\n                        T3M78AA#ABB\r\n                    \u003c/td\u003e\r\n                \u003c/tr\u003e\r\n                \u003ctr\u003e\r\n                    \u003ctd class=\"plHeader\"\u003e\r\n"}'
print(response[9:-2])

第二种方式也会打印所有换行符,因此您有一个有效的,格式良好的HTML。