When I run the program shown below, I get an error related to Unicode encoding.
The code is given below:
```
import bs4
import requests
from xhtml2pdf import pisa
from xhtml2pdf.config.httpconfig import httpConfig

res = requests.get("https://www.insightsonindia.com/2018/06/04/insights-daily-current-affairs-04-june-2018/")
soup = bs4.BeautifulSoup(res.text, 'lxml')
pf = soup.find("div", class_="pf-content")
sourceHtml = str(pf)
outputFilename = "test.pdf"

def convertHtmlToPdf(sourceHtml, outputFilename):
    httpConfig.save_keys('nosslcheck', True)
    # open output file for writing (truncated binary)
    resultFile = open(outputFilename, "w+b")
    # convert HTML to PDF
    pisaStatus = pisa.CreatePDF(sourceHtml, dest=resultFile, encoding="utf-8")
    # close output file
    resultFile.close()
    # return the error count (0 on success)
    return pisaStatus.err

# Main program
if __name__ == "__main__":
    pisa.showLogging()
    convertHtmlToPdf(sourceHtml, outputFilename)
```
I am trying to save part of a website as a pdf using xhtml2pdf. I scrape the page with bs4, store the relevant markup, and then convert it to a pdf with xhtml2pdf. Most of the time it works like a charm, but for this page it gives me the error. The full code is available on github here.

xhtml2pdf seems to encode with ascii, and since my html contains non-ascii characters it raises the error. I don't know how to change the encoding in xhtml2pdf, and I can't simply strip the non-ASCII characters: if I drop them, the links to the images break and the images don't show up in the pdf.
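As a diagnostic, it can help to confirm which `img` src urls actually contain non-ASCII characters before converting. This is a sketch against a hypothetical html fragment (not the real page), using `str.isascii()` from Python 3.7+:

```python
import bs4

# Hypothetical html fragment standing in for the scraped page content.
html = '<div class="pf-content"><img src="/uploads/India\u2019s-map.jpg"><img src="/ok.jpg"></div>'
soup = bs4.BeautifulSoup(html, "html.parser")

# Collect src attributes that contain non-ASCII characters;
# these are the urls that will fail later in the pdf conversion.
bad = [img["src"] for img in soup.find_all("img") if not img["src"].isascii()]
print(bad)  # ['/uploads/India’s-map.jpg']
```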
Full traceback:
```
Traceback (most recent call last):
  File "test3.py", line 80, in <module>
    convertHtmlToPdf(sourceHtml, outputFilename)
  File "test3.py", line 68, in convertHtmlToPdf
    pisaStatus = pisa.CreatePDF(sourceHtml, dest=resultFile, encoding= 'utf-8')
  File "C:\Users\Ananthu\AppData\Local\Programs\Python\Python37-32\lib\site-packages\xhtml2pdf\document.py", line 97, in pisaDocument
    encoding, context=context, xml_output=xml_output)
  File "C:\Users\Ananthu\AppData\Local\Programs\Python\Python37-32\lib\site-packages\xhtml2pdf\document.py", line 59, in pisaStory
    pisaParser(src, context, default_css, xhtml, encoding, xml_output)
  File "C:\Users\Ananthu\AppData\Local\Programs\Python\Python37-32\lib\site-packages\xhtml2pdf\parser.py", line 759, in pisaParser
    pisaLoop(document, context)
  File "C:\Users\Ananthu\AppData\Local\Programs\Python\Python37-32\lib\site-packages\xhtml2pdf\parser.py", line 700, in pisaLoop
    pisaLoop(node, context, path, **kw)
  File "C:\Users\Ananthu\AppData\Local\Programs\Python\Python37-32\lib\site-packages\xhtml2pdf\parser.py", line 644, in pisaLoop
    pisaLoop(nnode, context, path, **kw)
  File "C:\Users\Ananthu\AppData\Local\Programs\Python\Python37-32\lib\site-packages\xhtml2pdf\parser.py", line 644, in pisaLoop
    pisaLoop(nnode, context, path, **kw)
  File "C:\Users\Ananthu\AppData\Local\Programs\Python\Python37-32\lib\site-packages\xhtml2pdf\parser.py", line 644, in pisaLoop
    pisaLoop(nnode, context, path, **kw)
  [Previous line repeated 2 more times]
  File "C:\Users\Ananthu\AppData\Local\Programs\Python\Python37-32\lib\site-packages\xhtml2pdf\parser.py", line 514, in pisaLoop
    attr = pisaGetAttributes(context, node.tagName, node.attributes)
  File "C:\Users\Ananthu\AppData\Local\Programs\Python\Python37-32\lib\site-packages\xhtml2pdf\parser.py", line 124, in pisaGetAttributes
    nv = c.getFile(nv)
  File "C:\Users\Ananthu\AppData\Local\Programs\Python\Python37-32\lib\site-packages\xhtml2pdf\context.py", line 818, in getFile
    return getFile(name, relative or self.pathDirectory)
  File "C:\Users\Ananthu\AppData\Local\Programs\Python\Python37-32\lib\site-packages\xhtml2pdf\util.py", line 738, in getFile
    file = pisaFileObject(*a, **kw)
  File "C:\Users\Ananthu\AppData\Local\Programs\Python\Python37-32\lib\site-packages\xhtml2pdf\util.py", line 644, in __init__
    conn.request("GET", path)
  File "C:\Users\Ananthu\AppData\Local\Programs\Python\Python37-32\lib\http\client.py", line 1229, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "C:\Users\Ananthu\AppData\Local\Programs\Python\Python37-32\lib\http\client.py", line 1240, in _send_request
    self.putrequest(method, url, **skips)
  File "C:\Users\Ananthu\AppData\Local\Programs\Python\Python37-32\lib\http\client.py", line 1107, in putrequest
    self._output(request.encode('ascii'))
UnicodeEncodeError: 'ascii' codec can't encode character '\u2019' in position 37: ordinal not in range(128)
```
Answer 0 (score: 0)
The problem is that the retrieved html contains `img` tags whose `src` attributes are urls containing the `'\u2019'` ("right single quotation mark") character. xhtml2pdf passes these urls to python's http.client module without escaping them first. http.client tries to encode the request as ascii before sending it, and that is where the error occurs.
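The failure can be reproduced in isolation: http.client builds the request line as bytes via `str.encode('ascii')`, so any non-ASCII character in the path raises the same exception. A minimal illustration (the path below is hypothetical, not from the real page):

```python
# A url path containing U+2019, the right single quotation mark.
path = "/wp-content/uploads/India\u2019s-map.jpg"

# This is essentially what http.client does internally when it
# prepares the request line: encode it to ascii.
try:
    path.encode("ascii")
except UnicodeEncodeError as e:
    print(e.reason)  # ordinal not in range(128)
```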
This can be fixed by escaping the urls in the retrieved html before generating the pdf; urllib.parse provides the tools to do this.
```
from urllib import parse
...
res = requests.get("https://www.insightsonindia.com/2018/06/04/insights-daily-current-affairs-04-june-2018/")
soup = bs4.BeautifulSoup(res.text, 'lxml')
pf = soup.find("div", class_="pf-content")

# Percent-encode the path of every img src url so it is pure ASCII.
imgs = pf.find_all('img')
for img in imgs:
    url = img['src']
    scheme, netloc, path, params, query, fragment = parse.urlparse(url)
    new_path = parse.quote(path)
    new_url = parse.urlunparse((scheme, netloc, new_path, params, query, fragment))
    img['src'] = new_url

sourceHtml = str(pf)
outputFilename = "test.pdf"
...
```
The answers to this question provide some background on unicode and urls.
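The effect of the escaping step above can be seen on a single url (a hypothetical example url, not one from the real page): `parse.quote` percent-encodes the UTF-8 bytes of the non-ASCII character, leaving a pure-ASCII path that http.client can encode.

```python
from urllib import parse

# Hypothetical url containing U+2019 in its path.
url = "https://example.com/uploads/India\u2019s-map.jpg"

# Split the url, percent-encode only the path, and reassemble it.
scheme, netloc, path, params, query, fragment = parse.urlparse(url)
fixed = parse.urlunparse((scheme, netloc, parse.quote(path), params, query, fragment))
print(fixed)  # https://example.com/uploads/India%E2%80%99s-map.jpg
```

U+2019 is the three UTF-8 bytes `E2 80 99`, which is why it becomes `%E2%80%99` in the result.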