Python tldextract: error reading TLD cache file

Posted: 2013-12-29 05:48:52

Tags: python python-3.x tld

I'm trying to extract the domain name with tldextract:
import tldextract

ext = tldextract.extract(editString2)  # editString2: the asker's URL string
print(ext.domain)

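But I also get this error at the same time. Is there any way to prevent it? I do get the printed output and the correct result; I'm just trying to find a way to stop this error from being displayed.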

error reading TLD cache file C:\Python33\lib\site-packages\tldextract\.tld_set: 'charmap' codec can't decode byte 0x81 in position 2350: character maps to <undefined>
Exception reading Public Suffix List url https://raw.github.com/mozilla/mozilla-central/master/netwerk/dns/effective_tld_names.dat. Consider using a mirror or constructing your TLDExtract with `fetch=False`.
Traceback (most recent call last):
  File "C:\Python33\lib\site-packages\tldextract\tldextract.py", line 247, in _PublicSuffixListSource
    page = unicode(urlopen(url).read(), 'utf-8')
  File "C:\Python33\lib\urllib\request.py", line 156, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Python33\lib\urllib\request.py", line 475, in open
    response = meth(req, response)
  File "C:\Python33\lib\urllib\request.py", line 587, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Python33\lib\urllib\request.py", line 513, in error
    return self._call_chain(*args)
  File "C:\Python33\lib\urllib\request.py", line 447, in _call_chain
    result = func(*args)
  File "C:\Python33\lib\urllib\request.py", line 595, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 404: Not Found
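The first line of the log is a different problem from the 404, and the answer below does not address it. One plausible explanation, offered as an assumption rather than a confirmed diagnosis, is that the cache file contains UTF-8 text (for example internationalized suffixes) while Python 3.3 on Windows opens text files in the locale's default encoding, typically cp1252, in which byte 0x81 is undefined. This sketch reproduces that class of error with a hypothetical one-line cache file written to a temporary path; the sample suffix and the helper name read_with are illustrative only.

```python
import os
import tempfile

# Hypothetical cache content: one internationalized suffix stored as UTF-8.
# The Cyrillic letter "с" (U+0441) encodes to b"\xd1\x81", so the file
# contains byte 0x81, which cp1252 leaves undefined.
text = "с.рф\n"


def read_with(path, encoding):
    """Read a text file with an explicit encoding; return the text or the decode error."""
    try:
        with open(path, encoding=encoding) as f:
            return f.read()
    except UnicodeDecodeError as exc:
        return exc


fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(text.encode("utf-8"))

bad = read_with(path, "cp1252")  # mimics a cp1252 locale default on Windows
good = read_with(path, "utf-8")  # an explicit UTF-8 read decodes cleanly
os.remove(path)

print(bad)
print(good.strip())
```

Run as-is, the first print shows a 'charmap' codec error about byte 0x81 and the second shows the decoded suffix. Passing encoding= explicitly to open() is the usual remedy in your own code; for tldextract's cache file, that choice is made inside the library.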

1 answer:

Answer 0 (score: 4)

"mozilla/mozilla-central" on GitHub was renamed to "mozilla/gecko-dev" with no redirect, hence the 404. The URL is fixed in the latest version of tldextract, 1.3.1.
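The 404 is exactly the failure mode a single hard-coded URL invites: once the repository moved, every fetch failed. A defensive pattern, sketched here as an illustration and not as tldextract's own code, is to try several candidate URLs in order and fall back to a bundled snapshot. The function name fetch_suffix_list, the offline_opener stand-in, and the publicsuffix.org address as the list's current home are assumptions added for this example; the opener is injectable so the demo runs without network access.

```python
import io
import urllib.error
import urllib.request

# URL from the traceback above; it returned 404 after the repository rename.
DEAD_URL = ("https://raw.github.com/mozilla/mozilla-central/master/"
            "netwerk/dns/effective_tld_names.dat")
# Assumed current home of the Public Suffix List (not stated in the answer).
PSL_URL = "https://publicsuffix.org/list/public_suffix_list.dat"


def fetch_suffix_list(urls, snapshot, opener=urllib.request.urlopen, timeout=10):
    """Return the first suffix list that downloads, else the snapshot text."""
    for url in urls:
        try:
            with opener(url, timeout=timeout) as resp:
                return resp.read().decode("utf-8")
        except OSError:
            # HTTPError (e.g. 404) and URLError both subclass OSError,
            # so a dead mirror just moves us on to the next candidate.
            continue
    return snapshot


def offline_opener(url, timeout=None):
    """Hypothetical stand-in for urlopen so this demo needs no network."""
    if url == DEAD_URL:
        raise urllib.error.HTTPError(url, 404, "Not Found", None, io.BytesIO())
    return io.BytesIO("com\nco.uk\n".encode("utf-8"))


print(fetch_suffix_list([DEAD_URL, PSL_URL], snapshot="com\n", opener=offline_opener))
```

Against the real network you would omit the opener argument so urllib.request.urlopen is used; the snapshot fallback keeps extraction working when every URL fails.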

If it weren't fixed, you could manually pass a PSL URL to the TLDExtract callable through the suffix_list_url kwarg. See the docs.
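The question's literal request was to keep these messages from being printed. Assuming the installed tldextract reports them through Python's standard logging module under a logger named "tldextract" (an assumption; check the source of your installed version), raising that logger's level would hide them. This is a stdlib-only sketch of the mechanism: the helper quiet_logger is hypothetical, and a StringIO buffer stands in for stderr so the effect is visible.

```python
import io
import logging


def quiet_logger(name, level=logging.CRITICAL):
    """Raise a named logger's threshold so lower-severity records are dropped."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    return logger


# Offline demo: capture the logger's output in a buffer instead of stderr.
buffer = io.StringIO()
demo = logging.getLogger("tldextract")  # the logger name is an assumption
demo.addHandler(logging.StreamHandler(buffer))
demo.propagate = False

demo.error("error reading TLD cache file (demo)")  # ERROR passes the default threshold
quiet_logger("tldextract")
demo.error("Exception reading Public Suffix List url (demo)")  # now dropped

print(buffer.getvalue())
```

This only hides the symptom: with the level at CRITICAL, later genuine errors from the same logger are hidden too, so upgrading as the answer suggests remains the real fix.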