I am using bs4 in my project. Whenever I create a soup
instance, it prints messy output with many encoding confidence scores:
req = urllib2.Request(url, headers=hdr)
page = urllib2.urlopen(req, timeout=5)
soup = BeautifulSoup(page.read(), "lxml")
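It works fine, but it produces the redundant output shown below. I just want to get rid of it, but I could not find
anything such as a verbose option.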
2018-11-15 10:40:46,286 utf-8 confidence = 0.99
2018-11-15 10:40:46,286 SHIFT_JIS Japanese confidence = 0.01
2018-11-15 10:40:46,287 EUC-JP Japanese confidence = 0.01
2018-11-15 10:40:46,287 GB2312 Chinese confidence = 0.01
2018-11-15 10:40:46,287 EUC-KR Korean confidence = 0.01
2018-11-15 10:40:46,287 CP949 Korean confidence = 0.01
2018-11-15 10:40:46,287 Big5 Chinese confidence = 0.01
2018-11-15 10:40:46,288 EUC-TW Taiwan confidence = 0.01
2018-11-15 10:40:46,288 windows-1251 Russian confidence = 0.01
2018-11-15 10:40:46,288 KOI8-R Russian confidence = 0.01
2018-11-15 10:40:46,288 ISO-8859-5 Russian confidence = 0.0
2018-11-15 10:40:46,288 MacCyrillic Russian confidence = 0.0
2018-11-15 10:40:46,288 IBM866 Russian confidence = 0.0
2018-11-15 10:40:46,289 IBM855 Russian confidence = 0.01
2018-11-15 10:40:46,289 ISO-8859-7 Greek confidence = 0.0
2018-11-15 10:40:46,289 windows-1253 Greek confidence = 0.0
2018-11-15 10:40:46,289 ISO-8859-5 Bulgairan confidence = 0.0
2018-11-15 10:40:46,289 windows-1251 Bulgarian confidence = 0.01
2018-11-15 10:40:46,290 TIS-620 Thai confidence = 0.0
2018-11-15 10:40:46,290 ISO-8859-9 Turkish confidence = 0.54363730033
2018-11-15 10:40:46,290 windows-1255 Hebrew confidence = 0.0
2018-11-15 10:40:46,290 windows-1255 Hebrew confidence = 0.0
2018-11-15 10:40:46,290 windows-1255 Hebrew confidence = 0.0
2018-11-15 10:40:46,291 utf-8 confidence = 0.99
2018-11-15 10:40:46,291 SHIFT_JIS Japanese confidence = 0.01
2018-11-15 10:40:46,291 EUC-JP Japanese confidence = 0.01
2018-11-15 10:40:46,291 GB2312 Chinese confidence = 0.01
2018-11-15 10:40:46,291 EUC-KR Korean confidence = 0.01
2018-11-15 10:40:46,291 CP949 Korean confidence = 0.01
2018-11-15 10:40:46,292 Big5 Chinese confidence = 0.01
2018-11-15 10:40:46,292 EUC-TW Taiwan confidence = 0.01
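Please help. Any suggestions would be greatly appreciated!
Answer 0 (score: 1)
You can raise the log level of the logger that emits these messages, like this: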
import logging
logger = logging.getLogger('chardet')
logger.setLevel(logging.CRITICAL)
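Why this works can be checked without chardet or bs4 installed. The following is a minimal sketch for Python 3, assuming a stand-in logger named 'noisy_lib' (a hypothetical name, not a real library) that emits a DEBUG record similar to the output in the question. With the level raised to CRITICAL, the record is dropped before it reaches any handler.

```python
import io
import logging

def emit_with_level(level):
    """Return what the stand-in 'noisy_lib' logger writes at the given level."""
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)
    # 'noisy_lib' is a hypothetical name that stands in for 'chardet'
    logger = logging.getLogger("noisy_lib")
    logger.addHandler(handler)
    logger.propagate = False  # keep the demo output out of the root logger
    logger.setLevel(level)
    try:
        logger.debug("utf-8 confidence = 0.99")
    finally:
        logger.removeHandler(handler)
    return stream.getvalue()

print(repr(emit_with_level(logging.DEBUG)))     # the record is written
print(repr(emit_with_level(logging.CRITICAL)))  # the record is filtered out
```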
In general, if you want to find out who produces some annoying logs, do the following:
Trigger the logs by running the code. In this case:
req = urllib2.Request(url, headers=hdr)
page = urllib2.urlopen(req, timeout=5)
soup = BeautifulSoup(page.read(), "lxml")
The logger must then be in this list:
import logging
# loggerDict maps logger names to logger objects; the keys are the names
print(list(logging.Logger.manager.loggerDict.keys()))
[..., 'chardet', ...]
Try switching off the loggers one by one. As soon as the logs stop appearing, you know which logger was emitting them:
import logging
# Iterate over a copy of the names, because running the code may register new loggers
for name in list(logging.Logger.manager.loggerDict.keys()):
    print(name)
    logger = logging.getLogger(name)
    logger.setLevel(logging.CRITICAL)
    # I have left the exact code here for demonstration purposes
    req = urllib2.Request(url, headers=hdr)
    page = urllib2.urlopen(req, timeout=5)
    soup = BeautifulSoup(page.read(), "lxml")
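As an alternative to the elimination loop above, the emitting logger can also be read directly from the log records, since every record carries the name of its logger in `record.name`. This is a different technique from the one in the answer, sketched here under assumptions: `NameCollector`, `loggers_used_by`, `noisy_call` and the logger name 'demo_lib.prober' are all hypothetical names introduced for illustration.

```python
import logging

class NameCollector(logging.Handler):
    """Handler that remembers which logger names produced records."""

    def __init__(self):
        logging.Handler.__init__(self)
        self.names = set()

    def emit(self, record):
        self.names.add(record.name)

def loggers_used_by(func):
    """Run func() and return the names of loggers whose records reached the root logger."""
    root = logging.getLogger()
    collector = NameCollector()
    old_level = root.level
    root.addHandler(collector)
    root.setLevel(logging.DEBUG)  # let DEBUG records through while observing
    try:
        func()
    finally:
        root.setLevel(old_level)
        root.removeHandler(collector)
    return collector.names

def noisy_call():
    # Stand-in for the BeautifulSoup(...) call; 'demo_lib.prober' is hypothetical
    logging.getLogger("demo_lib.prober").debug("utf-8 confidence = 0.99")

print(loggers_used_by(noisy_call))
```

The sketch only sees records that propagate to the root logger, so a library that disables propagation or sets its own higher level would not show up.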
Then set the log level before running the code that emits the logs:
import logging
logger = logging.getLogger('chardet')
logger.setLevel(logging.CRITICAL)
# No log output any more from here on
req = urllib2.Request(url, headers=hdr)
page = urllib2.urlopen(req, timeout=5)
soup = BeautifulSoup(page.read(), "lxml")
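If the logger should only be silenced around one call rather than for the rest of the program, the same `setLevel` idea can be wrapped in a context manager. This is a minimal sketch, not something the logging module provides: `quiet` is a hypothetical helper and 'demo_quiet' is a hypothetical logger name used only for the demonstration.

```python
import contextlib
import logging

@contextlib.contextmanager
def quiet(name, level=logging.CRITICAL):
    """Temporarily raise the level of the named logger, then restore it."""
    logger = logging.getLogger(name)
    previous = logger.level
    logger.setLevel(level)
    try:
        yield logger
    finally:
        logger.setLevel(previous)

# Demonstration with a hypothetical logger name
demo = logging.getLogger("demo_quiet")
demo.setLevel(logging.DEBUG)
with quiet("demo_quiet"):
    print(demo.isEnabledFor(logging.DEBUG))  # DEBUG is suppressed inside the block
print(demo.isEnabledFor(logging.DEBUG))      # and enabled again afterwards
```

Applied to the code above, the soup would be created inside `with quiet('chardet'):`, assuming 'chardet' is indeed the logger identified in the previous step.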