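I'm trying to scrape my printer's embedded web server to get the current print count and write it to a file. I'm new to this, so I tried printing the entire HTML to check whether my script is set up correctly so far, but the output is garbled. Here is my code: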
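from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

# Printer's embedded web server (usage report page)
myAddress = "http://10.0.0.199/#hId-UsageReportPage"

# Fetch the raw response body
uClient = uReq(myAddress)
pageHTML = uClient.read()
uClient.close()

# Parse and print the whole document to check the setup
pageSoup = soup(pageHTML, "lxml")
print(pageSoup.prettify())
input()  # wait for Enter before exiting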
This is my output:
<html>
<body>
<p>
‹ TQoÚ0~ĸú©}p¼ª/SI6hÕMíŠVªn&9ˆiˆ=û( iÿ}v%€ºõÉÎù»ï¾Üùsr2¸ï~¯ y ÃÇÏ·_úÀ¸O}!£ü¸ÝÝÂyüȪŒ„¸úÆ"`‘¹b¹\ÆË‹XÛ©}«@sò6[îê¤8§œõ¢$Ä‚2ïE ÉIB`âøk¡^RÖ×aE|´6È k¾RF¸"’»Ò:¤ôqtÍ?25ˬ2ä³6à™|‘M”ÄÄÊ9>iû|?žA
y/j•w6K™¨Y—žu·‹gŽ½%nÇÔ T¥ªž¡ðÅÓ£Âe½Ä™ódË”9Z—è
DÚc¯¢×9–v@Æ€ö]â:Q"š©EÉXçk?DUÿ¨
<ef:>
Ç®Æ!W¦õ³ÂÍò^²Ð\½¼+솢
.e5]ø¡s8¿ô:‡ ±¬*´o“6ÜÈ
ß(*÷*¼ÊÈJé\
×\ªºÎ‘HœÐA…?H´Ûk`›#kl3Ú±ªp£·›yV¢´G HN8‡xO:p~Üâ‰ÖôîËqrûíùŸ—…h|ã…óä‘šœú)ÀI
ËÉ™¯á?ãRg’”®b‹þ:dxwÑ`°³nÔrqRéí~Oc¥ùÌñ #¼_¦÷Õkyh*çmèŸ-‹¹¯ 2ËÐ9 oðFsŠ0N„
ܦ7ôtXÉq‰Ð"Ñþ@ÂØê¥ó}¤Bz».ŒÑ–\Ùpmº”ˆ–\±1¤h^׿0…ÔáÔ
</ef:>
</p>
</body>
</html>
I'm not sure why this is happening. I'm using Python 3.6.0 and Beautiful Soup 4. Also, my printer is an HP Photosmart D110a, in case that helps.
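One thing that may be worth checking: the unreadable output could be a compressed response body that was parsed without being decompressed first. This is only a guess; nothing above confirms that the printer sends gzip, and the response's `Content-Encoding` header would be the place to verify it. The sketch below uses only the standard library. `decode_body` and `fetch_html` are hypothetical helper names, and the UTF-8 decoding is an assumption.

```python
import gzip
from urllib.request import urlopen


def decode_body(raw: bytes) -> str:
    """Return text from a response body, un-gzipping it if it looks gzipped."""
    # gzip streams always start with the two magic bytes 0x1f 0x8b
    if raw[:2] == b"\x1f\x8b":
        raw = gzip.decompress(raw)
    # Assumption: the page is UTF-8; undecodable bytes are replaced, not fatal
    return raw.decode("utf-8", errors="replace")


def fetch_html(url: str) -> str:
    """Fetch a URL and return its body as text (hypothetical helper)."""
    with urlopen(url) as response:
        return decode_body(response.read())
```

If the guess is right, passing `fetch_html(myAddress)` to Beautiful Soup instead of the raw bytes should give readable markup. If the body does not start with the gzip magic bytes, the function leaves it untouched, so the cause would have to be something else.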
(Update) This is the HTML:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<script type="text/javascript">
frameWorkObj = {};
frameWorkObj.pageMgrDataPathPrefix = "/webApps/Layout/";
</script>
<script src="/framework/framework.js" type="text/javascript"></script>
<link href="/webApps/Layout/layout.css" rel="stylesheet" type="text/css" />
<script src="/webApps/Layout/header.js" type="text/javascript"></script>
</head>
<body>
<iframe id="pgm-history-iframe" src="/framework/HistoryFrame.html" style="display: none;"></iframe>
<iframe src="/framework/cookie/cookie.html" style="display: none;"></iframe>
<div id="pgm-language-div"></div>
<div id="pgm-banner"></div>
<div id="pgm-top-pane"></div>
<div id="pgm-title-div"></div>
<div class="pgm-container">
<div id="pgm-left-pane"></div>
<div class="outerContentPane">
<div id="contentPane" class="contentPane"></div>
</div>
<div class="clear"></div>
</div> <!-- .pgm-container -->
<div id="pgm-footer"></div>
<div id="pgm-page-ts-div"></div>
<script type="text/javascript">
// frame buster
if(top != self)
top.location.replace(self.location.href);
</script>
<noscript>
<div id="pgm-no-js-text">
<p>JavaScript is required to access this website.</p>
<p>Please enable JavaScript or use a browser that supports JavaScript.</p>
</div>
</noscript>
</body>
</html>
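Looking at this HTML, the `contentPane` div is empty and the `<noscript>` text says JavaScript is required, so the print count may be loaded by the scripts after the page opens rather than being present in the markup. Also, the `#hId-UsageReportPage` part of the address is a URL fragment, which is handled on the client and not sent to the server. A rough sketch for checking both points with the standard library follows; `ShellInspector` is a hypothetical name, and where the scripts actually get the count from is not known from anything shown here.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag


class ShellInspector(HTMLParser):
    """Collect <script src> URLs and any text found inside div#contentPane."""

    def __init__(self):
        super().__init__()
        self.script_sources = []
        self.content_text = []
        self._depth_in_content = 0  # > 0 while inside the contentPane div

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "script" and attrs.get("src"):
            self.script_sources.append(attrs["src"])
        if tag == "div":
            if self._depth_in_content:
                self._depth_in_content += 1  # nested div inside contentPane
            elif attrs.get("id") == "contentPane":
                self._depth_in_content = 1

    def handle_endtag(self, tag):
        if tag == "div" and self._depth_in_content:
            self._depth_in_content -= 1

    def handle_data(self, data):
        if self._depth_in_content and data.strip():
            self.content_text.append(data.strip())


# The fragment is split off on the client; only `url` is requested.
url, fragment = urldefrag("http://10.0.0.199/#hId-UsageReportPage")
```

Feeding the HTML above to `ShellInspector().feed(...)` should leave `content_text` empty and fill `script_sources` with the script paths from the page. Those scripts, or the browser's network tab, would be the place to look for the request that actually returns the usage data.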