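I want to open an internal web page of a reporting tool called Statseeker. The tool stores images generated by a report that runs when the URL is opened. Here is what I have so far: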
import urllib
import urllib2
urlauth = 'http://statseeker/cgi/nim-report-lastn-drilldown?mode=ping&tz=America/Chicago&last_n=24h&device=jc-2090-1'
realm = 'statseeker'
username = 'admin'
password = '*******'
auth_handler = urllib2.HTTPBasicAuthHandler()
auth_handler.add_password(realm, urlauth, username, password)
opener = urllib2.build_opener(auth_handler)
data = opener.open(urlauth).read()
print data
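This code accesses the web page successfully, and near the end of the page the graphs appear as .png images. The output is as follows: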
>>> import urllib
>>> import urllib2
>>>
>>> urlauth = 'http://statseeker/cgi/nim-report-lastn-drilldown?mode=ping&tz=America/Chicago&last_n=24h&device=jc-2090-1'
>>> realm = 'statseeker'
>>> username = 'admin'
>>> password = '******'
>>> auth_handler = urllib2.HTTPBasicAuthHandler()
>>> auth_handler.add_password(realm, urlauth, username, password)
>>> opener = urllib2.build_opener(auth_handler)
>>> data = opener.open(urlauth).read()
>>> print data
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<title>Ping Statistics - Last 24 Hours</title>
<link rel="stylesheet" type="text/css" href="/css/base.css">
<!--[if lt IE 7]>
<meta http-equiv="refresh" content="0; url=/unsupported_browser.html">
<![endif]-->
<script type="text/javascript" src="/js/utils.js"></script>
<script type="text/javascript" src="/js/nim.js"></script>
</head>
<body>
<div align="center">
<p class="p0">
<div align="center">
<table class="r_twb">
<tbody>
<tr>
<td colspan="5">
<table class="r_hdr" width="99%"><tbody>
<tr>
<td rowspan="2" class="leftlogo">
<img class="ss_logo2" src="/img/ss_logo_inverse_small.png">
</td>
<td>
<div class="r_hdr_title">Ping Statistics - Last 24 Hours</div>
</td>
<td nowrap class="rightlogo">
<span class="top_right" id="tog_refresh" title="start/pause
refresh" onclick="toggle_refresh(this)">
Refresh: <span id="refresh_id" >60</span>
<script type="text/javascript">toggle_refresh()</script>
</span>
<span class="top_right" onclick="schedule();">
<img src="/img/mail.png"
id="schedimg"
title="Email this Report"
onmouseover="this.src='/img/mail_light.png'"
onmouseout="this.src='/img/mail.png'">
</span>
<span class="top_right" onclick="create_pdf();">
<img src="/img/file_pdf.png"
id="pdfimg"
title="Create PDF"
onmouseover="this.src='/img/file_pdf_bright.png'"
onmouseout="this.src='/img/file_pdf.png'">
</span>
</td>
</tr>
<tr>
<td colspan="2">
<table class="r_hdr"><tbody>
<tr>
<td style="text-align:center;" id="td_tzselect">
<select id="tz" name="tz" onchange="change_tz(this.options[this.selectedIndex].value); return false;">
<option value="America/Chicago" selected>America/Chicago
</select>
</td>
<td nowrap style="text-align:right;" id="timedisplay">
Sun 13 Jul 2014, 19:09
</td>
</tr>
</tbody></table>
</td>
</tr>
</tbody></table>
</td>
</tr> <tr class="r_summary"><td class="r_d9" colspan="5"><select id="lastn" name="lastn" onchange="change_lastn(this.options[this.selectedIndex].value); return false;"> <option value="12h">Last 12 Hours <option value="24h" selected>Last 24 Hours <option value="48h">Last 48 Hours <option value="7d">Last 7 Days <option value="30d">Last 30 Days <option value="60d">Last 60 Days <option value="90d">Last 90 Days <option value="lastmonth">Last Month <option value="thismonth">This Month <option value="lastyear">Last Year <option value="thisyear">This Year</select></td></tr>
<tr class="r_col_hdr">
<td valign="top"></td>
<td valign="top">Minimum</td>
<td valign="top">Maximum</td>
<td valign="top">Average</td>
<td valign="top">Total</td>
</tr>
<tr class="alt_row0">
<td class="r_d0l">Delay (ms)</td>
<td class="r_d0r">53</td>
<td class="r_d0r">106</td>
<td class="r_d0r">54</td>
<td class="r_d0r">-</td>
</tr>
<tr class="alt_row1">
<td class="r_d0l">Duplicate Responses</td>
<td class="r_d0r">0</td>
<td class="r_d0r">0</td>
<td class="r_d0r">0</td>
<td class="r_d0r">0</td>
</tr>
<tr class="alt_row0">
<td class="r_d0l">Lost 1</td>
<td class="r_d0r">0</td>
<td class="r_d0r">1</td>
<td class="r_d0r">0</td>
<td class="r_d0r">2</td>
</tr>
<tr class="alt_row1">
<td class="r_d0l">Lost 2</td>
<td class="r_d0r">0</td>
<td class="r_d0r">0</td>
<td class="r_d0r">0</td>
<td class="r_d0r">0</td>
</tr>
<tr class="alt_row0">
<td class="r_d0l">Lost 3</td>
<td class="r_d0r">0</td>
<td class="r_d0r">0</td>
<td class="r_d0r">0</td>
<td class="r_d0r">0</td>
</tr>
<tr class="alt_row1">
<td class="r_d0l">Lost 4</td>
<td class="r_d0r">0</td>
<td class="r_d0r">0</td>
<td class="r_d0r">0</td>
<td class="r_d0r">0</td>
</tr>
</tbody>
</table>
</div>
</p>
<p class="p10">
<img class="graph" src="/graphs/ping.jc-2090-1.rtt.24h.1405296595.png">
</p>
<p class="p10">
<img class="graph" src="/graphs/ping.jc-2090-1.lost.24h.1405296595.png">
</p>
</div>
</body>
</html>
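I'm not sure how to save these images. The end of each file name changes every second.
I'd like to be able to save any image on the page under a name I choose, matching on the file name up to the wildcard part.
The idea would be to save every image matching:
/graphs/ping.jc-2090-1.rtt.24h.1%%%%%%%%.png
to a new file location, under a name I choose. Would I use the image class=graph part? Any ideas?
The files I care about are:
/graphs/ping.jc-2090-1.rtt.24h.%%%%%%%%%.png
/graphs/ping.jc-2090-1.lost.24h.%%%%%%%%%.png
Once that is figured out I don't have much left to do, other than building a nice HTML page to display these images.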
Answer 0 (score: 0)
Use lxml or BeautifulSoup to parse the page and get easy access to the HTML tags on it.
I can't test it, but it could look something like this, where data is your HTML string:
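import lxml.html

# data is the HTML string returned by opener.open(urlauth).read()
html = lxml.html.fromstring(data)
# select every <img class="graph">; cssselect() needs the separate cssselect package
imgs = html.cssselect('img.graph')
for x in imgs:
    print 'http://statseeker%s' % x.attrib['src']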
This should print the URLs of the image files.
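If installing lxml is not an option, the same extraction could probably be done with the standard library alone. The following is only a sketch and has not been run against a Statseeker server: the class name `GraphImageParser` and the `sample` string are made up for illustration, and the import fallback is there so the snippet runs on both Python 2 and Python 3.

```python
try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser   # Python 2


class GraphImageParser(HTMLParser):
    """Collect the src attribute of every <img class="graph"> tag."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # the class attribute may hold several space-separated names
        classes = (attrs.get('class') or '').split()
        if tag == 'img' and 'graph' in classes and attrs.get('src'):
            self.srcs.append(attrs['src'])


# hypothetical sample; in practice feed the parser the `data` string instead
sample = '''
<img class="ss_logo2" src="/img/ss_logo_inverse_small.png">
<p class="p10"><img class="graph" src="/graphs/ping.jc-2090-1.rtt.24h.1405296595.png"></p>
<p class="p10"><img class="graph" src="/graphs/ping.jc-2090-1.lost.24h.1405296595.png"></p>
'''

parser = GraphImageParser()
parser.feed(sample)
graph_srcs = parser.srcs  # only the two /graphs/ paths, not the logo
```

Note that on Python 3 `read()` returns bytes, so the page would need to be decoded to text before being passed to `feed()`.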
Now you can use your existing code to fetch these files one by one, and save the data from each response to a local file.
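That download-and-save step might look like the sketch below. It is untested against a real server. `save_image` and its `fetch` parameter are hypothetical names: `fetch` stands for any callable that takes a URL and returns the raw bytes, so the authenticated `opener` from the question can be plugged in without the helper knowing about it.

```python
def save_image(fetch, url, dest_path):
    """Download url via fetch(url) and write the bytes to dest_path."""
    content = fetch(url)
    # 'wb' matters: PNG data is binary and must not be newline-translated
    with open(dest_path, 'wb') as f:
        f.write(content)
    return dest_path
```

With the question's opener, a call inside the loop over the image URLs could be `save_image(lambda u: opener.open(u).read(), url, 'rtt.png')`. One thing to watch for: the password was registered only for the report URL, and urllib2's password manager matches by URL prefix, so if `/graphs/` also requires authentication the credentials may need to be added for `http://statseeker/` instead.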
EDIT:
# get all urls
imgs_urls = []
for x in imgs:
    imgs_urls.append('http://statseeker%s' % x.attrib['src'])

# print all urls
for url in imgs_urls:
    print url
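Since the question says the trailing number in each file name keeps changing, one way to get a predictable local name is to strip that part before saving. The sketch below assumes the changing part is always a run of digits directly before `.png`, as in the output shown in the question. `stable_name` is a made-up helper name.

```python
import posixpath
import re

# assumed pattern: ".<digits>.png" at the very end of the file name
_TIMESTAMP = re.compile(r'\.\d+\.png$')


def stable_name(src):
    """Drop the directory and the changing numeric part from an image path."""
    filename = posixpath.basename(src)
    return _TIMESTAMP.sub('.png', filename)


names = [stable_name(s) for s in (
    '/graphs/ping.jc-2090-1.rtt.24h.1405296595.png',
    '/graphs/ping.jc-2090-1.lost.24h.1405296595.png',
)]
# names == ['ping.jc-2090-1.rtt.24h.png', 'ping.jc-2090-1.lost.24h.png']
```

The resulting names stay the same from run to run, so an HTML page that displays the saved graphs can reference them directly.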