Saving dynamically named images from a page authenticated with HTTPBasicAuthHandler

Asked: 2014-07-14 02:34:52

Tags: python html image wildcard urllib

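I want to open a page on an internal reporting tool called statseeker. The tool stores images generated by a report that runs when the URL is opened. This is what I have so far: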

import urllib
import urllib2


urlauth = 'http://statseeker/cgi/nim-report-lastn-drilldown?mode=ping&tz=America/Chicago&last_n=24h&device=jc-2090-1'
realm = 'statseeker'
username = 'admin'
password = '*******'
auth_handler = urllib2.HTTPBasicAuthHandler()
auth_handler.add_password(realm, urlauth, username, password)
opener = urllib2.build_opener(auth_handler)
data = opener.open(urlauth).read()
print data

This code successfully fetches the page, and the graphs appear as .PNG images near the end of the page. The output is:

>>> import urllib
>>> import urllib2
>>>
>>> urlauth = 'http://statseeker/cgi/nim-report-lastn-drilldown?mode=ping&tz=America/Chicago&last_n=24h&device=jc-2090-1'
>>> realm = 'statseeker'
>>> username = 'admin'
>>> password = '******'
>>> auth_handler = urllib2.HTTPBasicAuthHandler()
>>> auth_handler.add_password(realm, urlauth, username, password)
>>> opener = urllib2.build_opener(auth_handler)
>>> data = opener.open(urlauth).read()
>>> print data
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<title>Ping Statistics - Last 24 Hours</title>
<link rel="stylesheet" type="text/css" href="/css/base.css">
<!--[if lt IE 7]>
<meta http-equiv="refresh" content="0; url=/unsupported_browser.html">
<![endif]-->
<script type="text/javascript" src="/js/utils.js"></script>
<script type="text/javascript" src="/js/nim.js"></script>
</head>
<body>
<div align="center">
<p class="p0">
<div align="center">
<table class="r_twb">
<tbody>
   <tr>
      <td colspan="5">
         <table class="r_hdr" width="99%"><tbody>
            <tr>
               <td rowspan="2" class="leftlogo">
                  <img class="ss_logo2" src="/img/ss_logo_inverse_small.png">
               </td>
               <td>
                  <div class="r_hdr_title">Ping Statistics - Last 24 Hours</div>

               </td>
               <td nowrap class="rightlogo">
                  <span   class="top_right" id="tog_refresh" title="start/pause refresh" onclick="toggle_refresh(this)">
                     Refresh: <span id="refresh_id" >60</span>
                     <script type="text/javascript">toggle_refresh()</script>
                  </span>

                  <span class="top_right" onclick="schedule();">
                     <img src="/img/mail.png"
                          id="schedimg"
                          title="Email this Report"
                          onmouseover="this.src='/img/mail_light.png'"
                          onmouseout="this.src='/img/mail.png'">
                  </span>
                  <span class="top_right" onclick="create_pdf();">
                     <img src="/img/file_pdf.png"
                          id="pdfimg"
                          title="Create PDF"
                          onmouseover="this.src='/img/file_pdf_bright.png'"
                          onmouseout="this.src='/img/file_pdf.png'">
                  </span>
               </td>
            </tr>
            <tr>
               <td colspan="2">
                  <table class="r_hdr"><tbody>
                     <tr>
                        <td style="text-align:center;" id="td_tzselect">&nbsp;
<select id="tz" name="tz" onchange="change_tz(this.options[this.selectedIndex].value); return false;">
                <option value="America/Chicago" selected>America/Chicago
</select>


                        </td>
                        <td nowrap style="text-align:right;" id="timedisplay">
                           Sun 13 Jul 2014, 19:09
                        </td>
                     </tr>
                   </tbody></table>
               </td>
            </tr>
         </tbody></table>
      </td>
   </tr> <tr class="r_summary"><td class="r_d9" colspan="5"><select id="lastn" name="lastn" onchange="change_lastn(this.options[this.selectedIndex].value); return false;"> <option value="12h">Last 12 Hours <option value="24h" selected>Last 24 Hours <option value="48h">Last 48 Hours <option value="7d">Last 7 Days <option value="30d">Last 30 Days <option value="60d">Last 60 Days <option value="90d">Last 90 Days <option value="lastmonth">Last Month <option value="thismonth">This Month <option value="lastyear">Last Year <option value="thisyear">This Year</select></td></tr>

 <tr class="r_col_hdr">
  <td valign="top"></td>
  <td valign="top">Minimum</td>
  <td valign="top">Maximum</td>
  <td valign="top">Average</td>
  <td valign="top">Total</td>
 </tr>

 <tr class="alt_row0">
  <td class="r_d0l">Delay (ms)</td>
  <td class="r_d0r">53</td>
  <td class="r_d0r">106</td>
  <td class="r_d0r">54</td>
  <td class="r_d0r">-</td>
 </tr>

 <tr class="alt_row1">
  <td class="r_d0l">Duplicate Responses</td>
  <td class="r_d0r">0</td>
  <td class="r_d0r">0</td>
  <td class="r_d0r">0</td>
  <td class="r_d0r">0</td>
 </tr>

 <tr class="alt_row0">
  <td class="r_d0l">Lost 1</td>
  <td class="r_d0r">0</td>
  <td class="r_d0r">1</td>
  <td class="r_d0r">0</td>
  <td class="r_d0r">2</td>
 </tr>

 <tr class="alt_row1">
  <td class="r_d0l">Lost 2</td>
  <td class="r_d0r">0</td>
  <td class="r_d0r">0</td>
  <td class="r_d0r">0</td>
  <td class="r_d0r">0</td>
 </tr>

 <tr class="alt_row0">
  <td class="r_d0l">Lost 3</td>
  <td class="r_d0r">0</td>
  <td class="r_d0r">0</td>
  <td class="r_d0r">0</td>
  <td class="r_d0r">0</td>
 </tr>

 <tr class="alt_row1">
  <td class="r_d0l">Lost 4</td>
  <td class="r_d0r">0</td>
  <td class="r_d0r">0</td>
  <td class="r_d0r">0</td>
  <td class="r_d0r">0</td>
 </tr>

</tbody>
</table>
</div>
</p>
<p class="p10">
<img class="graph" src="/graphs/ping.jc-2090-1.rtt.24h.1405296595.png">
</p>
<p class="p10">
<img class="graph" src="/graphs/ping.jc-2090-1.lost.24h.1405296595.png">
</p>
</div>
</body>
</html>

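I'm not sure how to save these images. The last part of each file name changes every second.

I'd like to be able to save any file on the page by giving its name up to the wildcard part.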

The idea is:

  1. Save every image on that page whose name matches /graphs/ping.jc-2090-1.rtt.24h.1%%%%%%%%.png.
  2. Save it to a new location, under a name I choose.
  3. Maybe save everything in the img class=graph section?
  4. Any ideas?

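    Notes:

    1. The image name is always the same up to the last part of the URL. It is generated by the reporting software and cannot be changed, so every file looks like this:
    2. /graphs/ping.jc-2090-1.rtt.24h.%%%%%%%%%.png /graphs/ping.jc-2090-1.lost.24h.%%%%%%%%%.png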

      Once that is sorted out there isn't much left to do except build a nice HTML site to display these images.

1 answer:

Answer 0 (score: 0)

Use lxml or BeautifulSoup to parse the page; either gives you easy access to the HTML tags on it.

I can't test this, but it could look something like this.

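import lxml.html

# data is your HTML string, e.g. the result of opener.open(urlauth).read()
html = lxml.html.fromstring(data)

# select every <img class="graph"> element
# (recent lxml versions need the separate cssselect package for this call)
imgs = html.cssselect('img.graph')

# print the absolute URL of each graph image
for x in imgs:
    print 'http://statseeker%s' % x.attrib['src']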

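This should print the URLs of the image files.

Now you can use your own code to fetch these files one by one, then save each new data to a local file.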
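If lxml (or its cssselect dependency) is not installed, roughly the same selection can be done with the standard library's HTMLParser. This is only a sketch: the names GraphImageCollector and graph_image_srcs are made up for illustration, and it assumes the page is already a text string (on Python 3 the bytes returned by read() would need decoding first).

```python
try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser   # Python 2


class GraphImageCollector(HTMLParser):
    """Hypothetical helper: collect the src of every <img> with class 'graph'."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        if tag != 'img':
            return
        attrs = dict(attrs)
        # class may hold several space-separated names, or be missing
        classes = (attrs.get('class') or '').split()
        if 'graph' in classes and attrs.get('src'):
            self.srcs.append(attrs['src'])


def graph_image_srcs(html_text):
    """Return the src attributes of all graph images, in page order."""
    collector = GraphImageCollector()
    collector.feed(html_text)
    collector.close()
    return collector.srcs
```

The returned values are site-relative paths such as /graphs/..., so they still need the 'http://statseeker' prefix before they can be fetched.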
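A minimal sketch of that fetch-and-save step, untested against a real statseeker server. save_images and its names parameter are hypothetical helpers, not part of urllib2; opener is assumed to be the authenticated opener built in the question, and urls a list of absolute image URLs.

```python
import os


def save_images(opener, urls, dest_dir, names=None):
    """Fetch each URL with opener and write the bytes into dest_dir.

    names (hypothetical) optionally maps a URL to the local file name to use;
    URLs without an entry keep the last path component of the URL.
    Returns the list of local paths written.
    """
    names = names or {}
    if not os.path.isdir(dest_dir):
        os.makedirs(dest_dir)
    saved = []
    for url in urls:
        filename = names.get(url) or url.rsplit('/', 1)[-1]
        path = os.path.join(dest_dir, filename)
        image_bytes = opener.open(url).read()
        # binary mode, so the PNG data is written unchanged
        with open(path, 'wb') as f:
            f.write(image_bytes)
        saved.append(path)
    return saved
```

For example, save_images(opener, imgs_urls, 'graphs') would write each image into a local graphs directory. One thing to check: as far as I can tell, urllib2 only sends stored credentials to URLs under the URI given to add_password, so if the /graphs/ images also require the login, it may be necessary to register the password for 'http://statseeker/' rather than for the report URL.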


EDIT

# get all urls
imgs_urls = []
for x in imgs:
    imgs_urls.append('http://statseeker%s' % x.attrib['src'])

# print all urls
for url in imgs_urls:
    print url
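Since the question asks for a local name that does not change with every run, one option is to strip the numeric part before .png when choosing the file name. The sketch below assumes the pattern seen in the two example paths (a run of digits right before the extension); stable_name is a hypothetical helper, and the pattern would need adjusting if statseeker names its files differently.

```python
import re

# Assumed pattern, inferred from the example paths in the question:
# a dot, a run of digits (the part that keeps changing), then ".png".
_TIMESTAMP_SUFFIX = re.compile(r'\.\d+\.png$', re.IGNORECASE)


def stable_name(src):
    """Return the file name from src with the numeric part before .png removed."""
    filename = src.rsplit('/', 1)[-1]
    return _TIMESTAMP_SUFFIX.sub('.png', filename)
```

With this, /graphs/ping.jc-2090-1.rtt.24h.1405296595.png maps to ping.jc-2090-1.rtt.24h.png, so each new download overwrites the previous copy under the same name.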