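I'm trying to use a web scraper to look up the status of a class on my school's online registration page. I've gotten to the page that lists all the information about the class, and I'm looking for how many seats are open. The HTML I get from Mechanize doesn't include the table, and I think this is because the table is put there by a non-HTML call.
https://gist.github.com/anonymous/13c459a04fcf80ecff63
This is what I get when I view the page source. Near the bottom you can see that there is a waitlist and that it is full (line 47).
This is my Mechanize output:
https://gist.github.com/anonymous/b6acde9061e0d11b43cc
It is missing the table, which should start right after the last line. If anyone needs the full HTML, I'll post another gist.
I've heard that SpiderMonkey can help with this kind of problem, or maybe Beautiful Soup?
Edit: here is the full HTML from the page source: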
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/transitional.dtd">
<HTML lang="en">
<HEAD>
<META http-equiv="Content-Type" content="text/html; charset=UTF-8">
<META HTTP-EQUIV="Pragma" NAME="Cache-Control" CONTENT="no-cache">
<META HTTP-EQUIV="Cache-Control" NAME="Cache-Control" CONTENT="no-cache">
<LINK REL="stylesheet" HREF="https://my.sandiego.edu/sandiego/includes/course_search.css" TYPE="text/css">
<LINK REL="stylesheet" HREF="/css/web_defaultprint.css" TYPE="text/css" media="print">
<TITLE>USD Course Search</TITLE>
<META HTTP-EQUIV="Content-Script-Type" NAME="Default_Script_Language" CONTENT="text/javascript">
<SCRIPT LANGUAGE="JavaScript" TYPE="text/javascript">
<!-- Hide JavaScript from older browsers
var submitcount=0;
function checkSubmit() {
if (submitcount == 0)
{
submitcount++;
return true;
}
else
{
alert("Your changes have already been submitted.");
return false;
}
}
// End script hiding -->
</SCRIPT>
<SCRIPT LANGUAGE="JavaScript" TYPE="text/javascript">
<!-- Hide JavaScript from older browsers
// Function to open a window
function windowOpen(window_url) {
helpWin = window.open(window_url,'','toolbar=yes,status=no,scrollbars=yes,menubar=yes,resizable=yes,directories=no,location=no,width=350,height=400');
if (document.images) {
if (helpWin) helpWin.focus()
}
}
// End script hiding -->
</SCRIPT>
</HEAD>
<BODY>
<div id="headerImage">
<IMG SRC="/wtlgifs/H_logo.gif" CLASS="headerImg" TITLE="" NAME="USD Header" HSPACE=0 VSPACE=0 BORDER=0 HEIGHT=69 WIDTH=321>
</div>
<DIV class="headerwrapperdiv">
<DIV class="pageheaderdiv1">
<A HREF="#main_content" onMouseover="window.status='Go to Main Content'; return true" onMouseout="window.status=''; return true" OnFocus="window.status='Go to Main Content'; return true" onBlur="window.status=''; return true" class="skiplinks">Go to Main Content</A>
<H1>University of San Diego Faculty/Student Self-Service System</H1></DIV><DIV class="headerlinksdiv">
</DIV>
<TABLE CLASS="plaintable" SUMMARY="This table displays Menu Items and Banner Search textbox."
WIDTH="100%">
<TR>
<TD CLASS="pldefault">
<DIV class="headerlinksdiv2">
<FORM ACTION="/prod/twbksrch.P_ShowResults" METHOD="POST">
Search
<SPAN class="fieldlabeltextinvisible"><LABEL for=keyword_in_id><SPAN class="fieldlabeltext">Search</SPAN></LABEL></SPAN>
<INPUT TYPE="text" NAME="KEYWRD_IN" SIZE="20" MAXLENGTH="65" ID="keyword_in_id">
<INPUT TYPE="submit" VALUE="Go">
</FORM>
</div>
</TD>
<TD CLASS="pldefault"><p class="rightaligntext"></p>
<SPAN class="pageheaderlinks">
<A HREF="/prod/twbksite.P_DispSiteMap?menu_name_in=bmenu.P_MainMnu&depth_in=2&columns_in=3" accesskey="2" class="submenulinktext2">SITE MAP</A>
|
<A HREF="/wtlhelp/twbhhelp.htm" accesskey="H" onClick="popup = window.open('/wtlhelp/twbhhelp.htm', 'PopupPage','height=500,width=450,scrollbars=yes,resizable=yes'); return false" target="_blank" onMouseOver="window.status=''; return true" onMouseOut="window.status=''; return true"onFocus="window.status=''; return true" onBlur="window.status=''; return true" class="submenulinktext2">HELP</A>
|
<A HREF="twbkwbis.P_Logout" accesskey="3" class="submenulinktext2">EXIT</A>
</span>
</TD>
</TR>
</TABLE>
</DIV>
<DIV class="pagetitlediv">
<TABLE CLASS="plaintable" SUMMARY="This table displays title and static header displays."
WIDTH="100%">
<TR>
<TD CLASS="pldefault">
<H2>USD Course Search</H2>
</TD>
<TD CLASS="pldefault">
</TD>
<TD CLASS="pldefault"><p class="rightaligntext"></p>
<DIV class="staticheaders">
</div>
</TD>
</TR>
<TR>
<TD class="bg3" width="100%" colSpan=3><IMG SRC="/wtlgifs/web_transparent.gif" ALT="Transparent Image" CLASS="headerImg" TITLE="Transparent Image" NAME="web_transparent" HSPACE=0 VSPACE=0 BORDER=0 HEIGHT=3 WIDTH=10></TD>
</TR>
</TABLE>
<a name="main_content"></a>
</DIV>
<DIV class="pagebodydiv">
<!-- ** END OF twbkwbis.P_OpenDoc ** -->
<DIV class="infotextdiv"><TABLE CLASS="infotexttable" SUMMARY="This layout table contains information that may be helpful in understanding the content and functionality of this page. It could be a brief set of instructions, a description of error messages, or other special information."><TR><TD CLASS="indefault"><IMG SRC="/wtlgifs/twgginfo.gif" ALT="Information" CLASS="headerImg" TITLE="Information" NAME="Info" HSPACE=0 VSPACE=0 BORDER=0 HEIGHT=24 WIDTH=27></TD><TD CLASS="indefault"><SPAN class="infotext"> <font color=red>Please do not use the back button on your web browser to return to the course search page. Use the link provided in order to return to the course search page.</font> Please refine your search by searching by department or other field. If the search results are too large you will see an error when you try to add a class.</SPAN></TD></TR></TABLE><P></DIV>
<style>
#course-query h3{
margin: 10px 0 5px 0;
}
#course-query fieldset{
border: 1px dotted #eee;
margin: 10px 0 0 0;
}
#course-query legend{
font-size: 1.2em;
font-weight: bold;
}
#course-query legend span{
font-weight: normal;
}
#course-query .week{
font-size: 1.0em;
border-collapse: collapse;
}
#course-query .week td{
border: 1px solid #eee;
}
#course-query .week th{
font-size: 0.9em;
border: 1px solid #eee;
}
#course-query label.sub-label {
color: #666;
display: block;
font-size: 0.8em;
font-variant: small-caps;
}
#course-query .registration-status{
color: #666;
border-bottom:1px dotted #ccc;
font-size:0.9em;
}
#course-query .course-blocks .registration-status{
font-size:0.8em;
}
.course-query{
list-style-type: none;
margin: 0;
padding: 0;
}
.course-query label{
display: block;
float: left;
width: 9.5em;
}
.course-query label em{
display: block;
font-size: 0.9em;
font-style: italic;
font-weight: normal;
}
.course-query label.top{
float:none;
width:auto;
}
.course-query li{
clear: both;
margin-bottom: 5px;
}
.course-query li.buttons{
background: #f4f4f4;
padding: 10px;
}
.course-query li em{
font-size: 0.9em;
font-style: italic;
font-weight: normal;
}
#course-query-results{
list-style-type: none;
margin: 0;
padding: 0;
}
#course-query-results ul{
margin: 0;
padding: 0;
}
#course-query-results label{
float: left;
width: 10em;
}
#course-query-results li{
clear: both;
list-style: none;
}
#course-query-results li li{
margin-left: 10px;
}
#course-query-results .closed{
color:#AC0B0B;
font-weight:bold;
}
#course-query-results .commands{
margin: 5px 10px;
}
#course-query-results .course{
background-color: #fff;
background-position: top left;
background-repeat: repeat-x;
border:1px solid #ccc;
margin: 5px 10px;
padding: 5px;
}
#course-query-results h2{
color: #85360F;
font-family: Arial, Helvetica, Sans-Serif;
font-size: 1.3em;
margin: 0 0 5px 0;
}
#course-query-results .instructors{
float: right;
}
#course-query-results td{padding: 3px;}
#course-query-results th{padding: 3px;}
#course-query-results .alt{color: #666;}
#course-query-results tr.alt{background-color: #f4f4f4;color:black;}
#course-query-results .instructors label{display:none;}
#course-query-results .details label{display: none;}
#course-query-results .course .attributes{}
#course-query-results .course .credits{}
#course-query-results .course .meeting{border-top:1px solid #eee;}
#course-query-results .course .description{border-top:1px solid #ddd; padding:3px;}
#course-query-results .course .seats{}
#course-query-results .course strong{color:#000;}
#course-query-results .course table{margin:0px;padding:0px;}
#course-query-results .course .title{
color:#008000;
font-size:1.1em;
font-weight:bold;
padding-bottom: 5px;
text-decoration:underline;
}
#course-query-results #courses{padding:0px;}
#course-query-results .red_title{color:#AC0B0B;}
</style>
<div id="course-query">
<div style="font-size:0.9em;">
[
<a href="usd_course_query.p_create?pv_termcode=201502&pv_termcode2=&pv_source=">create a new query for 201502</a>
]
</div>
<TABLE CLASS="datadisplaytable" id="course-query-results" summary="This table displays students from a student query."><CAPTION class="captiontext">Query Results</CAPTION>
<TR>
<TH CLASS="ddheader" scope="col" >CRN:</TH>
<TH CLASS="ddheader" scope="col" >Course<br/>Number:</TH>
<TH CLASS="ddheader" scope="col" >Course<br/>Section:</TH>
<TH CLASS="ddheader" scope="col" >Title:</TH>
<TH CLASS="ddheader" scope="col" >Textbook<br/>Information:</TH>
<TH CLASS="ddheader" scope="col" >Course<br/>Requirements:</TH>
<TH CLASS="ddheader" scope="col" >Fee:</TH>
<TH CLASS="ddheader" scope="col" >Credits:</TH>
<TH CLASS="ddheader" scope="col" >Days:</TH>
<TH CLASS="ddheader" scope="col" >Start<br/>Date:</TH>
<TH CLASS="ddheader" scope="col" >End<br/>Date:</TH>
<TH CLASS="ddheader" scope="col" >Begin<br/>Time:</TH>
<TH CLASS="ddheader" scope="col" >End<br/>Time:</TH>
<TH CLASS="ddheader" scope="col" >Seats<br/>Remaining:</TH>
<TH CLASS="ddheader" scope="col" >Reserved<br/>Seats:</TH>
<TH CLASS="ddheader" scope="col" >Instructors:</TH>
<TH CLASS="ddheader" scope="col" >Location:</TH>
</TR>
<TR class="alt ">
<TD NOWRAP CLASS="dddefault">2839</TD>
<TD NOWRAP CLASS="dddefault">PHIL-332</TD>
<TD NOWRAP CLASS="dddefault">01</TD>
<TD NOWRAP CLASS="dddefault">
Business Ethics</TD>
<TD NOWRAP CLASS="dddefault"><a href="http://usd-pc.verbasoftware.com/express?remote=1&ref=5380&term=USD+SPRING+2015&dept=PHIL&course=332&section=01&getbooks=display+books" target="_blank">Click Here</a></TD>
<TD NOWRAP CLASS="dddefault">P</TD>
<TD CLASS="dddefault"> </TD>
<TD NOWRAP CLASS="dddefault"align="center">3</TD>
<TD NOWRAP CLASS="dddefault">
TR</TD>
<TD NOWRAP CLASS="dddefault">
26-JAN-15</TD>
<TD NOWRAP CLASS="dddefault">
11-MAY-15</TD>
<TD NOWRAP CLASS="dddefault">
7:45 am</TD>
<TD NOWRAP CLASS="dddefault">
9:05 am</TD>
<TD NOWRAP CLASS="dddefault"><em style="font-color:red;">CLOSED(Waitlist Full)</em></TD>
<TD NOWRAP CLASS="dddefault">
</TD>
<TD NOWRAP CLASS="dddefault">
Matt Zwolinski
</TD>
<TD NOWRAP CLASS="dddefault">
MH205</TD>
</TR>
</TABLE>
Your query returned 1 records.
<div style="font-size:0.9em;">
[
<a href="usd_course_query.p_create?pv_termcode=201502&pv_termcode2=">create a new query for 201502</a>
]
</div>
</div>
<!-- ** START OF twbkwbis.P_CloseDoc ** -->
<TABLE CLASS="plaintable" SUMMARY="This is table displays line separator at end of the page."
WIDTH="100%" cellSpacing=0 cellPadding=0 border=0><TR><TD class="bgtabon" width="100%" colSpan=2><IMG SRC="/wtlgifs/web_transparent.gif" ALT="Transparent Image" CLASS="headerImg" TITLE="Transparent Image" NAME="web_transparent" HSPACE=0 VSPACE=0 BORDER=0 HEIGHT=3 WIDTH=10></TD></TR></TABLE>
<A HREF="#top" onMouseover="window.status='Skip to top of page'; return true" onMouseout="window.status=''; return true" OnFocus="window.status='Skip to top of page'; return true" onBlur="window.status=''; return true" class="skiplinks">Skip to top of page</A>
</DIV>
<DIV class="footerbeforediv">
</DIV>
<DIV class="footerafterdiv">
</DIV>
<DIV class="globalafterdiv">
</DIV>
<DIV class="globalfooterdiv">
</DIV>
<DIV class="pagefooterdiv">
<SPAN class="releasetext">Release: 7.3</SPAN>
</DIV>
<DIV class="poweredbydiv">
</DIV>
<DIV class="div1"></DIV>
<DIV class="div2"></DIV>
<DIV class="div3"></DIV>
<DIV class="div4"></DIV>
<DIV class="div5"></DIV>
<DIV class="div6"></DIV>
<div class="banner_copyright"> <br><h5>© 2014 Ellucian Company L.P. and its affiliates.<br></h5></div>
</BODY>
</HTML>
Answer 0 (score: 1)
This is an AJAX problem. Mechanize does not execute JavaScript. I'd suggest using Selenium WebDriver + PhantomJS.
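Once the fully rendered HTML is in hand (for example from the `page_source` attribute of a Selenium driver), reading the seat status out of the table is a separate, simpler step. The following is only a minimal sketch of that step, using Python's standard-library `html.parser` rather than Selenium itself. The names `CourseTableParser` and `seat_status` are made up for illustration, and the sample string is a trimmed copy of the table from the question, not a live response.

```python
from html.parser import HTMLParser


class CourseTableParser(HTMLParser):
    """Collects the rows of the table with id="course-query-results".

    Hypothetical helper; assumes the results table has no nested tables.
    """

    def __init__(self):
        super().__init__()
        self.in_table = False
        self.in_cell = False
        self.rows = []    # one list of cell strings per <tr>
        self._cell = []   # text fragments of the cell being read

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "table" and dict(attrs).get("id") == "course-query-results":
            self.in_table = True
        elif self.in_table and tag == "tr":
            self.rows.append([])
        elif self.in_table and tag in ("td", "th"):
            if not self.rows:
                self.rows.append([])
            self.in_cell = True
            self._cell = []
        elif self.in_cell and tag == "br":
            # "Seats<br/>Remaining:" should read "Seats Remaining:"
            self._cell.append(" ")

    def handle_endtag(self, tag):
        if tag == "table":
            self.in_table = False
        elif self.in_cell and tag in ("td", "th"):
            self.in_cell = False
            # Collapse runs of whitespace and newlines inside the cell.
            self.rows[-1].append(" ".join("".join(self._cell).split()))

    def handle_data(self, data):
        if self.in_cell:
            self._cell.append(data)


def seat_status(html):
    """Map each CRN to the text of its 'Seats Remaining:' column."""
    parser = CourseTableParser()
    parser.feed(html)
    if not parser.rows:
        # No results table at all, as in the Mechanize output.
        return {}
    header, *courses = parser.rows
    col = header.index("Seats Remaining:")
    return {row[0]: row[col] for row in courses}


sample = """
<TABLE CLASS="datadisplaytable" id="course-query-results">
<TR>
<TH CLASS="ddheader" scope="col" >CRN:</TH>
<TH CLASS="ddheader" scope="col" >Seats<br/>Remaining:</TH>
</TR>
<TR class="alt ">
<TD NOWRAP CLASS="dddefault">2839</TD>
<TD NOWRAP CLASS="dddefault"><em style="font-color:red;">CLOSED(Waitlist Full)</em></TD>
</TR>
</TABLE>
"""

print(seat_status(sample))
```

An empty dictionary from `seat_status` is a quick way to detect that the fetched HTML lacks the results table, which is the symptom described in the question.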
Answer 1 (score: 0)