MechanicalSoup - filling out a form with a tricky HTML table

Posted: 2018-02-21 02:54:19

Tags: python forms mechanize mechanicalsoup

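I'm trying to use MechanicalSoup to automatically fill out and submit a timesheet.

This is what the form looks like:

[Screenshot: timesheet_form]

Here is the relevant source code for that part of the page: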

<FORM ACTION="bwpkteci.P_UpdateTimeInOut" METHOD="post">
<INPUT TYPE="hidden" NAME="JobsSeqNo" VALUE="208138">
<INPUT TYPE="hidden" NAME="LastDate" VALUE="0">
<INPUT TYPE="hidden" NAME="par_restart" VALUE="Y">
<INPUT TYPE="hidden" NAME="par_update" VALUE="Y">
<INPUT TYPE="hidden" NAME="par_submit" VALUE="Y">
<INPUT TYPE="hidden" NAME="par_recall" VALUE="N">
<INPUT TYPE="hidden" NAME="EarnCode" VALUE="RSA">
<INPUT TYPE="hidden" NAME="DateSelected" VALUE="20-FEB-2018">
<TABLE  CLASS="dataentrytable" SUMMARY="This user enters the time of day for the hours worked into this table in order for the system to calculate the hours.">
<TR>
<TD CLASS="delabel" scope="row" >Date:</TD>
<TD CLASS="dedefault">Tuesday, Feb 20,2018</TD>
</TR>
<TR>
<TD CLASS="delabel" scope="row" >Earnings Code:</TD>
<TD CLASS="dedefault">Regular Student Aide</TD>
</TR>
</TABLE>
<TABLE  CLASS="dataentrytable" SUMMARY="This is the detail table where the user enters the time of the day for the hours worked in order for the system to calculate the hours.">
<TR>
<TD CLASS="deheader" scope="col" ><LABEL for=shift_input_id><SPAN class="fieldlabeltext">Shift</SPAN></LABEL></TD>
<TD COLSPAN="2" CLASS="deheader" scope="col" ><LABEL for=timein_input_id><SPAN class="fieldlabeltext">Time In</SPAN></LABEL></TD>
<TD COLSPAN="2" CLASS="deheader" scope="col" ><LABEL for=timeout_input_id><SPAN class="fieldlabeltext">Time Out</SPAN></LABEL></TD>
<TD CLASS="deheader" scope="col" >Total Hours</TD>
</TR>
<INPUT TYPE="hidden" NAME="LineNumber" VALUE="1">
<TR>
<TD CLASS="dedefault"><INPUT TYPE="text" NAME="Shift" SIZE="2" MAXLENGTH="1" VALUE="1" ID="shift_input_id"></TD>
<TD CLASS="dedefault"><INPUT TYPE="text" NAME="TimeIn" SIZE="6" MAXLENGTH="5" ID="timein_input_id"></TD>
<TD CLASS="dedefault">
<SELECT NAME="TimeInAm" SIZE="1">
<OPTION VALUE="AM" SELECTED>AM
<OPTION VALUE="PM">PM
</SELECT>
</TD>
<TD CLASS="dedefault"><INPUT TYPE="text" NAME="TS_TimeOut" SIZE="6" MAXLENGTH="5" ID="timeout_input_id"></TD>
<TD CLASS="dedefault">
<SELECT NAME="TimeOutAm" SIZE="1">
<OPTION VALUE="AM" SELECTED>AM
<OPTION VALUE="PM">PM
</SELECT>
</TD>
<TD CLASS="dedefault"><p class="rightaligntext">0</p></TD>
</TR>
<TR>
<TD COLSPAN="5" CLASS="dedead">&nbsp;</TD>
<TD CLASS="dedefault"><p class="rightaligntext">0</p></TD>
</TR>
</TABLE>
<P>
<TABLE  CLASS="plaintable" SUMMARY="This layout table is used to align buttons.">
<TR>
<TD CLASS="pldefault">
<INPUT TYPE="submit" NAME="ButtonSelected" VALUE="Time Sheet">
<INPUT TYPE="submit" NAME="ButtonSelected" VALUE="Previous Day">
<INPUT TYPE="submit" NAME="ButtonSelected" VALUE="Next Day">
</TD>
</TR>
<TR>
<TD CLASS="pldefault">
<INPUT TYPE="submit" NAME="ButtonSelected" VALUE="Add New Line">
<INPUT TYPE="submit" NAME="ButtonSelected" VALUE="Save">
<INPUT TYPE="submit" NAME="ButtonSelected" VALUE="Copy">
<INPUT TYPE="submit" NAME="ButtonSelected" VALUE="Delete">
</TD>
</TR>
</TABLE>
</FORM>

I'm using MechanicalSoup to fill in the form and submit it:

#navigate to the timesheet page
browser.open(timesheetURL)
browser.get_current_page()

#select the form
browser.select_form('form[action="bwpkteci.P_UpdateTimeInOut"]')
form = browser.get_current_form()

#add required controls and set values for testing
form.new_control('text', 'shift', '1')
form.new_control('text', 'TimeIn', '08:30')
form.new_control('select', 'TimeInAm', 1)    #1 stands for PM
form.new_control('text', 'TS_TimeOut', '09:30')
form.new_control('select', 'TimeOutAm', 1) #1 stands for PM
form.new_control('submit', 'ButtonSelected', 'Save')
form.choose_submit('ButtonSelected')
browser.launch_browser()
browser.submit_selected()
browser.launch_browser()

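I had to add the controls with the new_control function because, when I checked by launching the browser, the TimeIn and TimeOut boxes didn't show up.

This code doesn't work, and I can't figure out why. I thought I might have gotten the names or types of the added controls wrong (maybe they don't match the actual input types and names in the form), but I checked with a Chrome extension and that isn't the case either. The Save button does get clicked, but the test values never actually register.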

[Screenshot: check]

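Here is what the browser looks like before clicking the Save button:

[Screenshot: before]

And after:

[Screenshot: after]

It didn't register!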

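1 Answer

Answer 0 (score: 1)

Using MechanicalSoup version 0.10.0, I was able to parse your HTML snippet correctly (as far as I can tell). I don't see anything in particular that would make an older version of MechanicalSoup fail, but if nothing else works, you could try updating.

I used the following simple code: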

import mechanicalsoup
browser = mechanicalsoup.StatefulBrowser()
browser.open_fake_page(text)  #Here 'text' is the HTML snippet
form = browser.select_form('form[action="bwpkteci.P_UpdateTimeInOut"]')
form.print_summary()
browser.launch_browser()

The print_summary() method prints all of the form's elements, including the TimeIn and TimeOut boxes you were missing:

<input name="JobsSeqNo" type="hidden" value="208138"/>
<input name="LastDate" type="hidden" value="0"/>
<input name="par_restart" type="hidden" value="Y"/>
<input name="par_update" type="hidden" value="Y"/>
<input name="par_submit" type="hidden" value="Y"/>
<input name="par_recall" type="hidden" value="N"/>
<input name="EarnCode" type="hidden" value="RSA"/>
<input name="DateSelected" type="hidden" value="20-FEB-2018"/>
<input name="LineNumber" type="hidden" value="1"/>
<input id="shift_input_id" maxlength="1" name="Shift" size="2" type="text" value="1"/>
<input id="timein_input_id" maxlength="5" name="TimeIn" size="6" type="text"/>
<select name="TimeInAm" size="1">
<option selected="" value="AM">AM</option><option value="PM">PM</option></select>
<input id="timeout_input_id" maxlength="5" name="TS_TimeOut" size="6" type="text"/>
<select name="TimeOutAm" size="1">
<option selected="" value="AM">AM</option><option value="PM">PM</option></select>
<input name="ButtonSelected" type="submit" value="Time Sheet"/>
<input name="ButtonSelected" type="submit" value="Previous Day"/>
<input name="ButtonSelected" type="submit" value="Next Day"/>
<input name="ButtonSelected" type="submit" value="Add New Line"/>
<input name="ButtonSelected" type="submit" value="Save"/>
<input name="ButtonSelected" type="submit" value="Copy"/>
<input name="ButtonSelected" type="submit" value="Delete"/>
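For comparison, it can help to write out roughly the body a regular browser would post when Save is clicked. Under the HTML form-submission rules, named controls are sent in document order and only the submit button that was activated is included, so the list above should collapse to a single ButtonSelected=Save entry. The hidden values below come from the page source in the question; the time and AM/PM values are hypothetical test inputs. A standard-library sketch:

```python
from urllib.parse import urlencode

# Field names and hidden values copied from the question's page source.
# TimeIn/TimeOut and AM/PM values are hypothetical test inputs.
# Entries follow the document order of the controls, and only the
# clicked submit button (Save) is included.
fields = [
    ("JobsSeqNo", "208138"),
    ("LastDate", "0"),
    ("par_restart", "Y"),
    ("par_update", "Y"),
    ("par_submit", "Y"),
    ("par_recall", "N"),
    ("EarnCode", "RSA"),
    ("DateSelected", "20-FEB-2018"),
    ("LineNumber", "1"),
    ("Shift", "1"),
    ("TimeIn", "08:30"),
    ("TimeInAm", "PM"),
    ("TS_TimeOut", "09:30"),
    ("TimeOutAm", "PM"),
    ("ButtonSelected", "Save"),
]

# application/x-www-form-urlencoded body, as sent for METHOD="post".
body = urlencode(fields)
print(body)
```

Comparing this with the request the script really sends shows whether the TimeIn/TimeOut values reach the server at all.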

The elements you were missing also display correctly when I use launch_browser():

[Screenshot]