Returning a list of elements from inside tr tags

Date: 2018-10-21 19:02:16

Tags: regex python-3.x list dictionary beautifulsoup

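I have HTML source parsed by BeautifulSoup using the code below, and I have posted a sample of the "tr" blocks. I want to build a list of dictionaries, like the "outputList" example below, from the "tr" blocks whose "id" looks like "ctl00_MainContent_subGBS_DataDetails_ctl01_trGBKItem". The "_ctl01" part changes from row to row; the rest of the id stays the same.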

I can collect all "tr" blocks with a matching "id" into a list using:

tstsoup.find_all('tr', {'id': re.compile(r'ctl(\d\d)_MainContent_subGBS_DataDetails_ctl(\d\d)_trGBKItem')})
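To make the id pattern concrete, here is a minimal sketch that uses only the standard `re` module and the ids from the sample data. The names `ROW_ID`, `sample_ids`, and `row_numbers` are illustrative and not part of the original code. The pattern is written as a raw string because `\d` in an ordinary string literal is an invalid escape sequence that recent Python 3 versions warn about. BeautifulSoup applies a compiled pattern with its `search()` method, so the sketch does the same.

```python
import re

# Same pattern as in the question, written as a raw string.
ROW_ID = re.compile(r'ctl(\d\d)_MainContent_subGBS_DataDetails_ctl(\d\d)_trGBKItem')

# Ids copied from the sample data: two row ids and one button id.
sample_ids = [
    'ctl00_MainContent_subGBS_DataDetails_ctl01_trGBKItem',
    'ctl00_MainContent_subGBS_DataDetails_ctl02_trGBKItem',
    'ctl00_MainContent_subGBS_DataDetails_ctl01_btnGradeDetails',
]

# Keep only the ids that match; group(2) is the part that changes per row.
row_numbers = []
for element_id in sample_ids:
    match = ROW_ID.search(element_id)
    if match:
        row_numbers.append(match.group(2))

print(row_numbers)
```

The button id is rejected because it does not end in `_trGBKItem`, so only the two row ids contribute a number.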

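But I have not figured out how to pick out the "name" and "Mark" parts.

I am new to Selenium and BeautifulSoup. My end goal is a script I can run to fetch my child's high school grades from the parent portal.

I have been looking at the code in this repository: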

https://github.com/AlbanyCompSci/aeries-api

Code:

from bs4 import BeautifulSoup
tstsoup = BeautifulSoup(driver.page_source, 'html.parser')

Desired output:

outputList = [{'name': 'Math 3 Period 1', 'Mark': '85.10'}, {'name': 'French II', 'Mark': '93.01'}]

Sample tstsoup data:

<tr id="ctl00_MainContent_subGBS_DataDetails_ctl01_trGBKItem">
<td class="DataLE"></td>
<td class="Data ac">
<input class="k-button" id="ctl00_MainContent_subGBS_DataDetails_ctl01_btnGradeDetails" name="ctl00$MainContent$subGBS$DataDetails$ctl01$btnGradeDetails" type="submit" value="Details"/>
</td>
<td class="Data al"><a class="link-gradebook-details" href="javascript:__doPostBack('ctl00$MainContent$subGBS$DataDetails$ctl01$lbtnCourseTitle','')" id="ctl00_MainContent_subGBS_DataDetails_ctl01_lbtnCourseTitle">Math 3 Period 1</a></td>
<td class="Data ac">Fall</td>
<td class="Data ac">1</td>
<td class="Data al">Missureli, A</td>
<td class="Data ac"><span style="display:block;" title="85.10">85.10</span></td>
<td class="Data ac"><span style="display:none;" title="85.10">85.10</span></td>
<td class="Data al"><span style="margin-left:48%">B</span></td>
<td class="Data ac" style="padding-top:3px"><img alt="DOWN" class="Clickable gradebook-trend-down" id="ctl00_MainContent_subGBS_DataDetails_ctl01_imgTrend" onclick="createScatterChart_8838138_F();" src="images/blank.gif" title="Forecasted value of 81.99% compared to the average of the last four overall scores 86.05%   Click for Details"/><br/><a class="gradebook-trend-click-hint" href="SubForms/#" id="ctl00_MainContent_subGBS_DataDetails_ctl01_gradebookTrendDetail" onclick="createScatterChart_8838138_F(); return false;">Details</a></td>
<td class="Data ac"><span id="ctl00_MainContent_subGBS_DataDetails_ctl01_lblNumMissing" style="color:Red;">3</span></td>
<td class="FixedData ac"><table border="0" class="ac" style="box-sizing: content-box; width: 100px;"><tbody><tr><td class="ac" style="width: 20%; border: none;" title="Monday - 10/15/2018">-</td><td class="ac" style="width: 20%; border: none;" title="Tuesday - 10/16/2018">-</td><td class="ac" style="width: 20%; border: none;" title="Wednesday - 10/17/2018">-</td><td class="ac" style="width: 20%; border: none;" title="Thursday - 10/18/2018">-</td><td class="ac" style="width: 20%; border: none;" title="Friday - 10/19/2018">-</td></tr></tbody></table></td>
<td class="Data ac"><span title="10/19/2018">Oct 19</span></td>
<td class="Data al"></td>
<td class="DataLER"></td>
</tr>
<tr id="ctl00_MainContent_subGBS_DataDetails_ctl02_trGBKItem">
<td class="DataLE"></td>
<td class="Data ac">
<input class="k-button" id="ctl00_MainContent_subGBS_DataDetails_ctl02_btnGradeDetails" name="ctl00$MainContent$subGBS$DataDetails$ctl02$btnGradeDetails" type="submit" value="Details"/>
</td>
<td class="Data al"><a class="link-gradebook-details" href="javascript:__doPostBack('ctl00$MainContent$subGBS$DataDetails$ctl02$lbtnCourseTitle','')" id="ctl00_MainContent_subGBS_DataDetails_ctl02_lbtnCourseTitle">French II</a></td>
<td class="Data ac">Fall</td>
<td class="Data ac">2</td>
<td class="Data al">Rauw, C</td>
<td class="Data ac"><span style="display:block;" title="93.01">93.01</span></td>
<td class="Data ac"><span style="display:none;" title="93.01">93.01</span></td>
<td class="Data al"><span style="margin-left:48%">A-</span></td>
<td class="Data ac" style="padding-top:3px"><img alt="SAME" class="Clickable gradebook-trend-same" id="ctl00_MainContent_subGBS_DataDetails_ctl02_imgTrend" onclick="createScatterChart_7185099_F();" src="images/blank.gif" title="Forecasted value of 94.05% compared to the average of the last four overall scores 93.19%   Click for Details"/><br/><a class="gradebook-trend-click-hint" href="SubForms/#" id="ctl00_MainContent_subGBS_DataDetails_ctl02_gradebookTrendDetail" onclick="createScatterChart_7185099_F(); return false;">Details</a></td>
<td class="Data ac"><span id="ctl00_MainContent_subGBS_DataDetails_ctl02_lblNumMissing">0</span></td>
<td class="FixedData ac"><table border="0" class="ac" style="box-sizing: content-box; width: 100px;"><tbody><tr><td class="ac" style="width: 20%; border: none;" title="Monday - 10/15/2018">-</td><td class="ac" style="width: 20%; border: none;" title="Tuesday - 10/16/2018">-</td><td class="ac" style="width: 20%; border: none;" title="Wednesday - 10/17/2018">-</td><td class="ac" style="width: 20%; border: none;" title="Thursday - 10/18/2018">-</td><td class="ac" style="width: 20%; border: none;" title="Friday - 10/19/2018">-</td></tr></tbody></table></td>
<td class="Data ac"><span title="10/18/2018">Oct 18</span></td>
<td class="Data al"></td>
<td class="DataLER"></td>
</tr>

1 Answer:

Answer 0 (score: 0)

The code below seems to do the trick.

Code:

import re

# Match every gradebook row; only the "_ctlNN" part of the id varies.
tst_tr = tstsoup.find_all('tr', {'id': re.compile(r'ctl(\d\d)_MainContent_subGBS_DataDetails_ctl(\d\d)_trGBKItem')})


def grbk(src_tr):
    # In the sample rows, cell 2 holds the course title and cell 6 the visible mark.
    cells = src_tr.find_all('td')
    return {'name': cells[2].get_text(), 'Mark': cells[6].get_text()}


# Build one dictionary per matching row.
tst_stuff = [grbk(tr) for tr in tst_tr]
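
The positional indexes above depend on the column order staying fixed. As a hedged alternative, the sketch below looks cells up by the id suffix of the course link and by the first `span` that carries a `title` attribute. It assumes the real page keeps the same ids as the sample. `SAMPLE_HTML` is a trimmed copy of the rows from the question, and `extract_marks`, `ROW_ID`, and `TITLE_ID` are illustrative names, not part of the original code.

```python
import re
from bs4 import BeautifulSoup

# Trimmed copy of the two sample rows (most cells omitted for brevity).
SAMPLE_HTML = """
<table>
<tr id="ctl00_MainContent_subGBS_DataDetails_ctl01_trGBKItem">
<td class="DataLE"></td>
<td class="Data al"><a class="link-gradebook-details" id="ctl00_MainContent_subGBS_DataDetails_ctl01_lbtnCourseTitle">Math 3 Period 1</a></td>
<td class="Data ac"><span style="display:block;" title="85.10">85.10</span></td>
<td class="Data ac"><span style="display:none;" title="85.10">85.10</span></td>
</tr>
<tr id="ctl00_MainContent_subGBS_DataDetails_ctl02_trGBKItem">
<td class="DataLE"></td>
<td class="Data al"><a class="link-gradebook-details" id="ctl00_MainContent_subGBS_DataDetails_ctl02_lbtnCourseTitle">French II</a></td>
<td class="Data ac"><span style="display:block;" title="93.01">93.01</span></td>
<td class="Data ac"><span style="display:none;" title="93.01">93.01</span></td>
</tr>
</table>
"""

ROW_ID = re.compile(r'_ctl\d\d_trGBKItem$')   # gradebook rows
TITLE_ID = re.compile(r'_lbtnCourseTitle$')   # course title link


def extract_marks(html):
    """Return one {'name': ..., 'Mark': ...} dictionary per gradebook row."""
    soup = BeautifulSoup(html, 'html.parser')
    results = []
    for row in soup.find_all('tr', id=ROW_ID):
        title = row.find('a', id=TITLE_ID)
        mark = row.find('span', title=True)  # first span with a title attribute
        if title is None or mark is None:
            continue  # skip rows that do not have the expected cells
        results.append({'name': title.get_text(strip=True), 'Mark': mark['title']})
    return results


output_list = extract_marks(SAMPLE_HTML)
print(output_list)
```

With the live page, the same function could be called as `extract_marks(driver.page_source)`, provided the portal's markup matches the sample.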