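I am scraping a website using Python and Selenium. This is the HTML on the website:
I want the text to look the way it is displayed on the website, i.e. with line breaks, in an organized, easy-to-read format.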
I tried using
driver.find_element_by_class_name('record-content.record-information.record-content_j').text
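but the result contains \n\n characters. I tried print(text), which looks better. But is there a way to store the text in a DataFrame or something similar, so that it is saved in an organized format?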
The website looks like this:
When I try
rawData=driver.find_element_by_class_name('record-content.record-information.record-content_j').text
sanitizedData = rawData.replace('\n','')
print(BeautifulSoup(sanitizedData, 'html.parser').prettify())
the output looks like this:
Answer 0 (score: 0)
Since we are extracting the content through .text, it contains no HTML tags, so BeautifulSoup has no markup to prettify. Likewise, if you want to keep the HTML, you can retrieve the element's markup instead of its text and remove any line breaks where needed. Hope this helps :)
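To illustrate the point about storing the text in an organized way, here is a minimal sketch using only the standard library. It assumes the string returned by .text looks roughly like the hypothetical `raw_text` below (the real page content is not shown in the question); the helper name `text_to_lines` and the column names are also made up for illustration.

```python
import csv
import io

# Hypothetical sample standing in for the string returned by `.text`;
# the actual page content is not shown in the question.
raw_text = "Title: Example record\n\nAuthor: Jane Doe\n\nYear: 2019"


def text_to_lines(text):
    """Split element text into stripped, non-empty lines."""
    return [line.strip() for line in text.splitlines() if line.strip()]


# The "\n" sequences are real line breaks. They only show up as a
# backslash followed by "n" when the string is displayed with repr().
print(repr(raw_text))  # escaped form, as seen in a REPL or inside a list
print(raw_text)        # rendered form, with actual line breaks

lines = text_to_lines(raw_text)

# Store one line per row so the structure survives being saved.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["line_no", "text"])
for line_no, line in enumerate(lines, start=1):
    writer.writerow([line_no, line])

csv_output = buffer.getvalue()
print(csv_output)
```

If pandas is available, the same `lines` list could be passed to `pandas.DataFrame({"text": lines})` instead of writing CSV by hand. Note that this keeps the line breaks as row boundaries rather than deleting them with replace('\n', ''), which is what collapses the text into one run.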