使用beautifulsoup解析特定<div>中没有id / class的表

时间:2015-12-20 11:13:19

标签: python html parsing beautifulsoup

我正在尝试解析我用Beautiful Soup抓取的HTML页面的一部分数据。

我有两个问题。网站周围有许多表格,但它们没有唯一的标识符可以轻松导航。

我尝试过以下测试,但无济于事:

for tag in soup.find_all('div'):
    print tag.find('span')

这甚至都没有在我抓住的页面中找到div。

我做错了什么?

修改 更新了源文件。

我的代码返回TypeError的错误:期望的字符串或缓冲区是:

soup = BeautifulSoup(data, 'html.parser')
table = soup.findAll('a0:span', {"style":"font-family:Courier New,monospace; font-size:9pt;"})[0]

trs = table('span')
for tr in trs:
    print tr.th.text,
    print tr.td.text

这显然是数据是包含上述内容的文本文件。

编辑:第一页我试图刮擦。

<!-- Copyright (c) 2001 TrakHealth Pty Limited. ALL RIGHTS RESERVED. -->
<!-- This is a generic page used to display single simple components  -->

<HTML XMLNS=TRAK>
<HEAD>
<TITLE></TITLE>

 <SCRIPT SRC="/csp/broker/cspbroker.js"></SCRIPT><SCRIPT SRC="/csp/broker/cspxmlhttp.js"></SCRIPT>
<SCRIPT SRC="../scripts/websys.js"></SCRIPT>
<SCRIPT SRC="../custom/NHLS-LABTRAK/scripts/websys.js"></SCRIPT>
<LINK REL="stylesheet" TYPE="text/css" HREF="../styles/modern/websys.css">
<LINK REL="stylesheet" TYPE="text/css" HREF="../custom/NHLS-LABTRAK/scripts/websys.css">

<SCRIPT language='javascript'>
var TRELOADPAGE='websys.csp';
var TRELOADID='sRi1272LJScu4';
var tkKeepOpen=0;
function treload(csppage) {
 tkKeepOpen=1;
 window.location.href= "websys.csp?TRELOADID=sRi1272LJScu4&TRELOAD=1";
}
var TRELOADPATLIST='';
</SCRIPT>

<script language=javascript>
var t=new Array();
var tsc=new Array();
var session=new Array();
t['XMISSING']='is a required field but has not been entered';
t['XINVALID']='does not have a valid entry';
t['XDATE']='is a date field but does not have a valid date entered';
t['XTIME']='is a time field but does not have a valid time entered';
t['XNUMBER']='is a numeric field but does not have a valid number entered';
t['XLAYOUTERR']='The TrakCare Layout Editor is not functioning or has not been installed.\\n\\n 1. Please check that your browser security settings allow you to initialize and script activeX controls.\\n 2. Please check that the TRAK Layout Editor has been installed.';
t['XLOCKED']='Record is locked by another user.';
t['XLOCKEDCT']='Code table updates are currently disabled.';
t['XNOTCT']='You must connect to the code table server to update code tables.';
t['XLOCKEDMT']='Record is locked by MEDTRAK.';
t['XDAYS']='Sun,Mon,Tue,Wed,Thu,Fri,Sat';
t['XMONTHS']='January,February,March,April,May,June,July,August,September,October,November,December';
t['XUNSAVED']='There are unsaved changes on this page.';
t['XSORTMAXROWS']='Rows retrieved for sorting exceeds maximum specified in system configuration. Sort terminated.';
t['XDATERANGE']='Date From must be before Date To';
t['XMAXCHARS']='Maximum number of characters exceeded. Text will be truncated to the mamimum allowable length.';
t['XNOTAVAIL']='This functionality is not available.';
t['XUNIQUE']='Code or description is not unique.';
t['XLOADING']='Loading...';
session['LOGON.USERID']='IN0546623';
session['LOGON.USERCODE']='IN0546623';
session['LOGON.USERNAME']='Dr Kate de Villiers';
session['LOGON.GROUPID']='17';
session['LOGON.GROUPDESC']='L0A0';
session['LOGON.CTLOCID']='';
session['XMONTHSSHORT']='Jan,Feb,Mar,Apr,May,Jun,Jul,Aug,Sep,Oct,Nov,Dec';
session['CONTEXT']='W5';
window.status=session['LOGON.USERCODE'];
session['LOGON.SITECODE']='NHLS-LABTRAK';
session['LOGON.REGION']='';
session['LOGON.LANGID']='2';
session['REMOTE_ADDR']='41.13.90.246';
session['SESSIONID']='sRi12LJScu';

function tkMakeServerCall(tkclass,tkmethod) {
 if ((tkclass=='')||(tkmethod=='')) return '';
 var args=new Array('6$Q4nJfQIibx6KRykS2G7fwIO_0CynE1PGbXHXH99mj48F9stJAfA0mxb2lAcJz4',tkclass,tkmethod);
 for (var i=2; i<tkMakeServerCall.arguments.length; i++) {
  args[i+1]=tkMakeServerCall.arguments[i];
 }
 var retval=cspHttpServerMethod.apply(this,args);
 return retval;
}
var tkTUIDP="28"
var tkTUIDG="R$J3Um0CXoe6SMG0APDutHiKdaJienpZcmL2DLLWSok-"
var tkTUIDS="jJnWSu4mVW_TjQfKn6qV91tHsBeG9DAKkdbZxw3gyrk-"
var tkOverlayMethod="b9XqGIkzXAsNiiMNrAW4qI2Pw8JI3a9uQfBtHGd5VbOkMojLvsvmef1ULxa3sbuS"
var tkLongTextMethod="LU9BZtom$6x_UyWrWuudlFjUitSkkO9trUE_sYUqAlCekHzkMYqX0A9AQuZIEkaJ"
</script>

</HEAD>
<BODY><DIV id="PageContent">
<INPUT TYPE="Button" value="<<" onClick="history.back()"><INPUT TYPE="Button" value=">>" onClick="history.forward()">
<DIV id='cmp_DEBDebtor_Banner'><!-- COMPONENT 
Routine                  GCOM3.1
Page Name                websys.default.csp
Component ID             69
Component Name           DEBDebtor.Banner
websys.Component Version .51.1531 at 2015-07-14 03:14:18PM
Component Version        L2010.1.1 on 2015-07-14 03:14:52PM
Layout for               SYS.SYS 
-->
<DIV STYLE="LEFT: 0px; TOP: 0px" id='dDEBDebtor_Banner' onclick="websys_sckeys[String.fromCharCode(113)]='websys_help(\'69\',\'-100000000000000\',\'\');'"><FORM ACTION='websys.csp' method=post name='fDEBDebtor_Banner' id='fDEBDebtor_Banner' autocomplete='off'>
<INPUT TYPE='HIDDEN' ID='TFORM' NAME='TFORM' VALUE='DEBDebtor.Banner'>
<INPUT TYPE='HIDDEN' ID='TPAGID' NAME='TPAGID' VALUE='sRi5912LJScu3'>
<INPUT TYPE='HIDDEN' ID='TEVENT' NAME='TEVENT' VALUE=''>
<INPUT TYPE='HIDDEN' ID='TXREFID' NAME='TXREFID' VALUE='3'>
<INPUT TYPE='HIDDEN' ID='TOVERRIDE' NAME='TOVERRIDE' VALUE=''>
<INPUT TYPE='HIDDEN' ID='TDIRTY' NAME='TDIRTY' VALUE='1'>
<INPUT TYPE='HIDDEN' ID='TWKFL' NAME='TWKFL' VALUE=''>
<INPUT TYPE='HIDDEN' ID='TWKFLI' NAME='TWKFLI' VALUE=''>
<INPUT TYPE='HIDDEN' ID='TFRAME' NAME='TFRAME' VALUE=''>
<INPUT TYPE='HIDDEN' ID='TWKFLL' NAME='TWKFLL' VALUE="">
<INPUT TYPE='HIDDEN' ID='TWKFLJ' NAME='TWKFLJ' VALUE="">
<INPUT TYPE='HIDDEN' ID='TREPORT' NAME='TREPORT' VALUE="">
<INPUT TYPE='HIDDEN' ID='TRELOADCMP' NAME='TRELOADCMP' VALUE="">
<INPUT TYPE='HIDDEN' ID='TRELOADID' NAME='TRELOADID' VALUE="sRi1272LJScu4">
<INPUT TYPE='HIDDEN' ID='TOVERLAY' NAME='TOVERLAY' VALUE=''>
<input id="UseSameWin" name="UseSameWin" type="hidden" value="">
<TABLE><TBODY><TR><TD></TD><TD>
<table border='1px'><tr><td><table border='1px' cellspacing='0px' width='940px'><tr><td width='105px' ><span style='font-family:Arial; font-size:10pt; color:#000000;'> Episode No.</span></td><td width='139px' style='background-color:#0000FF;'><span style='font-weight:bold; color:#FFFF00;'>  PK01150438</span></td><td width='48px' ><span style='color:#000000;'> MRN</span></td><td width='192px' style='background-color:#0A246A;'><span style='font-weight:bold; color:#00FFFF;'>  MRN46913203</span></td><td width='39px' ><span style='color:#000000;'> Lab</span></td><td width='398px' style='background-color:#0A246A;'><span style='font-weight:bold; color:#D4D0C8;'>  Nelspruit Laboratory</span></td><td width='6px'></td></tr></table><table border='1px' cellspacing='0px' width='940px'><tr><td width='495px' ><span style='font-size:14pt; font-weight:bold; color:#0000FF;'> Unknown MALE</span></td><td width='37px' style='background-color:#0A246A;'><span style='font-size:10pt; font-weight:bold; color:#D4D0C8;'>  M</span></td><td width='88px' style='background-color:#0A246A;'><span style='font-size:9pt; font-weight:bold; color:#D4D0C8;'>  27 y</span></td><td width='130px' style='background-color:#0A246A;'><span style='font-size:10pt; font-weight:bold; color:#D4D0C8;'>  01/01/1988</span></td><td width='176px' style='background-color:#0A246A;'><span style='font-weight:bold; color:#D4D0C8;'>  </span></td><td width='6px'></td></tr></table><table border='1px' cellspacing='0px' width='940px'><tr><td width='418px' ><span style='font-size:8pt; color:#000000;'>  Clotted blood;EDTA blood</span></td><td width='339px' ><span style='font-size:8pt; color:#000000;'>  1</span></td><td width='178px' style='background-color:#0A246A;'><span style='font-size:10pt; font-weight:bold; color:#D4D0C8;'>  Routine</span></td><td width='3px'></td></tr></table><table border='1px' cellspacing='0px' width='940px'><tr><td width='40px' ><span style='font-size:10pt; color:#000000;'> F.N.</span></td><td width='154px' style='background-color:#0A246A;'><span style='font-weight:bold; color:#D4D0C8;'>  08/120456</span></td><td width='58px' ><span style='color:#000000;'> Ref No</span></td><td width='402px' style='background-color:#0A246A;'><span style='font-weight:bold; color:#D4D0C8;'>  AAWC0287NOF</span></td><td width='91px' ><span style='font-size:10pt; color:#000000;'> Collection</span></td><td width='115px' style='background-color:#0A246A;'><span style='font-weight:bold; color:#D4D0C8;'>  02/07/2015</span></td><td width='54px' style='background-color:#0A246A;'><span style='font-weight:bold; color:#D4D0C8;'>  00:30</span></td><td width='5px'></td></tr></table><table border='1px' cellspacing='0px' width='940px'><tr><td width='40px' ><span style='font-size:10pt; color:#000000;'> Hosp</span></td><td width='349px' style='background-color:#0A246A;'><span style='font-weight:bold; color:#D4D0C8;'>  Rob Ferreira Hospital</span></td><td width='26px' ><span style='color:#000000;'> </span></td><td width='240px' style='background-color:#0A246A;'><span style='font-weight:bold; color:#D4D0C8;'>  013 741 3031</span></td><td width='91px' ><span style='color:#000000;'> Received</span></td><td width='115px' style='background-color:#0A246A;'><span style='font-weight:bold; color:#D4D0C8;'>  02/07/2015</span></td><td width='54px' style='background-color:#0A246A;'><span style='font-weight:bold; color:#D4D0C8;'>  02:45</span></td><td width='6px'></td></tr></table><table border='1px' cellspacing='0px' width='940px'><tr><td width='40px' ><span style='font-size:10pt; color:#000000;'>  </span></td><td width='349px' style='background-color:#0A246A;'><span style='font-size:10pt; font-weight:bold; color:#D4D0C8;'>  Ward 4</span></td><td width='26px' ><span style='color:#000000;'> </span></td><td width='158px' style='background-color:#0A246A;'><span style='font-weight:bold; color:#D4D0C8;'>  </span></td><td width='362px'></td></tr></table><table border='1px' cellspacing='0px' width='940px'><tr><td width='40px' ><span style='font-size:10pt; color:#000000;'> Doc</span></td><td width='349px' style='background-color:#0A246A;'><span style='font-weight:bold; color:#D4D0C8;'>  DR IN CHARGE </span></td><td width='23px' ><span style='color:#000000;'> </span></td><td width='525px'></td></tr></table></td></tr></table></TD></TR></TBODY></TABLE>
</FORM>
<SCRIPT SRC="../scripts_gen/debdebtor.banner.js"></SCRIPT>
<SCRIPT SRC="../scripts/debdebtor.banner.js"></SCRIPT>
<SCRIPT SRC="../custom/NHLS-LABTRAK/scripts/debdebtor.banner.js"></SCRIPT>
<SCRIPT>
t['DemographicPanel']='Demographic Panel';
websys_sckeys[String.fromCharCode(113)]='websys_help(\'69\',\'-100000000000000\',\'\');';
websys_sckeys[String.fromCharCode(220)]='if (top.frames[\'eprmenu\']) top.frames[\'eprmenu\'].ToggleMenu(null);';
</SCRIPT>
</DIV>
<!-- COMPONENT END DEBDebtor.Banner -->
<SCRIPT language=javascript>
  try { InitMe(); } catch(e) {};
</SCRIPT>

</DIV>


<DIV id='cmp_web_EPVisitTestSet_FullLabPreview'><!-- COMPONENT 
Routine                  GCOM66.1
Page Name                websys.default.csp
Component ID             79
Component Name           web.EPVisitTestSet.FullLabPreview
websys.Component Version .51.1531 at 2015-07-14 03:14:18PM
Component Version        L2010.1.1 on 2015-07-14 03:14:52PM
Layout for               SYS.SYS 
-->
<DIV STYLE="LEFT: 0px; TOP: 0px" id='dweb_EPVisitTestSet_FullLabPreview' onclick="websys_sckeys[String.fromCharCode(113)]='websys_help(\'79\',\'-100000000000000\',\'\');'"><FORM ACTION='websys.csp' method=post name='fweb_EPVisitTestSet_FullLabPreview' id='fweb_EPVisitTestSet_FullLabPreview' autocomplete='off'>
<INPUT TYPE='HIDDEN' ID='TFORM' NAME='TFORM' VALUE='web.EPVisitTestSet.FullLabPreview'>
<INPUT TYPE='HIDDEN' ID='TPAGID' NAME='TPAGID' VALUE='sRi12LJS61cu8'>
<INPUT TYPE='HIDDEN' ID='TEVENT' NAME='TEVENT' VALUE=''>
<INPUT TYPE='HIDDEN' ID='TXREFID' NAME='TXREFID' VALUE='66'>
<INPUT TYPE='HIDDEN' ID='TOVERRIDE' NAME='TOVERRIDE' VALUE=''>
<INPUT TYPE='HIDDEN' ID='TDIRTY' NAME='TDIRTY' VALUE='1'>
<INPUT TYPE='HIDDEN' ID='TWKFL' NAME='TWKFL' VALUE=''>
<INPUT TYPE='HIDDEN' ID='TWKFLI' NAME='TWKFLI' VALUE=''>
<INPUT TYPE='HIDDEN' ID='TFRAME' NAME='TFRAME' VALUE=''>
<INPUT TYPE='HIDDEN' ID='TWKFLL' NAME='TWKFLL' VALUE="">
<INPUT TYPE='HIDDEN' ID='TWKFLJ' NAME='TWKFLJ' VALUE="">
<INPUT TYPE='HIDDEN' ID='TREPORT' NAME='TREPORT' VALUE="">
<INPUT TYPE='HIDDEN' ID='TRELOADCMP' NAME='TRELOADCMP' VALUE="">
<INPUT TYPE='HIDDEN' ID='TRELOADID' NAME='TRELOADID' VALUE="sRi1272LJScu4">
<INPUT TYPE='HIDDEN' ID='TOVERLAY' NAME='TOVERLAY' VALUE=''>
<TABLE><TBODY><TR><TD></TD><TD colSpan=3>
<pre>
</pre>
<pre>
<span style='font-family:Courier New,monospace; font-size:9pt; '>       Sodium</span>
<span style='font-family:Courier New,monospace; font-size:9pt; '>            Specimen insufficient for test(s)</span>
</pre>
<pre>
</pre>
<pre>
<span style='font-family:Courier New,monospace; font-size:9pt; '>       Potassium</span>
<span style='font-family:Courier New,monospace; font-size:9pt; '>            Specimen insufficient for test(s)</span>
</pre>
<pre>
</pre>
<pre>
<span style='font-family:Courier New,monospace; font-size:9pt; '>       Chloride</span>
<span style='font-family:Courier New,monospace; font-size:9pt; '>            Specimen insufficient for test(s)</span>
</pre>
<pre>
</pre>
<pre>
<span style='font-family:Courier New,monospace; font-size:9pt; '>       Bicarbonate</span>
<span style='font-family:Courier New,monospace; font-size:9pt; '>            Specimen insufficient for test(s)</span>
</pre>
<pre>
</pre>
<pre>
<span style='font-family:Courier New,monospace; font-size:9pt; '>       Urea</span>
<span style='font-family:Courier New,monospace; font-size:9pt; '>            Specimen insufficient for test(s)</span>
</pre>
<pre>
</pre>
<pre>
<span style='color:ORANGE;font-family:Courier New,monospace; font-size:9pt; '>            Authorised by xx  on 04/07/2015  at 17:02</span>
<span style='font-family:Courier New,monospace; font-size:9pt; '>       Creatinine</span><span style='font-weight:bold; font-family:Courier New,monospace; font-size:9pt; '>                       </span><span style='font-weight:bold; color:RED;font-family:Courier New,monospace; font-size:9pt; '>       116 H</span><span style='font-family:Courier New,monospace; font-size:9pt; '>    umol/L                    64 - 104</span>
<span style='font-family:Courier New,monospace; font-size:9pt; '>       eGFR (MDRD formula)                     >60      mL/min/1.73 m2</span>
<span style='font-family:Courier New,monospace; font-size:9pt; '>            MDRD-derived estimation of GFR may significantly underestimate true GFR</span>
<span style='font-family:Courier New,monospace; font-size:9pt; '>            in patients with GFR > 60 mL/min/1.73m^2.  It may also be unreliable in</span>
<span style='font-family:Courier New,monospace; font-size:9pt; '>            the case of: age <18 years or >70 years; pregnancy; serious co-morbid</span>
<span style='font-family:Courier New,monospace; font-size:9pt; '>            conditions; acute renal failure; extremes of body habitus/unusual diet,</span>
<span style='font-family:Courier New,monospace; font-size:9pt; '>            gross oedema. The MDRD-eGFR used here does not employ an ethnic factor</span>
<span style='font-family:Courier New,monospace; font-size:9pt; '>            for race.</span>
<span style='font-family:Courier New,monospace; font-size:9pt; '>            </span>
</pre>
<pre>
</pre>
<pre>
<span style='color:ORANGE;font-family:Courier New,monospace; font-size:9pt; '>            Authorised by xx  on 04/07/2015  at 17:02</span>
<span style='font-family:Courier New,monospace; font-size:9pt; '>       Calcium                                2.44      mmol/L                  2.15 - 2.55</span>
</pre>
<pre>
</pre>
<pre>
<span style='color:ORANGE;font-family:Courier New,monospace; font-size:9pt; '>            Authorised by xx  on 04/07/2015  at 17:02</span>
<span style='font-family:Courier New,monospace; font-size:9pt; '>       Magnesium                              0.88      mmol/L                  0.63 - 1.05</span>
</pre>
<pre>
</pre>
<pre>
<span style='color:ORANGE;font-family:Courier New,monospace; font-size:9pt; '>            Authorised by xx  on 04/07/2015  at 17:02</span>
<span style='font-family:Courier New,monospace; font-size:9pt; '>       Inorganic phosphate</span><span style='font-weight:bold; font-family:Courier New,monospace; font-size:9pt; '>              </span><span style='font-weight:bold; color:RED;font-family:Courier New,monospace; font-size:9pt; '>      1.47 H</span><span style='font-family:Courier New,monospace; font-size:9pt; '>    mmol/L                  0.78 - 1.42</span>
</pre>
<pre>
</pre>
<pre>
<span style='color:ORANGE;font-family:Courier New,monospace; font-size:9pt; '>            Authorised by xx  on 04/07/2015  at 17:02</span>
<span style='font-family:Courier New,monospace; font-size:9pt; '>       Total protein                            77      g/L                       60 - 78</span>
</pre>
<pre>
</pre>
<pre>
<span style='color:ORANGE;font-family:Courier New,monospace; font-size:9pt; '>            Authorised by xx  on 04/07/2015  at 17:02</span>
<span style='font-family:Courier New,monospace; font-size:9pt; '>       Albumin                                  48      g/L                       35 - 52</span>
</pre>
<pre>
</pre>
<pre>
<span style='color:ORANGE;font-family:Courier New,monospace; font-size:9pt; '>            Authorised by xx  on 04/07/2015  at 17:02</span>
<span style='font-family:Courier New,monospace; font-size:9pt; '>       Total bilirubin                           8      umol/L                     5 - 21</span>
</pre>
<pre>
</pre>
<pre>
<span style='color:ORANGE;font-family:Courier New,monospace; font-size:9pt; '>            Authorised by xx  on 04/07/2015  at 17:02</span>
<span style='font-family:Courier New,monospace; font-size:9pt; '>       Conjugated bilirubin (DBil)               1      umol/L                     0 - 3</span>
</pre>
<pre>
</pre>
<pre>
<span style='font-family:Courier New,monospace; font-size:9pt; '>       Alanine transaminase (ALT)</span>
<span style='font-family:Courier New,monospace; font-size:9pt; '>            Specimen insufficient for test(s)</span>
</pre>
<pre>
</pre>
<pre>
<span style='color:ORANGE;font-family:Courier New,monospace; font-size:9pt; '>            Authorised by xx  on 04/07/2015  at 17:02</span>
<span style='font-family:Courier New,monospace; font-size:9pt; '>       Aspartate transaminase (AST)             26      U/L                       15 - 40</span>
</pre>
<pre>
</pre>
<pre>
<span style='color:ORANGE;font-family:Courier New,monospace; font-size:9pt; '>            Authorised by xx  on 04/07/2015  at 17:02</span>
<span style='font-family:Courier New,monospace; font-size:9pt; '>       Alkaline phosphatase (ALP)               61      U/L                       53 - 128</span>
</pre>
<pre>
</pre>
<pre>
<span style='color:ORANGE;font-family:Courier New,monospace; font-size:9pt; '>            Authorised by xx  on 04/07/2015  at 17:02</span>
<span style='font-family:Courier New,monospace; font-size:9pt; '>       Gamma-glutamyl transferase (GGT)         18      U/L               <68</span>
</pre>
<pre>
</pre>
<pre>
<span style='font-family:Courier New,monospace; font-size:9pt; '>       Thyroid stimulating hormone (TSH)</span>
<span style='font-family:Courier New,monospace; font-size:9pt; '>            Specimen insufficient for test(s)</span>
</pre>
<pre>
</pre>
<pre>
<span style='font-family:Courier New,monospace; font-size:9pt; '>       Thyroxine (free T4)</span>
<span style='font-family:Courier New,monospace; font-size:9pt; '>            Specimen insufficient for test(s)</span>
</pre>
<pre>
</pre>
<pre>
<span style='color:ORANGE;font-family:Courier New,monospace; font-size:9pt; '>            Authorised by xx  on 02/07/2015  at 08:23</span>
<span style='font-family:Courier New,monospace; font-size:9pt; '>       White Cell Count             9.75   x 109/L                              3.92 - 10.40</span>
<span style='font-family:Courier New,monospace; font-size:9pt; '>       Red Cell Count               5.29   x 1012/L                             4.19 - 5.85</span>
<span style='font-family:Courier New,monospace; font-size:9pt; '>       Haemoglobin                  15.9   g/dL                                 13.4 - 17.5</span>
<span style='font-family:Courier New,monospace; font-size:9pt; '>       Haematocrit                 0.474   L/L                                 0.390 - 0.510</span>
<span style='font-family:Courier New,monospace; font-size:9pt; '>       MCV                          89.6   fL                                   83.1 - 101.6</span>
<span style='font-family:Courier New,monospace; font-size:9pt; '>       MCH                          30.0   pg                                   27.8 - 34.8</span>
<span style='font-family:Courier New,monospace; font-size:9pt; '>       MCHC                         33.5   g/dL                                 33.0 - 35.0</span>
<span style='font-family:Courier New,monospace; font-size:9pt; '>       RDW                          13.6   %                                    12.1 - 16.3</span>
<span style='font-family:Courier New,monospace; font-size:9pt; '>       Platelet Count                217   x 109/L                               171 - 388</span>
<span style='font-family:Courier New,monospace; font-size:9pt; '>       MPV                          10.0   fL                                    7.1 - 11.0</span>
<span style='font-family:Courier New,monospace; font-size:9pt; '>       Neutrophils                 65.60   %              6.40   x 109/L        1.60 - 6.98                                                                32.00 - 76.00</span>
<span style='font-family:Courier New,monospace; font-size:9pt; '>       Lymphocytes                 25.90   %              2.53   x 109/L        1.40 - 4.20                                                                   18.00 - 56.00</span>
<span style='font-family:Courier New,monospace; font-size:9pt; '>       Monocytes                    5.60   %              0.55   x 109/L        0.30 - 0.80                                                                    4.00 - 12.00</span>
<span style='font-family:Courier New,monospace; font-size:9pt; '>       Eosinophils                  0.50   %              0.05   x 109/L        0.00 - 0.95                                                                    0.00 - 8.00</span>
<span style='font-family:Courier New,monospace; font-size:9pt; '>       Basophils                    0.20   %              0.02   x 109/L        0.00 - 0.10                                                                    0.00 - 2.00</span>
<span style='font-family:Courier New,monospace; font-size:9pt; '>       "Other" Cells                2.10   %              0.20   x 109/L</span>
</pre>
<INPUT TYPE="HIDDEN" NAME="UnreadTSList" VALUE="PK01150438||C012||1,PK01150438||C013||1,PK01150438||C014||1,PK01150438||C015||1,PK01150438||C017||1,PK01150438||C002||1,PK01150438||C051||1,PK01150438||C053||1,PK01150438||C054||1,PK01150438||C056||1,PK01150438||C057||1,PK01150438||C058||1,PK01150438||C059||1,PK01150438||C060||1,PK01150438||C061||1,PK01150438||C062||1,PK01150438||C063||1,PK01150438||C150||1,PK01150438||C151||1,PK01150438||H002||1"><INPUT TYPE="HIDDEN" NAME="UnviewedTSList" VALUE="PK01150438||C012||1,PK01150438||C013||1,PK01150438||C014||1,PK01150438||C015||1,PK01150438||C017||1,PK01150438||C002||1,PK01150438||C051||1,PK01150438||C053||1,PK01150438||C054||1,PK01150438||C056||1,PK01150438||C057||1,PK01150438||C058||1,PK01150438||C059||1,PK01150438||C060||1,PK01150438||C061||1,PK01150438||C062||1,PK01150438||C063||1,PK01150438||C150||1,PK01150438||C151||1,PK01150438||H002||1"></TD></TR><TR><TD></TD><TD>
<a href="#" id="MarkAllReadLink" name="MarkAllReadLink" tabIndex="1">Mark all unread as read</A>
</TD><TD></TD><TD>
<a href="#" id="MarkAllViewedLink" name="MarkAllViewedLink" tabIndex="1">Mark all unviewed as viewed</A>
</TD></TR><TR><TD></TD><TD>
<A id="PrintReport" name="PrintReport" href='#' onclick="websys_lu('websys.csp?TUID=63&TUID=28',false,'top=30,left=20,width=800,height=600');return false;" TVARS="d79iPrintReport^sRi12L63JScu6" TUID='63'  tabIndex=1>Print Report</A>
</TD><TD>&nbsp;&nbsp; &nbsp;&nbsp; </TD><TD></TD></TR></TBODY></TABLE>
</FORM>
<SCRIPT language=javascript id='websysMoveJS64'>
websys_move(eval(screen.availWidth-760)/2,eval(screen.availHeight-540)/2,760,540);
</SCRIPT>
<SCRIPT SRC="../scripts_gen/web.epvisittestset.fulllabpreview.js"></SCRIPT>
<SCRIPT SRC="../scripts/web.epvisittestset.fulllabpreview.js"></SCRIPT>
<SCRIPT SRC="../custom/NHLS-LABTRAK/scripts/web.epvisittestset.fulllabpreview.js"></SCRIPT>
<SCRIPT>
t['result']='result';
t['MarkAllReadLink']='Mark all unread as read';
t['MarkAllViewedLink']='Mark all unviewed as viewed';
t['PrintReport']='Print Report';
websys_sckeys[String.fromCharCode(113)]='websys_help(\'79\',\'-100000000000000\',\'\');';
websys_sckeys[String.fromCharCode(220)]='if (top.frames[\'eprmenu\']) top.frames[\'eprmenu\'].ToggleMenu(null);';
</SCRIPT>
</DIV>
<!-- COMPONENT END web.EPVisitTestSet.FullLabPreview -->
<SCRIPT language=javascript>
  try { InitMe(); } catch(e) {};
</SCRIPT>

</DIV>

</DIV></BODY>
</HTML>

2 个答案:

答案 0 :(得分:0)

我不确定您遇到问题的原因,但是您没有显示所有代码。您要做的一件事是用tag.find_all('span')替换标签。这是工作代码:

bs = """ (your HTML code from above)  """
soup = BeautifulSoup(bs)

for tag in soup.find_all('div'):
    for span in tag.find_all('span'):
        print span.text

产生以下内容:

   Sodium
        Specimen insufficient for test(s)
   Potassium
        Specimen insufficient for test(s)
   Chloride
        Specimen insufficient for test(s)
....

答案 1 :(得分:0)

修正了它。

我正在将一个html文件输入到BeautifulSoup中。这非常有效,其中div名称是我实际想要从文档中获取的信息。

## create new beautiful soup page with last page source, element containing results
html_source = driver.find_element_by_id("dweb_EPVisitTestSet_FullLabPreview").text
soup = BeautifulSoup(html_source,'html.parser')
driver.quit()
print soup

它产生了这个输出:

     Sodium
            Specimen insufficient for test(s)
       Potassium
            Specimen insufficient for test(s)
       Chloride
            Specimen insufficient for test(s)
       Bicarbonate
            Specimen insufficient for test(s)
       Urea
            Specimen insufficient for test(s)
            Authorised by xx  on 04/07/2015  at 17:02
       Creatinine                              116 H    umol/L                    64 - 104
       eGFR (MDRD formula)                     &gt;60      mL/min/1.73 m2
            MDRD-derived estimation of GFR may significantly underestimate true GFR
            in patients with GFR &gt; 60 mL/min/1.73m^2.  It may also be unreliable in
            the case of: age &lt;18 years or &gt;70 years; pregnancy; serious co-morbid
            conditions; acute renal failure; extremes of body habitus/unusual diet,
            gross oedema. The MDRD-eGFR used here does not employ an ethnic factor
            for race.

            Authorised by xx  on 04/07/2015  at 17:02
       Calcium                                2.44      mmol/L                  2.15 - 2.55
            Authorised by xx  on 04/07/2015  at 17:02
       Magnesium                              0.88      mmol/L                  0.63 - 1.05
            Authorised by xx  on 04/07/2015  at 17:02
       Inorganic phosphate                    1.47 H    mmol/L                  0.78 - 1.42
            Authorised by xx  on 04/07/2015  at 17:02
       Total protein                            77      g/L                       60 - 78
            Authorised by xx  on 04/07/2015  at 17:02
       Albumin                                  48      g/L                       35 - 52
            Authorised by xx  on 04/07/2015  at 17:02
       Total bilirubin                           8      umol/L                     5 - 21
            Authorised by xx  on 04/07/2015  at 17:02
       Conjugated bilirubin (DBil)               1      umol/L                     0 - 3
       Alanine transaminase (ALT)
            Specimen insufficient for test(s)
            Authorised by xx  on 04/07/2015  at 17:02
       Aspartate transaminase (AST)             26      U/L                       15 - 40
            Authorised by xx  on 04/07/2015  at 17:02
       Alkaline phosphatase (ALP)               61      U/L                       53 - 128
            Authorised by xx  on 04/07/2015  at 17:02
       Gamma-glutamyl transferase (GGT)         18      U/L               &lt;68
       Thyroid stimulating hormone (TSH)
            Specimen insufficient for test(s)
       Thyroxine (free T4)
            Specimen insufficient for test(s)
            Authorised by xx  on 02/07/2015  at 08:23
       White Cell Count             9.75   x 109/L                              3.92 - 10.40
       Red Cell Count               5.29   x 1012/L                             4.19 - 5.85
       Haemoglobin                  15.9   g/dL                                 13.4 - 17.5
       Haematocrit                 0.474   L/L                                 0.390 - 0.510
       MCV                          89.6   fL                                   83.1 - 101.6
       MCH                          30.0   pg                                   27.8 - 34.8
       MCHC                         33.5   g/dL                                 33.0 - 35.0
       RDW                          13.6   %                                    12.1 - 16.3
       Platelet Count                217   x 109/L                               171 - 388
       MPV                          10.0   fL                                    7.1 - 11.0
       Neutrophils                 65.60   %              6.40   x 109/L        1.60 - 6.98                                                                32.00 - 76.00
       Lymphocytes                 25.90   %              2.53   x 109/L        1.40 - 4.20                                                                   18.00 - 56.00
       Monocytes                    5.60   %              0.55   x 109/L        0.30 - 0.80                                                                    4.00 - 12.00
       Eosinophils                  0.50   %              0.05   x 109/L        0.00 - 0.95                                                                    0.00 - 8.00
       Basophils                    0.20   %              0.02   x 109/L        0.00 - 0.10                                                                    0.00 - 2.00
       "Other" Cells                2.10   %              0.20   x 109/L
Mark all unread as read Mark all unviewed as viewed
Print Report  

这正是我想要的。现在只是格式化所有这些文本并在字符串数组中获取....