我正在尝试解析我用Beautiful Soup抓取的HTML页面的一部分数据。
我有两个问题。网站周围有许多表格,但它们没有唯一的标识符可以轻松导航。
我尝试过以下测试,但无济于事:
for tag in soup.find_all('div'):
print tag.find('span')
这甚至都没有在我抓住的页面中找到div。
我做错了什么?
修改 更新了源文件。
我的代码返回TypeError的错误:期望的字符串或缓冲区是:
soup = BeautifulSoup(data, 'html.parser')
table = soup.findAll('a0:span', {"style":"font-family:Courier New,monospace; font-size:9pt;"})[0]
trs = table('span')
for tr in trs:
print tr.th.text,
print tr.td.text
这显然是数据是包含上述内容的文本文件。
编辑:第一页我试图刮擦。
<!-- Copyright (c) 2001 TrakHealth Pty Limited. ALL RIGHTS RESERVED. -->
<!-- This is a generic page used to display single simple components -->
<HTML XMLNS=TRAK>
<HEAD>
<TITLE></TITLE>
<SCRIPT SRC="/csp/broker/cspbroker.js"></SCRIPT><SCRIPT SRC="/csp/broker/cspxmlhttp.js"></SCRIPT>
<SCRIPT SRC="../scripts/websys.js"></SCRIPT>
<SCRIPT SRC="../custom/NHLS-LABTRAK/scripts/websys.js"></SCRIPT>
<LINK REL="stylesheet" TYPE="text/css" HREF="../styles/modern/websys.css">
<LINK REL="stylesheet" TYPE="text/css" HREF="../custom/NHLS-LABTRAK/scripts/websys.css">
<SCRIPT language='javascript'>
var TRELOADPAGE='websys.csp';
var TRELOADID='sRi1272LJScu4';
var tkKeepOpen=0;
function treload(csppage) {
tkKeepOpen=1;
window.location.href= "websys.csp?TRELOADID=sRi1272LJScu4&TRELOAD=1";
}
var TRELOADPATLIST='';
</SCRIPT>
<script language=javascript>
var t=new Array();
var tsc=new Array();
var session=new Array();
t['XMISSING']='is a required field but has not been entered';
t['XINVALID']='does not have a valid entry';
t['XDATE']='is a date field but does not have a valid date entered';
t['XTIME']='is a time field but does not have a valid time entered';
t['XNUMBER']='is a numeric field but does not have a valid number entered';
t['XLAYOUTERR']='The TrakCare Layout Editor is not functioning or has not been installed.\\n\\n 1. Please check that your browser security settings allow you to initialize and script activeX controls.\\n 2. Please check that the TRAK Layout Editor has been installed.';
t['XLOCKED']='Record is locked by another user.';
t['XLOCKEDCT']='Code table updates are currently disabled.';
t['XNOTCT']='You must connect to the code table server to update code tables.';
t['XLOCKEDMT']='Record is locked by MEDTRAK.';
t['XDAYS']='Sun,Mon,Tue,Wed,Thu,Fri,Sat';
t['XMONTHS']='January,February,March,April,May,June,July,August,September,October,November,December';
t['XUNSAVED']='There are unsaved changes on this page.';
t['XSORTMAXROWS']='Rows retrieved for sorting exceeds maximum specified in system configuration. Sort terminated.';
t['XDATERANGE']='Date From must be before Date To';
t['XMAXCHARS']='Maximum number of characters exceeded. Text will be truncated to the mamimum allowable length.';
t['XNOTAVAIL']='This functionality is not available.';
t['XUNIQUE']='Code or description is not unique.';
t['XLOADING']='Loading...';
session['LOGON.USERID']='IN0546623';
session['LOGON.USERCODE']='IN0546623';
session['LOGON.USERNAME']='Dr Kate de Villiers';
session['LOGON.GROUPID']='17';
session['LOGON.GROUPDESC']='L0A0';
session['LOGON.CTLOCID']='';
session['XMONTHSSHORT']='Jan,Feb,Mar,Apr,May,Jun,Jul,Aug,Sep,Oct,Nov,Dec';
session['CONTEXT']='W5';
window.status=session['LOGON.USERCODE'];
session['LOGON.SITECODE']='NHLS-LABTRAK';
session['LOGON.REGION']='';
session['LOGON.LANGID']='2';
session['REMOTE_ADDR']='41.13.90.246';
session['SESSIONID']='sRi12LJScu';
function tkMakeServerCall(tkclass,tkmethod) {
if ((tkclass=='')||(tkmethod=='')) return '';
var args=new Array('6$Q4nJfQIibx6KRykS2G7fwIO_0CynE1PGbXHXH99mj48F9stJAfA0mxb2lAcJz4',tkclass,tkmethod);
for (var i=2; i<tkMakeServerCall.arguments.length; i++) {
args[i+1]=tkMakeServerCall.arguments[i];
}
var retval=cspHttpServerMethod.apply(this,args);
return retval;
}
var tkTUIDP="28"
var tkTUIDG="R$J3Um0CXoe6SMG0APDutHiKdaJienpZcmL2DLLWSok-"
var tkTUIDS="jJnWSu4mVW_TjQfKn6qV91tHsBeG9DAKkdbZxw3gyrk-"
var tkOverlayMethod="b9XqGIkzXAsNiiMNrAW4qI2Pw8JI3a9uQfBtHGd5VbOkMojLvsvmef1ULxa3sbuS"
var tkLongTextMethod="LU9BZtom$6x_UyWrWuudlFjUitSkkO9trUE_sYUqAlCekHzkMYqX0A9AQuZIEkaJ"
</script>
</HEAD>
<BODY><DIV id="PageContent">
<INPUT TYPE="Button" value="<<" onClick="history.back()"><INPUT TYPE="Button" value=">>" onClick="history.forward()">
<DIV id='cmp_DEBDebtor_Banner'><!-- COMPONENT
Routine GCOM3.1
Page Name websys.default.csp
Component ID 69
Component Name DEBDebtor.Banner
websys.Component Version .51.1531 at 2015-07-14 03:14:18PM
Component Version L2010.1.1 on 2015-07-14 03:14:52PM
Layout for SYS.SYS
-->
<DIV STYLE="LEFT: 0px; TOP: 0px" id='dDEBDebtor_Banner' onclick="websys_sckeys[String.fromCharCode(113)]='websys_help(\'69\',\'-100000000000000\',\'\');'"><FORM ACTION='websys.csp' method=post name='fDEBDebtor_Banner' id='fDEBDebtor_Banner' autocomplete='off'>
<INPUT TYPE='HIDDEN' ID='TFORM' NAME='TFORM' VALUE='DEBDebtor.Banner'>
<INPUT TYPE='HIDDEN' ID='TPAGID' NAME='TPAGID' VALUE='sRi5912LJScu3'>
<INPUT TYPE='HIDDEN' ID='TEVENT' NAME='TEVENT' VALUE=''>
<INPUT TYPE='HIDDEN' ID='TXREFID' NAME='TXREFID' VALUE='3'>
<INPUT TYPE='HIDDEN' ID='TOVERRIDE' NAME='TOVERRIDE' VALUE=''>
<INPUT TYPE='HIDDEN' ID='TDIRTY' NAME='TDIRTY' VALUE='1'>
<INPUT TYPE='HIDDEN' ID='TWKFL' NAME='TWKFL' VALUE=''>
<INPUT TYPE='HIDDEN' ID='TWKFLI' NAME='TWKFLI' VALUE=''>
<INPUT TYPE='HIDDEN' ID='TFRAME' NAME='TFRAME' VALUE=''>
<INPUT TYPE='HIDDEN' ID='TWKFLL' NAME='TWKFLL' VALUE="">
<INPUT TYPE='HIDDEN' ID='TWKFLJ' NAME='TWKFLJ' VALUE="">
<INPUT TYPE='HIDDEN' ID='TREPORT' NAME='TREPORT' VALUE="">
<INPUT TYPE='HIDDEN' ID='TRELOADCMP' NAME='TRELOADCMP' VALUE="">
<INPUT TYPE='HIDDEN' ID='TRELOADID' NAME='TRELOADID' VALUE="sRi1272LJScu4">
<INPUT TYPE='HIDDEN' ID='TOVERLAY' NAME='TOVERLAY' VALUE=''>
<input id="UseSameWin" name="UseSameWin" type="hidden" value="">
<TABLE><TBODY><TR><TD></TD><TD>
<table border='1px'><tr><td><table border='1px' cellspacing='0px' width='940px'><tr><td width='105px' ><span style='font-family:Arial; font-size:10pt; color:#000000;'> Episode No.</span></td><td width='139px' style='background-color:#0000FF;'><span style='font-weight:bold; color:#FFFF00;'> PK01150438</span></td><td width='48px' ><span style='color:#000000;'> MRN</span></td><td width='192px' style='background-color:#0A246A;'><span style='font-weight:bold; color:#00FFFF;'> MRN46913203</span></td><td width='39px' ><span style='color:#000000;'> Lab</span></td><td width='398px' style='background-color:#0A246A;'><span style='font-weight:bold; color:#D4D0C8;'> Nelspruit Laboratory</span></td><td width='6px'></td></tr></table><table border='1px' cellspacing='0px' width='940px'><tr><td width='495px' ><span style='font-size:14pt; font-weight:bold; color:#0000FF;'> Unknown MALE</span></td><td width='37px' style='background-color:#0A246A;'><span style='font-size:10pt; font-weight:bold; color:#D4D0C8;'> M</span></td><td width='88px' style='background-color:#0A246A;'><span style='font-size:9pt; font-weight:bold; color:#D4D0C8;'> 27 y</span></td><td width='130px' style='background-color:#0A246A;'><span style='font-size:10pt; font-weight:bold; color:#D4D0C8;'> 01/01/1988</span></td><td width='176px' style='background-color:#0A246A;'><span style='font-weight:bold; color:#D4D0C8;'> </span></td><td width='6px'></td></tr></table><table border='1px' cellspacing='0px' width='940px'><tr><td width='418px' ><span style='font-size:8pt; color:#000000;'> Clotted blood;EDTA blood</span></td><td width='339px' ><span style='font-size:8pt; color:#000000;'> 1</span></td><td width='178px' style='background-color:#0A246A;'><span style='font-size:10pt; font-weight:bold; color:#D4D0C8;'> Routine</span></td><td width='3px'></td></tr></table><table border='1px' cellspacing='0px' width='940px'><tr><td width='40px' ><span style='font-size:10pt; color:#000000;'> F.N.</span></td><td width='154px' style='background-color:#0A246A;'><span style='font-weight:bold; color:#D4D0C8;'> 08/120456</span></td><td width='58px' ><span style='color:#000000;'> Ref No</span></td><td width='402px' style='background-color:#0A246A;'><span style='font-weight:bold; color:#D4D0C8;'> AAWC0287NOF</span></td><td width='91px' ><span style='font-size:10pt; color:#000000;'> Collection</span></td><td width='115px' style='background-color:#0A246A;'><span style='font-weight:bold; color:#D4D0C8;'> 02/07/2015</span></td><td width='54px' style='background-color:#0A246A;'><span style='font-weight:bold; color:#D4D0C8;'> 00:30</span></td><td width='5px'></td></tr></table><table border='1px' cellspacing='0px' width='940px'><tr><td width='40px' ><span style='font-size:10pt; color:#000000;'> Hosp</span></td><td width='349px' style='background-color:#0A246A;'><span style='font-weight:bold; color:#D4D0C8;'> Rob Ferreira Hospital</span></td><td width='26px' ><span style='color:#000000;'> </span></td><td width='240px' style='background-color:#0A246A;'><span style='font-weight:bold; color:#D4D0C8;'> 013 741 3031</span></td><td width='91px' ><span style='color:#000000;'> Received</span></td><td width='115px' style='background-color:#0A246A;'><span style='font-weight:bold; color:#D4D0C8;'> 02/07/2015</span></td><td width='54px' style='background-color:#0A246A;'><span style='font-weight:bold; color:#D4D0C8;'> 02:45</span></td><td width='6px'></td></tr></table><table border='1px' cellspacing='0px' width='940px'><tr><td width='40px' ><span style='font-size:10pt; color:#000000;'> </span></td><td width='349px' style='background-color:#0A246A;'><span style='font-size:10pt; font-weight:bold; color:#D4D0C8;'> Ward 4</span></td><td width='26px' ><span style='color:#000000;'> </span></td><td width='158px' style='background-color:#0A246A;'><span style='font-weight:bold; color:#D4D0C8;'> </span></td><td width='362px'></td></tr></table><table border='1px' cellspacing='0px' width='940px'><tr><td width='40px' ><span style='font-size:10pt; color:#000000;'> Doc</span></td><td width='349px' style='background-color:#0A246A;'><span style='font-weight:bold; color:#D4D0C8;'> DR IN CHARGE </span></td><td width='23px' ><span style='color:#000000;'> </span></td><td width='525px'></td></tr></table></td></tr></table></TD></TR></TBODY></TABLE>
</FORM>
<SCRIPT SRC="../scripts_gen/debdebtor.banner.js"></SCRIPT>
<SCRIPT SRC="../scripts/debdebtor.banner.js"></SCRIPT>
<SCRIPT SRC="../custom/NHLS-LABTRAK/scripts/debdebtor.banner.js"></SCRIPT>
<SCRIPT>
t['DemographicPanel']='Demographic Panel';
websys_sckeys[String.fromCharCode(113)]='websys_help(\'69\',\'-100000000000000\',\'\');';
websys_sckeys[String.fromCharCode(220)]='if (top.frames[\'eprmenu\']) top.frames[\'eprmenu\'].ToggleMenu(null);';
</SCRIPT>
</DIV>
<!-- COMPONENT END DEBDebtor.Banner -->
<SCRIPT language=javascript>
try { InitMe(); } catch(e) {};
</SCRIPT>
</DIV>
<DIV id='cmp_web_EPVisitTestSet_FullLabPreview'><!-- COMPONENT
Routine GCOM66.1
Page Name websys.default.csp
Component ID 79
Component Name web.EPVisitTestSet.FullLabPreview
websys.Component Version .51.1531 at 2015-07-14 03:14:18PM
Component Version L2010.1.1 on 2015-07-14 03:14:52PM
Layout for SYS.SYS
-->
<DIV STYLE="LEFT: 0px; TOP: 0px" id='dweb_EPVisitTestSet_FullLabPreview' onclick="websys_sckeys[String.fromCharCode(113)]='websys_help(\'79\',\'-100000000000000\',\'\');'"><FORM ACTION='websys.csp' method=post name='fweb_EPVisitTestSet_FullLabPreview' id='fweb_EPVisitTestSet_FullLabPreview' autocomplete='off'>
<INPUT TYPE='HIDDEN' ID='TFORM' NAME='TFORM' VALUE='web.EPVisitTestSet.FullLabPreview'>
<INPUT TYPE='HIDDEN' ID='TPAGID' NAME='TPAGID' VALUE='sRi12LJS61cu8'>
<INPUT TYPE='HIDDEN' ID='TEVENT' NAME='TEVENT' VALUE=''>
<INPUT TYPE='HIDDEN' ID='TXREFID' NAME='TXREFID' VALUE='66'>
<INPUT TYPE='HIDDEN' ID='TOVERRIDE' NAME='TOVERRIDE' VALUE=''>
<INPUT TYPE='HIDDEN' ID='TDIRTY' NAME='TDIRTY' VALUE='1'>
<INPUT TYPE='HIDDEN' ID='TWKFL' NAME='TWKFL' VALUE=''>
<INPUT TYPE='HIDDEN' ID='TWKFLI' NAME='TWKFLI' VALUE=''>
<INPUT TYPE='HIDDEN' ID='TFRAME' NAME='TFRAME' VALUE=''>
<INPUT TYPE='HIDDEN' ID='TWKFLL' NAME='TWKFLL' VALUE="">
<INPUT TYPE='HIDDEN' ID='TWKFLJ' NAME='TWKFLJ' VALUE="">
<INPUT TYPE='HIDDEN' ID='TREPORT' NAME='TREPORT' VALUE="">
<INPUT TYPE='HIDDEN' ID='TRELOADCMP' NAME='TRELOADCMP' VALUE="">
<INPUT TYPE='HIDDEN' ID='TRELOADID' NAME='TRELOADID' VALUE="sRi1272LJScu4">
<INPUT TYPE='HIDDEN' ID='TOVERLAY' NAME='TOVERLAY' VALUE=''>
<TABLE><TBODY><TR><TD></TD><TD colSpan=3>
<pre>
</pre>
<pre>
<span style='font-family:Courier New,monospace; font-size:9pt; '> Sodium</span>
<span style='font-family:Courier New,monospace; font-size:9pt; '> Specimen insufficient for test(s)</span>
</pre>
<pre>
</pre>
<pre>
<span style='font-family:Courier New,monospace; font-size:9pt; '> Potassium</span>
<span style='font-family:Courier New,monospace; font-size:9pt; '> Specimen insufficient for test(s)</span>
</pre>
<pre>
</pre>
<pre>
<span style='font-family:Courier New,monospace; font-size:9pt; '> Chloride</span>
<span style='font-family:Courier New,monospace; font-size:9pt; '> Specimen insufficient for test(s)</span>
</pre>
<pre>
</pre>
<pre>
<span style='font-family:Courier New,monospace; font-size:9pt; '> Bicarbonate</span>
<span style='font-family:Courier New,monospace; font-size:9pt; '> Specimen insufficient for test(s)</span>
</pre>
<pre>
</pre>
<pre>
<span style='font-family:Courier New,monospace; font-size:9pt; '> Urea</span>
<span style='font-family:Courier New,monospace; font-size:9pt; '> Specimen insufficient for test(s)</span>
</pre>
<pre>
</pre>
<pre>
<span style='color:ORANGE;font-family:Courier New,monospace; font-size:9pt; '> Authorised by xx on 04/07/2015 at 17:02</span>
<span style='font-family:Courier New,monospace; font-size:9pt; '> Creatinine</span><span style='font-weight:bold; font-family:Courier New,monospace; font-size:9pt; '> </span><span style='font-weight:bold; color:RED;font-family:Courier New,monospace; font-size:9pt; '> 116 H</span><span style='font-family:Courier New,monospace; font-size:9pt; '> umol/L 64 - 104</span>
<span style='font-family:Courier New,monospace; font-size:9pt; '> eGFR (MDRD formula) >60 mL/min/1.73 m2</span>
<span style='font-family:Courier New,monospace; font-size:9pt; '> MDRD-derived estimation of GFR may significantly underestimate true GFR</span>
<span style='font-family:Courier New,monospace; font-size:9pt; '> in patients with GFR > 60 mL/min/1.73m^2. It may also be unreliable in</span>
<span style='font-family:Courier New,monospace; font-size:9pt; '> the case of: age <18 years or >70 years; pregnancy; serious co-morbid</span>
<span style='font-family:Courier New,monospace; font-size:9pt; '> conditions; acute renal failure; extremes of body habitus/unusual diet,</span>
<span style='font-family:Courier New,monospace; font-size:9pt; '> gross oedema. The MDRD-eGFR used here does not employ an ethnic factor</span>
<span style='font-family:Courier New,monospace; font-size:9pt; '> for race.</span>
<span style='font-family:Courier New,monospace; font-size:9pt; '> </span>
</pre>
<pre>
</pre>
<pre>
<span style='color:ORANGE;font-family:Courier New,monospace; font-size:9pt; '> Authorised by xx on 04/07/2015 at 17:02</span>
<span style='font-family:Courier New,monospace; font-size:9pt; '> Calcium 2.44 mmol/L 2.15 - 2.55</span>
</pre>
<pre>
</pre>
<pre>
<span style='color:ORANGE;font-family:Courier New,monospace; font-size:9pt; '> Authorised by xx on 04/07/2015 at 17:02</span>
<span style='font-family:Courier New,monospace; font-size:9pt; '> Magnesium 0.88 mmol/L 0.63 - 1.05</span>
</pre>
<pre>
</pre>
<pre>
<span style='color:ORANGE;font-family:Courier New,monospace; font-size:9pt; '> Authorised by xx on 04/07/2015 at 17:02</span>
<span style='font-family:Courier New,monospace; font-size:9pt; '> Inorganic phosphate</span><span style='font-weight:bold; font-family:Courier New,monospace; font-size:9pt; '> </span><span style='font-weight:bold; color:RED;font-family:Courier New,monospace; font-size:9pt; '> 1.47 H</span><span style='font-family:Courier New,monospace; font-size:9pt; '> mmol/L 0.78 - 1.42</span>
</pre>
<pre>
</pre>
<pre>
<span style='color:ORANGE;font-family:Courier New,monospace; font-size:9pt; '> Authorised by xx on 04/07/2015 at 17:02</span>
<span style='font-family:Courier New,monospace; font-size:9pt; '> Total protein 77 g/L 60 - 78</span>
</pre>
<pre>
</pre>
<pre>
<span style='color:ORANGE;font-family:Courier New,monospace; font-size:9pt; '> Authorised by xx on 04/07/2015 at 17:02</span>
<span style='font-family:Courier New,monospace; font-size:9pt; '> Albumin 48 g/L 35 - 52</span>
</pre>
<pre>
</pre>
<pre>
<span style='color:ORANGE;font-family:Courier New,monospace; font-size:9pt; '> Authorised by xx on 04/07/2015 at 17:02</span>
<span style='font-family:Courier New,monospace; font-size:9pt; '> Total bilirubin 8 umol/L 5 - 21</span>
</pre>
<pre>
</pre>
<pre>
<span style='color:ORANGE;font-family:Courier New,monospace; font-size:9pt; '> Authorised by xx on 04/07/2015 at 17:02</span>
<span style='font-family:Courier New,monospace; font-size:9pt; '> Conjugated bilirubin (DBil) 1 umol/L 0 - 3</span>
</pre>
<pre>
</pre>
<pre>
<span style='font-family:Courier New,monospace; font-size:9pt; '> Alanine transaminase (ALT)</span>
<span style='font-family:Courier New,monospace; font-size:9pt; '> Specimen insufficient for test(s)</span>
</pre>
<pre>
</pre>
<pre>
<span style='color:ORANGE;font-family:Courier New,monospace; font-size:9pt; '> Authorised by xx on 04/07/2015 at 17:02</span>
<span style='font-family:Courier New,monospace; font-size:9pt; '> Aspartate transaminase (AST) 26 U/L 15 - 40</span>
</pre>
<pre>
</pre>
<pre>
<span style='color:ORANGE;font-family:Courier New,monospace; font-size:9pt; '> Authorised by xx on 04/07/2015 at 17:02</span>
<span style='font-family:Courier New,monospace; font-size:9pt; '> Alkaline phosphatase (ALP) 61 U/L 53 - 128</span>
</pre>
<pre>
</pre>
<pre>
<span style='color:ORANGE;font-family:Courier New,monospace; font-size:9pt; '> Authorised by xx on 04/07/2015 at 17:02</span>
<span style='font-family:Courier New,monospace; font-size:9pt; '> Gamma-glutamyl transferase (GGT) 18 U/L <68</span>
</pre>
<pre>
</pre>
<pre>
<span style='font-family:Courier New,monospace; font-size:9pt; '> Thyroid stimulating hormone (TSH)</span>
<span style='font-family:Courier New,monospace; font-size:9pt; '> Specimen insufficient for test(s)</span>
</pre>
<pre>
</pre>
<pre>
<span style='font-family:Courier New,monospace; font-size:9pt; '> Thyroxine (free T4)</span>
<span style='font-family:Courier New,monospace; font-size:9pt; '> Specimen insufficient for test(s)</span>
</pre>
<pre>
</pre>
<pre>
<span style='color:ORANGE;font-family:Courier New,monospace; font-size:9pt; '> Authorised by xx on 02/07/2015 at 08:23</span>
<span style='font-family:Courier New,monospace; font-size:9pt; '> White Cell Count 9.75 x 109/L 3.92 - 10.40</span>
<span style='font-family:Courier New,monospace; font-size:9pt; '> Red Cell Count 5.29 x 1012/L 4.19 - 5.85</span>
<span style='font-family:Courier New,monospace; font-size:9pt; '> Haemoglobin 15.9 g/dL 13.4 - 17.5</span>
<span style='font-family:Courier New,monospace; font-size:9pt; '> Haematocrit 0.474 L/L 0.390 - 0.510</span>
<span style='font-family:Courier New,monospace; font-size:9pt; '> MCV 89.6 fL 83.1 - 101.6</span>
<span style='font-family:Courier New,monospace; font-size:9pt; '> MCH 30.0 pg 27.8 - 34.8</span>
<span style='font-family:Courier New,monospace; font-size:9pt; '> MCHC 33.5 g/dL 33.0 - 35.0</span>
<span style='font-family:Courier New,monospace; font-size:9pt; '> RDW 13.6 % 12.1 - 16.3</span>
<span style='font-family:Courier New,monospace; font-size:9pt; '> Platelet Count 217 x 109/L 171 - 388</span>
<span style='font-family:Courier New,monospace; font-size:9pt; '> MPV 10.0 fL 7.1 - 11.0</span>
<span style='font-family:Courier New,monospace; font-size:9pt; '> Neutrophils 65.60 % 6.40 x 109/L 1.60 - 6.98 32.00 - 76.00</span>
<span style='font-family:Courier New,monospace; font-size:9pt; '> Lymphocytes 25.90 % 2.53 x 109/L 1.40 - 4.20 18.00 - 56.00</span>
<span style='font-family:Courier New,monospace; font-size:9pt; '> Monocytes 5.60 % 0.55 x 109/L 0.30 - 0.80 4.00 - 12.00</span>
<span style='font-family:Courier New,monospace; font-size:9pt; '> Eosinophils 0.50 % 0.05 x 109/L 0.00 - 0.95 0.00 - 8.00</span>
<span style='font-family:Courier New,monospace; font-size:9pt; '> Basophils 0.20 % 0.02 x 109/L 0.00 - 0.10 0.00 - 2.00</span>
<span style='font-family:Courier New,monospace; font-size:9pt; '> "Other" Cells 2.10 % 0.20 x 109/L</span>
</pre>
<INPUT TYPE="HIDDEN" NAME="UnreadTSList" VALUE="PK01150438||C012||1,PK01150438||C013||1,PK01150438||C014||1,PK01150438||C015||1,PK01150438||C017||1,PK01150438||C002||1,PK01150438||C051||1,PK01150438||C053||1,PK01150438||C054||1,PK01150438||C056||1,PK01150438||C057||1,PK01150438||C058||1,PK01150438||C059||1,PK01150438||C060||1,PK01150438||C061||1,PK01150438||C062||1,PK01150438||C063||1,PK01150438||C150||1,PK01150438||C151||1,PK01150438||H002||1"><INPUT TYPE="HIDDEN" NAME="UnviewedTSList" VALUE="PK01150438||C012||1,PK01150438||C013||1,PK01150438||C014||1,PK01150438||C015||1,PK01150438||C017||1,PK01150438||C002||1,PK01150438||C051||1,PK01150438||C053||1,PK01150438||C054||1,PK01150438||C056||1,PK01150438||C057||1,PK01150438||C058||1,PK01150438||C059||1,PK01150438||C060||1,PK01150438||C061||1,PK01150438||C062||1,PK01150438||C063||1,PK01150438||C150||1,PK01150438||C151||1,PK01150438||H002||1"></TD></TR><TR><TD></TD><TD>
<a href="#" id="MarkAllReadLink" name="MarkAllReadLink" tabIndex="1">Mark all unread as read</A>
</TD><TD></TD><TD>
<a href="#" id="MarkAllViewedLink" name="MarkAllViewedLink" tabIndex="1">Mark all unviewed as viewed</A>
</TD></TR><TR><TD></TD><TD>
<A id="PrintReport" name="PrintReport" href='#' onclick="websys_lu('websys.csp?TUID=63&TUID=28',false,'top=30,left=20,width=800,height=600');return false;" TVARS="d79iPrintReport^sRi12L63JScu6" TUID='63' tabIndex=1>Print Report</A>
</TD><TD> </TD><TD></TD></TR></TBODY></TABLE>
</FORM>
<SCRIPT language=javascript id='websysMoveJS64'>
websys_move(eval(screen.availWidth-760)/2,eval(screen.availHeight-540)/2,760,540);
</SCRIPT>
<SCRIPT SRC="../scripts_gen/web.epvisittestset.fulllabpreview.js"></SCRIPT>
<SCRIPT SRC="../scripts/web.epvisittestset.fulllabpreview.js"></SCRIPT>
<SCRIPT SRC="../custom/NHLS-LABTRAK/scripts/web.epvisittestset.fulllabpreview.js"></SCRIPT>
<SCRIPT>
t['result']='result';
t['MarkAllReadLink']='Mark all unread as read';
t['MarkAllViewedLink']='Mark all unviewed as viewed';
t['PrintReport']='Print Report';
websys_sckeys[String.fromCharCode(113)]='websys_help(\'79\',\'-100000000000000\',\'\');';
websys_sckeys[String.fromCharCode(220)]='if (top.frames[\'eprmenu\']) top.frames[\'eprmenu\'].ToggleMenu(null);';
</SCRIPT>
</DIV>
<!-- COMPONENT END web.EPVisitTestSet.FullLabPreview -->
<SCRIPT language=javascript>
try { InitMe(); } catch(e) {};
</SCRIPT>
</DIV>
</DIV></BODY>
</HTML>
答案 0 :(得分:0)
我不确定您遇到问题的原因,但是您没有显示所有代码。您要做的一件事是用tag.find_all('span')
替换标签。这是工作代码:
bs = """ (your HTML code from above) """
soup = BeautifulSoup(bs)
for tag in soup.find_all('div'):
for span in tag.find_all('span'):
print span.text
产生以下内容:
Sodium
Specimen insufficient for test(s)
Potassium
Specimen insufficient for test(s)
Chloride
Specimen insufficient for test(s)
....
答案 1 :(得分:0)
修正了它。
我正在将一个html文件输入到BeautifulSoup中。这非常有效,其中div名称是我实际想要从文档中获取的信息。
## create new beautiful soup page with last page source, element containing results
html_source = driver.find_element_by_id("dweb_EPVisitTestSet_FullLabPreview").text
soup = BeautifulSoup(html_source,'html.parser')
driver.quit()
print soup
它产生了这个输出:
Sodium
Specimen insufficient for test(s)
Potassium
Specimen insufficient for test(s)
Chloride
Specimen insufficient for test(s)
Bicarbonate
Specimen insufficient for test(s)
Urea
Specimen insufficient for test(s)
Authorised by xx on 04/07/2015 at 17:02
Creatinine 116 H umol/L 64 - 104
eGFR (MDRD formula) >60 mL/min/1.73 m2
MDRD-derived estimation of GFR may significantly underestimate true GFR
in patients with GFR > 60 mL/min/1.73m^2. It may also be unreliable in
the case of: age <18 years or >70 years; pregnancy; serious co-morbid
conditions; acute renal failure; extremes of body habitus/unusual diet,
gross oedema. The MDRD-eGFR used here does not employ an ethnic factor
for race.
Authorised by xx on 04/07/2015 at 17:02
Calcium 2.44 mmol/L 2.15 - 2.55
Authorised by xx on 04/07/2015 at 17:02
Magnesium 0.88 mmol/L 0.63 - 1.05
Authorised by xx on 04/07/2015 at 17:02
Inorganic phosphate 1.47 H mmol/L 0.78 - 1.42
Authorised by xx on 04/07/2015 at 17:02
Total protein 77 g/L 60 - 78
Authorised by xx on 04/07/2015 at 17:02
Albumin 48 g/L 35 - 52
Authorised by xx on 04/07/2015 at 17:02
Total bilirubin 8 umol/L 5 - 21
Authorised by xx on 04/07/2015 at 17:02
Conjugated bilirubin (DBil) 1 umol/L 0 - 3
Alanine transaminase (ALT)
Specimen insufficient for test(s)
Authorised by xx on 04/07/2015 at 17:02
Aspartate transaminase (AST) 26 U/L 15 - 40
Authorised by xx on 04/07/2015 at 17:02
Alkaline phosphatase (ALP) 61 U/L 53 - 128
Authorised by xx on 04/07/2015 at 17:02
Gamma-glutamyl transferase (GGT) 18 U/L <68
Thyroid stimulating hormone (TSH)
Specimen insufficient for test(s)
Thyroxine (free T4)
Specimen insufficient for test(s)
Authorised by xx on 02/07/2015 at 08:23
White Cell Count 9.75 x 109/L 3.92 - 10.40
Red Cell Count 5.29 x 1012/L 4.19 - 5.85
Haemoglobin 15.9 g/dL 13.4 - 17.5
Haematocrit 0.474 L/L 0.390 - 0.510
MCV 89.6 fL 83.1 - 101.6
MCH 30.0 pg 27.8 - 34.8
MCHC 33.5 g/dL 33.0 - 35.0
RDW 13.6 % 12.1 - 16.3
Platelet Count 217 x 109/L 171 - 388
MPV 10.0 fL 7.1 - 11.0
Neutrophils 65.60 % 6.40 x 109/L 1.60 - 6.98 32.00 - 76.00
Lymphocytes 25.90 % 2.53 x 109/L 1.40 - 4.20 18.00 - 56.00
Monocytes 5.60 % 0.55 x 109/L 0.30 - 0.80 4.00 - 12.00
Eosinophils 0.50 % 0.05 x 109/L 0.00 - 0.95 0.00 - 8.00
Basophils 0.20 % 0.02 x 109/L 0.00 - 0.10 0.00 - 2.00
"Other" Cells 2.10 % 0.20 x 109/L
Mark all unread as read Mark all unviewed as viewed
Print Report
这正是我想要的。现在只是格式化所有这些文本并在字符串数组中获取....