How to get the previous "div" (the "div" just above the current "div")

Time: 2019-05-21 13:52:05

Tags: python web-scraping beautifulsoup

My HTML is as follows:

<div style="TEXT-ALIGN: left; TEXT-INDENT: 0pt; DISPLAY: block; MARGIN-LEFT: 0pt; MARGIN-RIGHT: 0pt">
<div style="TEXT-INDENT: 0pt; DISPLAY: block; MARGIN-LEFT: 0pt; MARGIN-RIGHT: 0pt" align="center"><font style="DISPLAY: inline; FONT-FAMILY: Times New Roman; FONT-SIZE: 10pt; FONT-WEIGHT: bold"><font style="BACKGROUND-COLOR: #ffffff; DISPLAY: inline">PART II</font></font></div>

<div style="TEXT-INDENT: 0pt; DISPLAY: block; MARGIN-LEFT: 0pt; MARGIN-RIGHT: 0pt" align="justify">&nbsp;</div>

<div style="TEXT-INDENT: 0pt; DISPLAY: block; MARGIN-LEFT: 0pt; MARGIN-RIGHT: 0pt" align="left"><font style="DISPLAY: inline; FONT-FAMILY: Times New Roman; FONT-SIZE: 10pt; FONT-WEIGHT: bold"><font style="BACKGROUND-COLOR: #ffffff; DISPLAY: inline">ITEM 5. MARKET FOR REGISTRANT’S COMMON EQUITY AND RELATED STOCKHOLDER MATTERS.</font></font></div>

<div style="TEXT-INDENT: 0pt; DISPLAY: block; MARGIN-LEFT: 0pt; MARGIN-RIGHT: 0pt" align="justify">&nbsp;</div>

<div style="TEXT-INDENT: 0pt; DISPLAY: block; MARGIN-LEFT: 0pt; MARGIN-RIGHT: 0pt" align="left"><font style="DISPLAY: inline; FONT-FAMILY: Times New Roman; FONT-SIZE: 10pt"><font style="BACKGROUND-COLOR: #ffffff; DISPLAY: inline">Our common stock is quoted on the OTCBB under the symbol UOIP. The reported high and low closing prices for the common stock as reported on the OTCBB are shown below for the periods indicated. The quotations reflect inter-dealer prices, without retail mark-up, markdown or commission, and may not represent actual transactions.</font></font></div>

<div style="TEXT-INDENT: 0pt; DISPLAY: block; MARGIN-LEFT: 0pt; MARGIN-RIGHT: 0pt" align="left">&nbsp;</div>

<div align="left">
<table cellpadding="0" cellspacing="0" width="100%" style="FONT-FAMILY: times new roman; FONT-SIZE: 10pt; FONT-SIZE: 10pt; FONT-FAMILY: times new roman">
<tbody><tr>
<td valign="bottom" style="PADDING-BOTTOM: 2px"><font style="DISPLAY: inline; FONT-FAMILY: times new roman; FONT-SIZE: 10pt">&nbsp; </font></td>
<td valign="bottom" style="PADDING-BOTTOM: 2px"><font style="DISPLAY: inline; FONT-FAMILY: times new roman; FONT-SIZE: 10pt; FONT-WEIGHT: bold">&nbsp;</font></td>
<td colspan="2" valign="bottom" style="BORDER-BOTTOM: black 2px solid">
<div style="TEXT-INDENT: 0pt; DISPLAY: block; MARGIN-LEFT: 0pt; MARGIN-RIGHT: 0pt" align="center"><font style="DISPLAY: inline; FONT-FAMILY: times new roman; FONT-SIZE: 8pt; FONT-WEIGHT: bold">High</font></div>
</td>
<td nowrap="" valign="bottom" style="TEXT-ALIGN: left; PADDING-BOTTOM: 2px"><font style="DISPLAY: inline; FONT-FAMILY: times new roman; FONT-SIZE: 8pt; FONT-WEIGHT: bold">&nbsp;</font></td>
<td valign="bottom" style="PADDING-BOTTOM: 2px"><font style="DISPLAY: inline; FONT-FAMILY: times new roman; FONT-SIZE: 8pt; FONT-WEIGHT: bold">&nbsp;</font></td>
<td colspan="2" valign="bottom" style="BORDER-BOTTOM: black 2px solid">
<div style="TEXT-INDENT: 0pt; DISPLAY: block; MARGIN-LEFT: 0pt; MARGIN-RIGHT: 0pt" align="center"><font style="DISPLAY: inline; FONT-FAMILY: times new roman; FONT-SIZE: 8pt; FONT-WEIGHT: bold">Low</font></div>
</td>
<td nowrap="" valign="bottom" style="TEXT-ALIGN: left; PADDING-BOTTOM: 2px"><font style="DISPLAY: inline; FONT-FAMILY: times new roman; FONT-SIZE: 10pt; FONT-WEIGHT: bold">&nbsp;</font></td>
</tr><tr>
<td align="left" valign="bottom">
<div style="TEXT-INDENT: 0pt; DISPLAY: block; MARGIN-LEFT: 0pt; MARGIN-RIGHT: 0pt" align="left"><font style="DISPLAY: inline; FONT-FAMILY: times new roman; FONT-SIZE: 10pt; FONT-WEIGHT: bold">Nine months ended June 30, 2014</font></div>
</td>
<td align="left" valign="bottom"><font style="DISPLAY: inline; FONT-FAMILY: times new roman; FONT-SIZE: 10pt">&nbsp;</font></td>
<td align="left" colspan="2" valign="bottom"><font style="DISPLAY: inline; FONT-FAMILY: times new roman; FONT-SIZE: 10pt">&nbsp;</font></td>
<td nowrap="" valign="bottom" style="TEXT-ALIGN: left"><font style="DISPLAY: inline; FONT-FAMILY: times new roman; FONT-SIZE: 10pt">&nbsp;</font></td>
<td align="left" valign="bottom"><font style="DISPLAY: inline; FONT-FAMILY: times new roman; FONT-SIZE: 10pt">&nbsp;</font></td>
<td align="left" colspan="2" valign="bottom"><font style="DISPLAY: inline; FONT-FAMILY: times new roman; FONT-SIZE: 10pt">&nbsp;</font></td>
<td nowrap="" valign="bottom" style="TEXT-ALIGN: left"><font style="DISPLAY: inline; FONT-FAMILY: times new roman; FONT-SIZE: 10pt">&nbsp;</font></td>
</tr><tr bgcolor="#cceeff">
<td align="left" valign="bottom" width="76%">
<div style="TEXT-INDENT: 0pt; DISPLAY: block; MARGIN-LEFT: 0pt; MARGIN-RIGHT: 0pt" align="left"><font style="DISPLAY: inline; FONT-FAMILY: times new roman; FONT-SIZE: 10pt">First quarter ended December 31, 2013</font></div>
</td>
<td align="right" valign="bottom" width="1%"><font style="DISPLAY: inline; FONT-FAMILY: times new roman; FONT-SIZE: 10pt">&nbsp;</font></td>
<td valign="bottom" width="1%" style="TEXT-ALIGN: left"><font style="DISPLAY: inline; FONT-FAMILY: times new roman; FONT-SIZE: 10pt">$</font></td>
<td valign="bottom" width="9%" style="TEXT-ALIGN: right"><font style="DISPLAY: inline; FONT-FAMILY: times new roman; FONT-SIZE: 10pt">12.4000</font></td>
<td nowrap="" valign="bottom" width="1%" style="TEXT-ALIGN: left"><font style="DISPLAY: inline; FONT-FAMILY: times new roman; FONT-SIZE: 10pt">&nbsp;</font></td>
<td align="right" valign="bottom" width="1%"><font style="DISPLAY: inline; FONT-FAMILY: times new roman; FONT-SIZE: 10pt">&nbsp;</font></td>
<td valign="bottom" width="1%" style="TEXT-ALIGN: left"><font style="DISPLAY: inline; FONT-FAMILY: times new roman; FONT-SIZE: 10pt">$</font></td>
<td valign="bottom" width="9%" style="TEXT-ALIGN: right"><font style="DISPLAY: inline; FONT-FAMILY: times new roman; FONT-SIZE: 10pt">3.6400</font></td>
<td nowrap="" valign="bottom" width="1%" style="TEXT-ALIGN: left"><font style="DISPLAY: inline; FONT-FAMILY: times new roman; FONT-SIZE: 10pt">&nbsp;</font></td>
</tr>

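I have located the table with find_all('table'). How can I get the text of the nearest preceding div that actually contains text (or any data, i.e. anything other than whitespace)? That is, the one just before the div that holds the table.

I tried find_previous('div'), but it did not work. Please help.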

1 answer:

Answer 0 (score: 1)

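A single find_previous('div') returns the div that wraps the table, so chain the call until you reach the div that holds the text: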

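import bs4
# `html` is the markup from the question
soup = bs4.BeautifulSoup(html, 'html.parser')
table = soup.find('table')
# Hops: the <div> wrapping the table, then the &nbsp; spacer, then the text paragraph
prev_div = table.find_previous('div').find_previous('div').find_previous('div').text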
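
To see why three hops are needed for markup like the question's, the chain above can be unrolled one step at a time. This is only a sketch: the `html` string is a trimmed stand-in for the question's markup (an assumption, not the full filing), and the names `hops`, `wrapper`, `spacer` and `paragraph` are made up for illustration.

```python
import bs4

# Trimmed stand-in for the question's markup (assumption: same nesting, less styling).
html = """
<div>
<div align="left"><font>Our common stock is quoted on the OTCBB under the symbol UOIP.</font></div>
<div align="left">&nbsp;</div>
<div align="left">
<table><tbody><tr><td>High</td><td>Low</td></tr></tbody></table>
</div>
</div>
"""

soup = bs4.BeautifulSoup(html, "html.parser")
table = soup.find("table")

# Walk backwards through the document, one <div> at a time.
hops = []
node = table
for _ in range(3):
    node = node.find_previous("div")
    hops.append(node)

wrapper, spacer, paragraph = hops
print(wrapper.find("table") is table)     # hop 1 is the div that encloses the table
print(repr(spacer.get_text(strip=True)))  # hop 2 holds only &nbsp;
print(paragraph.get_text(strip=True))     # hop 3 is the paragraph with real text
```

find_previous looks at everything that starts earlier in the document, and the opening tag of the enclosing div comes before the table, which is why the first hop lands on the wrapper rather than on a sibling.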

Alternatively, if the table was found with find_all('table'), index the first result:

soup = bs4.BeautifulSoup(html, 'html.parser')
table = soup.find_all('table')
prev_div = table[0].find_previous('div').find_previous('div').find_previous('div').text
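
Counting hops by hand stops working as soon as a page has more or fewer spacer divs in front of the table. A possible generalisation, offered as a sketch rather than as part of the original answer: walk backwards with `find_all_previous('div')`, skip any div that encloses the table, and return the first one whose text is not just whitespace. `previous_text_div` is a hypothetical helper name, and the `html` string is again a trimmed stand-in for the question's markup.

```python
import bs4

def previous_text_div(table):
    """Return the closest earlier <div> that has visible text, or None.

    Hypothetical helper for illustration; not part of Beautiful Soup.
    """
    wrappers = list(table.parents)  # ancestors would report the table's own text
    for div in table.find_all_previous("div"):
        if any(div is w for w in wrappers):
            continue  # skip divs that enclose the table
        if div.get_text(strip=True):  # &nbsp;-only spacers strip to ""
            return div
    return None

# Trimmed stand-in for the question's markup (assumption: same nesting, less styling).
html = """
<div>
<div align="left"><font>Our common stock is quoted on the OTCBB under the symbol UOIP.</font></div>
<div align="left">&nbsp;</div>
<div align="left">
<table><tbody><tr><td>High</td><td>Low</td></tr></tbody></table>
</div>
</div>
"""

soup = bs4.BeautifulSoup(html, "html.parser")
found = [previous_text_div(t) for t in soup.find_all("table")]
for div in found:
    print(div.get_text(strip=True) if div is not None else None)
```

Comparing with `is` instead of `==` is deliberate: Beautiful Soup treats two tags as equal when their markup matches, and the check here needs to know whether a div is the very same object as one of the table's ancestors.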