When I run soup.find("h3", text="Main Address:").find_parents("section")
the output I get is:
[<section class="otlnrw" itemscope="" itemtype="http://microformats.org/wiki/hCard">\n<header>\n<h3 itemprop="name">Main Address:</h3>\n</header>\n<p>600 Dexter <abbr title="Avenue\r"><abbr title="Avenue\r">Ave.</abbr></abbr><br/><span class="locality">Montgomery</span>, <span class="region">AL</span>, <span class="postal-code">36104</span></p> </section>]
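Now I want to print only the text of the paragraph, but I have not been able to. How can I print just the text inside this section's paragraph from here?
For reference, my HTML page looks like this: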
<article>
<header>
<h2 id="state-government">State Government</h2>
</header>
<section itemscope="" itemtype="http://microformats.org/wiki/hCard" class="otln">
<header><h3 itemprop="name">Official Name:</h3></header>
<p><a href="http://alaska.gov/">Alaska</a>
</p>
</section>
<section itemscope="" itemtype="http://microformats.org/wiki/hCard" class="otlnrw">
<header><h3 class="org">Governor:</h3></header>
<p><a href="http://gov.alaska.gov/Walker/contact/email-the-governor.html">Bill Walker</a></p>
</section>
<section itemscope="" itemtype="http://microformats.org/wiki/hCard" class="otln">
<header><h3 itemprop="name">Main Address:</h3></header>
<p>120 East 4th Street<br>
<span class="locality">Juneau</span>,
<span class="region">AK</span>,
<span class="postal-code">99801</span></p>
</section>
<section itemscope="" itemtype="http://microformats.org/wiki/hCard" class="otlnrw">
<header><h3 itemprop="name">Phone Number:</h3></header>
<p class="spk tel">907-465-3708</p>
</section>
<p class="volver clearfix"><a href="#skiptarget">
<span class="icon-backtotop-dwnlvl">Back to Top</span></a></p>
<section>
<header><h2 id="state-agencies">State Agencies</h2></header>
<ul>
<li><a href="/state-consumer/alaska">Consumer Protection Offices</a></li>
<li><a href="http://www.correct.state.ak.us/">Corrections Department</a></li>
<li><a href="http://www.elections.alaska.gov/">Election Office</a></li>
<li><a href="http://doa.alaska.gov/dmv/">Motor Vehicle Offices</a></li>
<li><a href="http://doa.alaska.gov/dgs/property/">Surplus Property Sales</a></li>
<li><a href="http://www.travelalaska.com">Travel and Tourism</a></li>
</ul>
</section>
<p class="volver clearfix"><a href="#skiptarget">
<span class="icon-backtotop-dwnlvl">Back to Top</span></a></p>
</article>
How should I get the address text out of it?
Answer 0 (score: 0)
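Your current code returns a list containing a single element, because find_parents always returns a list. To get the <p> elements inside that section, you can extend the call slightly: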
soup.find("h3", text="Main Address:").find_parents("section")[0]("p")
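To illustrate the line above: calling a tag like a function is shorthand for find_all(), and find_parent() (singular) returns the nearest matching ancestor directly, so the [0] index on the section can be avoided. The sketch below is not part of the original answer. It assumes a trimmed copy of the question's markup, the built-in html.parser, and the string= argument, which newer Beautiful Soup releases prefer over text=.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the "Main Address" section from the question (assumed input).
html = """
<section class="otln">
<header><h3 itemprop="name">Main Address:</h3></header>
<p>120 East 4th Street<br>
<span class="locality">Juneau</span>,
<span class="region">AK</span>,
<span class="postal-code">99801</span></p>
</section>
"""

soup = BeautifulSoup(html, "html.parser")
heading = soup.find("h3", string="Main Address:")

# find_parents() returns a list of ancestors; find_parent() returns
# only the nearest matching ancestor (or None if there is none).
section = heading.find_parent("section")

# Calling a tag like a function is shorthand for find_all().
paragraphs = section("p")
same_paragraphs = section.find_all("p")

print(len(paragraphs))
print(paragraphs == same_paragraphs)
```

Both forms give the same one-element list, so section("p")[0] and section.find("p") refer to the same <p> tag.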
If you want the contents of the p element, take the first element of that list as well and call decode_contents on it:
soup.find("h3", text="Main Address:").find_parents("section")[0]("p")[0].decode_contents(formatter="html")
In your case this returns:
u'120 East 4th Street<br/><span class="locality">Juneau</span>, <span class="region">AK</span>, <span class="postal-code">99801</span>'
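decode_contents returns the inner markup, tags included. If only the plain text is wanted, as the question asks, get_text() may be a better fit. The following is a minimal sketch under the same assumptions as before (a trimmed copy of the question's markup, html.parser, and string= in place of text=). The whitespace normalization with split/join is one possible choice, not the only one.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the "Main Address" section from the question (assumed input).
html = """
<section class="otln">
<header><h3 itemprop="name">Main Address:</h3></header>
<p>120 East 4th Street<br>
<span class="locality">Juneau</span>,
<span class="region">AK</span>,
<span class="postal-code">99801</span></p>
</section>
"""

soup = BeautifulSoup(html, "html.parser")
heading = soup.find("h3", string="Main Address:")
paragraph = heading.find_parent("section").find("p")

# get_text() concatenates every string inside the tag and drops the markup.
raw_text = paragraph.get_text()

# Collapse the newlines left over from the source formatting into single spaces.
address = " ".join(raw_text.split())
print(address)  # 120 East 4th Street Juneau, AK, 99801

# Individual fields can also be read through their class names.
city = paragraph.find("span", class_="locality").get_text()
postal_code = paragraph.find("span", class_="postal-code").get_text()
print(city, postal_code)  # Juneau 99801
```

Note that <br> contributes no text of its own. When the markup has no whitespace after it, as in the Montgomery output shown in the question, the street and the city would run together unless a separator is passed to get_text().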