BeautifulSoup renderContents is not returning what I expect

Time: 2017-11-28 09:50:58

Tags: python beautifulsoup

I am trying to scrape restaurant names from https://www.opentable.sg/singapore-restaurants:

import urllib  # Python 2; on Python 3 urlopen lives in urllib.request
from bs4 import BeautifulSoup

url = "https://www.opentable.sg/singapore-restaurants"
html = urllib.urlopen(url).read()
soup = BeautifulSoup(html, 'html.parser', from_encoding="utf-8")

for entry in soup.find_all('a',{'class':'rest-row-name'}):
    #print entry.renderContents()
    print entry

Output:

<a class="rest-row-name" href="/r/chilis-clarke-quay-central-singapore">Chili's Clarke Quay Central</a>

<a class="rest-row-name" href="/r/atlas-singapore">ATLAS</a>
<a class="rest-row-name" href="/r/edge-pan-pacific-singapore-marina-square">Edge - Pan Pacific Singapore</a>
<a class="rest-row-name" href="/r/lawrys-the-prime-rib-singapore">Lawry's The Prime Rib Singapore</a>
<a class="rest-row-name" href="//www.opentable.sg/r/carousel-royal-plaza-on-scotts" target="_blank"><span class="rest-row-index">1. </span>Carousel</a>
<a class="rest-row-name" href="//www.opentable.sg/r/bread-street-kitchen-marina-bay-sands-singapore" target="_blank"><span class="rest-row-index">2. </span>Bread Street Kitchen - Marina Bay Sands</a>
<a class="rest-row-name" href="//www.opentable.sg/r/colony-the-ritz-carlton-millenia-singapore" target="_blank"><span class="rest-row-index">3. </span>Colony - The Ritz-Carlton Millenia Singapore</a>
<a class="rest-row-name" href="//www.opentable.sg/r/edge-pan-pacific-singapore-marina-square" target="_blank"><span class="rest-row-index">4. </span>Edge - Pan Pacific Singapore</a>
<a class="rest-row-name" href="//www.opentable.sg/r/the-dempsey-cookhouse-and-bar-singapore" target="_blank"><span class="rest-row-index">5. </span>The Dempsey Cookhouse and Bar</a>
<a class="rest-row-name" href="//www.opentable.sg/r/the-westin-singapore-cook-and-brew-singapore" target="_blank">Cook &amp; Brew - The Westin Singapore</a>
<a class="rest-row-name" href="//www.opentable.sg/r/pince-and-pints-katong-singapore" target="_blank">Pince &amp; Pints Katong Singapore</a>
<a class="rest-row-name" href="//www.opentable.sg/r/seasonal-tastes-the-westin-singapore" target="_blank">Seasonal Tastes  - The Westin Singapore</a>
<a class="rest-row-name" href="//www.opentable.sg/r/sky22-" target="_blank">Sky22</a>
<a class="rest-row-name" href="//www.opentable.sg/r/the-chop-house-katong-singapore" target="_blank">The Chop House Katong</a>

When I call .renderContents() on each entry instead, this is what is returned:

for entry in soup.find_all('a',{'class':'rest-row-name'}):
    print entry.renderContents()

Output:

Chili's Clarke Quay Central
ATLAS
Edge - Pan Pacific Singapore
Lawry's The Prime Rib Singapore
<span class="rest-row-index">1. </span>Carousel
<span class="rest-row-index">2. </span>Bread Street Kitchen - Marina Bay Sands
<span class="rest-row-index">3. </span>Colony - The Ritz-Carlton Millenia Singapore
<span class="rest-row-index">4. </span>Edge - Pan Pacific Singapore
<span class="rest-row-index">5. </span>The Dempsey Cookhouse and Bar
Cook &amp; Brew - The Westin Singapore
Pince &amp; Pints Katong Singapore
Seasonal Tastes  - The Westin Singapore
Sky22
The Chop House Katong

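I was hoping .renderContents() would return only the restaurant names. But some of the links contain a nested <span class="rest-row-index"> tag, so those entries still come back with HTML in them and I do not get just the restaurant name.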
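To make the behaviour concrete, here is a small sketch that parses one of the links from the output above. It assumes bs4 on Python 3 and uses decode_contents(), which returns the tag's inner markup as a string, in place of renderContents(); get_text() is shown for comparison. The variable names are only for illustration.

```python
from bs4 import BeautifulSoup

# One anchor copied from the printed output above (not fetched live).
snippet = (
    '<a class="rest-row-name" '
    'href="//www.opentable.sg/r/carousel-royal-plaza-on-scotts" '
    'target="_blank"><span class="rest-row-index">1. </span>Carousel</a>'
)
entry = BeautifulSoup(snippet, "html.parser").a

# Inner markup of the tag: nested tags are kept as markup.
inner_markup = entry.decode_contents()

# Text of the tag and all of its descendants: the span's "1. " is still there.
text_only = entry.get_text()

print(inner_markup)  # <span class="rest-row-index">1. </span>Carousel
print(text_only)     # 1. Carousel
```

So the inner-markup methods keep the span by design, and even get_text() alone would leave the ranking prefix in front of the name.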

What is the best practice for handling this situation? How should I go about it?
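One possible approach, offered as a sketch rather than a definitive answer: remove the nested rest-row-index span from each link and then read the remaining text with get_text(). The example assumes bs4 on Python 3 and runs against a few lines of markup copied from the output above instead of the live page, whose structure may have changed. The names SAMPLE_HTML and restaurant_names are hypothetical.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the output in the question; the live page may differ.
SAMPLE_HTML = """
<a class="rest-row-name" href="/r/atlas-singapore">ATLAS</a>
<a class="rest-row-name" href="//www.opentable.sg/r/carousel-royal-plaza-on-scotts" target="_blank"><span class="rest-row-index">1. </span>Carousel</a>
<a class="rest-row-name" href="//www.opentable.sg/r/the-westin-singapore-cook-and-brew-singapore" target="_blank">Cook &amp; Brew - The Westin Singapore</a>
"""


def restaurant_names(html):
    """Return the text of each rest-row-name link, without the index span."""
    soup = BeautifulSoup(html, "html.parser")
    names = []
    for entry in soup.find_all("a", class_="rest-row-name"):
        # Drop the nested ranking span ("1. ", "2. ", ...) when it is present.
        for index_span in entry.find_all("span", class_="rest-row-index"):
            index_span.decompose()
        # get_text() returns decoded text, so "&amp;" comes back as "&".
        names.append(entry.get_text().strip())
    return names


print(restaurant_names(SAMPLE_HTML))
# ['ATLAS', 'Carousel', 'Cook & Brew - The Westin Singapore']
```

Note that decompose() modifies the parsed tree in place, so this suits a one-off extraction; if the span is needed later, the soup would have to be parsed again.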

0 answers:

No answers