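I am trying to scrape restaurant names from https://www.opentable.sg/singapore-restaurants
import urllib
from bs4 import BeautifulSoup

url = "https://www.opentable.sg/singapore-restaurants"
html = urllib.urlopen(url).read()  # Python 2; on Python 3 this is urllib.request.urlopen
soup = BeautifulSoup(html, 'html.parser', from_encoding="utf-8")
# Every restaurant link on the page carries the class "rest-row-name"
for entry in soup.find_all('a', {'class': 'rest-row-name'}):
    #print entry.renderContents()
    print entry
Output: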
<a class="rest-row-name" href="/r/chilis-clarke-quay-central-singapore">Chili's Clarke Quay Central</a>
<a class="rest-row-name" href="/r/atlas-singapore">ATLAS</a>
<a class="rest-row-name" href="/r/edge-pan-pacific-singapore-marina-square">Edge - Pan Pacific Singapore</a>
<a class="rest-row-name" href="/r/lawrys-the-prime-rib-singapore">Lawry's The Prime Rib Singapore</a>
<a class="rest-row-name" href="//www.opentable.sg/r/carousel-royal-plaza-on-scotts" target="_blank"><span class="rest-row-index">1. </span>Carousel</a>
<a class="rest-row-name" href="//www.opentable.sg/r/bread-street-kitchen-marina-bay-sands-singapore" target="_blank"><span class="rest-row-index">2. </span>Bread Street Kitchen - Marina Bay Sands</a>
<a class="rest-row-name" href="//www.opentable.sg/r/colony-the-ritz-carlton-millenia-singapore" target="_blank"><span class="rest-row-index">3. </span>Colony - The Ritz-Carlton Millenia Singapore</a>
<a class="rest-row-name" href="//www.opentable.sg/r/edge-pan-pacific-singapore-marina-square" target="_blank"><span class="rest-row-index">4. </span>Edge - Pan Pacific Singapore</a>
<a class="rest-row-name" href="//www.opentable.sg/r/the-dempsey-cookhouse-and-bar-singapore" target="_blank"><span class="rest-row-index">5. </span>The Dempsey Cookhouse and Bar</a>
<a class="rest-row-name" href="//www.opentable.sg/r/the-westin-singapore-cook-and-brew-singapore" target="_blank">Cook & Brew - The Westin Singapore</a>
<a class="rest-row-name" href="//www.opentable.sg/r/pince-and-pints-katong-singapore" target="_blank">Pince & Pints Katong Singapore</a>
<a class="rest-row-name" href="//www.opentable.sg/r/seasonal-tastes-the-westin-singapore" target="_blank">Seasonal Tastes - The Westin Singapore</a>
<a class="rest-row-name" href="//www.opentable.sg/r/sky22-" target="_blank">Sky22</a>
<a class="rest-row-name" href="//www.opentable.sg/r/the-chop-house-katong-singapore" target="_blank">The Chop House Katong</a>
When I call .renderContents() on each matched tag from my soup object instead, this is what is returned:
for entry in soup.find_all('a', {'class': 'rest-row-name'}):
    print entry.renderContents()
Output:
Chili's Clarke Quay Central
ATLAS
Edge - Pan Pacific Singapore
Lawry's The Prime Rib Singapore
<span class="rest-row-index">1. </span>Carousel
<span class="rest-row-index">2. </span>Bread Street Kitchen - Marina Bay Sands
<span class="rest-row-index">3. </span>Colony - The Ritz-Carlton Millenia Singapore
<span class="rest-row-index">4. </span>Edge - Pan Pacific Singapore
<span class="rest-row-index">5. </span>The Dempsey Cookhouse and Bar
Cook & Brew - The Westin Singapore
Pince & Pints Katong Singapore
Seasonal Tastes - The Westin Singapore
Sky22
The Chop House Katong
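I want .renderContents() to return only the restaurant name. But some of the links contain an extra tag with a different class (the <span class="rest-row-index"> that holds the ranking number), so those entries still come back with HTML in them and I do not get just the restaurant name.
What is the best practice for handling this situation? What should I do?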
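To make the goal concrete, here is a rough sketch of one possible approach, not necessarily the best practice being asked about. It assumes Python 3 with bs4 installed, and it runs on a few rows copied from the output above rather than on the live page. The names SAMPLE_HTML and restaurant_names are made up for illustration. The idea is to drop any nested rest-row-index span from each link and then read the remaining text with get_text() instead of renderContents().

```python
from bs4 import BeautifulSoup

# Hypothetical sample: three rows copied from the output in the question,
# so the sketch runs without fetching the live page.
SAMPLE_HTML = """
<a class="rest-row-name" href="/r/atlas-singapore">ATLAS</a>
<a class="rest-row-name" href="//www.opentable.sg/r/carousel-royal-plaza-on-scotts" target="_blank"><span class="rest-row-index">1. </span>Carousel</a>
<a class="rest-row-name" href="//www.opentable.sg/r/the-westin-singapore-cook-and-brew-singapore" target="_blank">Cook &amp; Brew - The Westin Singapore</a>
"""


def restaurant_names(html):
    """Return the text of each rest-row-name link without the ranking span."""
    soup = BeautifulSoup(html, "html.parser")
    names = []
    for entry in soup.find_all("a", {"class": "rest-row-name"}):
        # Remove the nested ranking span (if any) from the parsed tree.
        for index_span in entry.find_all("span", {"class": "rest-row-index"}):
            index_span.decompose()
        # get_text() returns only text, never markup, unlike renderContents().
        names.append(entry.get_text(strip=True))
    return names


names = restaurant_names(SAMPLE_HTML)
for name in names:
    print(name)
```

Note that decompose() modifies the parsed tree in place, so the ranking numbers are gone from soup afterwards. If they are needed later, they would have to be read before the span is removed.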