Python - How do I use BeautifulSoup to target a class nested inside another class?

Time: 2015-08-17 15:24:21

Tags: python beautifulsoup web-crawler

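I am learning to build a crawler with BeautifulSoup and Python 3, and I have run into a problem: the data I want from a website is nested under more than one class. Here is an example: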

<tr class="phone">
  <a href="..." class="number"></a>
</tr> 

<tr class="mobile">
  <a href="..." class="number"></a>
</tr> 

This is what I would like to do in Python:

for num in soup.findAll('a', {'class':'mobile -> number'}):
    print(num.string)

How should I target the classes .mobile .number?

2 answers:

Answer 0: (score: 2)

You can use soup.select to find items by CSS selector.

from bs4 import BeautifulSoup


html_doc = '''<tr class="phone">
  <a href="tel:+18005551212" class="number"></a>
</tr> 

<tr class="mobile">
  <a href="+13034997111" class="number"></a>
</tr> '''

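# Name the parser explicitly so results do not depend on which parsers are installed
soup = BeautifulSoup(html_doc, "html.parser")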

# Find any tag with a class of "number"
# that is a descendant of a tag with
# a class of "mobile"
mobiles = soup.select(".mobile .number")
print(mobiles)

# Find a tag with a class of "number"
# that is a direct child
# of a tag with a class of "mobile"
mobiles = soup.select(".mobile > .number")
print(mobiles)

# Find an <a class="number"> tag that is a direct
# child of a <tr class="mobile"> tag.
mobiles = soup.select("tr.mobile > a.number")
print(mobiles)
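
Each call to select returns a list of Tag objects, so the values of interest still have to be pulled out of them. The following is a minimal sketch of that step, not part of the original answer. It assumes the links carry visible text ("Office" and "Cell" are made-up labels, since the anchors in the question are empty) and that both href values use the tel: scheme.

```python
from bs4 import BeautifulSoup

# Sample markup modelled on the question; the link text is hypothetical.
html_doc = '''<tr class="phone">
  <a href="tel:+18005551212" class="number">Office</a>
</tr>
<tr class="mobile">
  <a href="tel:+13034997111" class="number">Cell</a>
</tr>'''

soup = BeautifulSoup(html_doc, "html.parser")

# Every tag with class "number" that sits anywhere under a tag with class "mobile"
mobile_links = soup.select(".mobile .number")

# Read the href attribute and the visible text from each match
hrefs = [a.get("href") for a in mobile_links]
texts = [a.get_text(strip=True) for a in mobile_links]

print(hrefs)
print(texts)
```

Note that .string, as used in the question, returns None for an empty tag, which is why the sketch reads the href attribute as well.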

Answer 1: (score: 1)

Use find_all() to get the elements with the class "number", then iterate over the list and print each one whose parent has the class "mobile".

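for dom in soup.find_all("a", "number"):
    # parent is an attribute, not a method; its "class" value is a list of class names
    # ("class" is a reserved word in Python, so the loop variable needs another name)
    for class_name in dom.parent.get("class", []):
        if class_name == "mobile":
            print(dom.string)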

Or use select() with a CSS selector:

for dom in soup.select("tr.mobile a.number"):
    print(dom.string)
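
If you would rather avoid CSS selectors, the same nesting can be expressed by chaining find_all() calls with the class_ keyword. This is a minimal sketch under the same assumptions as the question's markup, with tel: values added to both href attributes for illustration.

```python
from bs4 import BeautifulSoup

# Sample markup modelled on the question; the href values are illustrative.
html_doc = '''<tr class="phone">
  <a href="tel:+18005551212" class="number"></a>
</tr>
<tr class="mobile">
  <a href="tel:+13034997111" class="number"></a>
</tr>'''

soup = BeautifulSoup(html_doc, "html.parser")

numbers = []
# Outer loop: every <tr> whose class list contains "mobile"
for row in soup.find_all("tr", class_="mobile"):
    # Inner loop: every <a class="number"> anywhere inside that row
    for link in row.find_all("a", class_="number"):
        numbers.append(link.get("href"))

print(numbers)
```

Calling find_all() on a tag searches only that tag's descendants, which is what restricts the inner search to the "mobile" row.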