Using BeautifulSoup with multiple tags that have the same name

Asked: 2016-10-01 12:51:22

Tags: python beautifulsoup

I have the following HTML:

<g class="1581 sqw_sv5" style="cursor: pointer;">
 <path d="M397.696,126.554C397.696,126.554,404.57504,140.2417375,404.57504,140.2417375" stroke="#ffffff" style="stroke-width: 3.6; stroke-opacity: 0.5; stroke-linecap: round; fill-opacity: 0;">
 </path>
 <path d="M397.696,126.554C397.696,126.554,404.57504,140.2417375,404.57504,140.2417375" stroke="#f95a0b" style="stroke-width: 1.2; stroke-linecap: round; fill-opacity: 0;">
 </path>

I need to get the value of 'stroke' from the second path. My current code only extracts the value from the first path.

I am currently using:

shots = soup.find_all('g')
for shot in shots:
    print(shot.path['stroke'])

This returns #ffffff. I need it to return #f95a0b.

2 answers:

Answer 0 (score: 2)

You need to use find_all to find all of the paths first, then take the last one:

from bs4 import BeautifulSoup

h = """<g class="1581 sqw_sv5" style="cursor: pointer;">
 <path d="M397.696,126.554C397.696,126.554,404.57504,140.2417375,404.57504,140.2417375" stroke="#ffffff" style="stroke-width: 3.6; stroke-opacity: 0.5; stroke-linecap: round; fill-opacity: 0;">
 </path>
 <path d="M397.696,126.554C397.696,126.554,404.57504,140.2417375,404.57504,140.2417375" stroke="#f95a0b" style="stroke-width: 1.2; stroke-linecap: round; fill-opacity: 0;">
 </path>"""
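soup = BeautifulSoup(h, "html.parser")
shots = soup.find_all('g')
for shot in shots:
    # find_all returns every <path> that has a stroke attribute; [-1] picks the last one
    print(shot.find_all("path", stroke=True)[-1]["stroke"])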

Using shot.path['stroke'] is equivalent to using shot.find("path")['stroke'], which returns only the first path.
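To illustrate the equivalence described above, here is a minimal sketch. The shortened markup and the variable names (sample, group, strokes) are made up for illustration and are not from the question.

```python
from bs4 import BeautifulSoup

# Shortened, hypothetical markup: two <path> elements inside one <g>.
sample = '<g><path stroke="#ffffff"></path><path stroke="#f95a0b"></path></g>'
group = BeautifulSoup(sample, "html.parser").g

# Attribute-style access and find() both stop at the first match.
first_by_attr = group.path
first_by_find = group.find("path")
print(first_by_attr["stroke"], first_by_find["stroke"])

# find_all() returns every match in document order.
strokes = [p["stroke"] for p in group.find_all("path")]
print(strokes)
```

Indexing the find_all result with [-1] or [1] is what lets you reach a path other than the first.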

Alternatively, nth-of-type may also work, depending on the structure of the HTML:

soup = BeautifulSoup(h, "html.parser")
shots = soup.find_all('g')
for shot in shots:
    print(shot.select_one("path:nth-of-type(2)")["stroke"])
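One way the structure can matter: if a g contains only one path, select_one finds nothing and returns None, so indexing the result fails. The sketch below guards against that. The markup is hypothetical (the second g is invented for the example) and the None placeholder is just one possible choice.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second <g> holds only one <path>.
sample = (
    '<g><path stroke="#ffffff"></path><path stroke="#f95a0b"></path></g>'
    '<g><path stroke="#000000"></path></g>'
)
soup = BeautifulSoup(sample, "html.parser")

second_strokes = []
for shot in soup.find_all("g"):
    second = shot.select_one("path:nth-of-type(2)")
    # select_one() returns None when nothing matches, so check before indexing.
    second_strokes.append(second["stroke"] if second is not None else None)

print(second_strokes)
```

Note that nth-of-type counts position among sibling elements with the same tag name, so paths nested under different parents are counted separately.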

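Answer 1 (score: 1)

Here is my solution to your problem. The caveat is that it may be too specific: it works only if the value of style is always "stroke-width: 1.2; stroke-linecap: round; fill-opacity: 0;" and there is only one such path element in the whole document.

The idea behind this solution is to narrow down the elements quickly by looking for something unique to the desired element, which carries the desired attribute.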

from bs4 import BeautifulSoup

html = """<g class="1581 sqw_sv5" style="cursor: pointer;">
 <path d="M397.696,126.554C397.696,126.554,404.57504,140.2417375,404.57504,140.2417375" stroke="#ffffff" style="stroke-width: 3.6; stroke-opacity: 0.5; stroke-linecap: round; fill-opacity: 0;">
 </path>
 <path d="M397.696,126.554C397.696,126.554,404.57504,140.2417375,404.57504,140.2417375" stroke="#f95a0b" style="stroke-width: 1.2; stroke-linecap: round; fill-opacity: 0;">
 </path>"""

soup = BeautifulSoup(html, "html.parser")
# get the desired 'path' element using the 'style' that identifies it
desired_element = soup.find("path", {"style": "stroke-width: 1.2; stroke-linecap: round; fill-opacity: 0;"})
# get the attribute value from the extracted element
desired_attribute = desired_element["stroke"]
print(desired_attribute)
# prints #f95a0b

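If this approach is not suitable, you may have to use BeautifulSoup's next_sibling or findNext (find_next in current bs4) methods. Essentially, look up the first path element the way your current code already does, then "jump" from there to the next path element, which holds the value you need.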

findNext:Beautifulsoup - nextSibling

next_sibling:https://www.crummy.com/software/BeautifulSoup/bs4/doc/#next-sibling-and-previous-sibling
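The "jump to the next element" idea could be sketched as follows. This is only an illustration under assumptions: the markup is shortened, and it uses find_next_sibling rather than next_sibling, because next_sibling alone may return the whitespace between the tags instead of the next path.

```python
from bs4 import BeautifulSoup

# Shortened version of the question's markup (d and style attributes omitted).
html = """<g class="1581 sqw_sv5">
 <path stroke="#ffffff"></path>
 <path stroke="#f95a0b"></path>
</g>"""

soup = BeautifulSoup(html, "html.parser")
first_path = soup.find("path")

# .next_sibling here is the whitespace text node between the two tags.
print(repr(first_path.next_sibling))

# find_next_sibling() skips text nodes and returns the next matching tag.
second_path = first_path.find_next_sibling("path")
print(second_path["stroke"])
# prints #f95a0b
```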