Python BeautifulSoup: get text from the current tag only, not from its children

Date: 2014-12-23 13:52:02

Tags: python beautifulsoup html-parsing

from bs4 import BeautifulSoup

html = """<html>
        <head>
        </head>
        <body>
            <p class='a'>
               a        #(Text i want)
               <a>anchor</a>
            </p>
            <p class='b'>b</p>
        </body>
        </html>"""

# Same markup, but with the anchor before the text (new order)
html = """<html>
        <head>
        </head>
        <body>
            <p class='a'>
               <a>anchor</a>
               a        #(Text i want)  (New Order)
            </p>
            <p class='b'>b</p>
        </body>
        </html>"""


dom = BeautifulSoup(html, "html.parser")
tag = "p"
selector = "class"
selector_str = "a"
result = dom.find(tag, {selector: selector_str}, recursive=False, text=True)
print(result)    # result is None

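This is what I have tried so far. How can I get only the text that belongs directly to the tag with class a, without the text of its child tags?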
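A likely explanation for the None result, sketched here under the assumption that the markup is parsed with the built-in "html.parser": recursive=False on the document object searches only its direct children, and a string/text filter applied to a tag is compared against that tag's .string, which is None when the tag has more than one child. The variable names below are illustrative and not part of the original post.

```python
from bs4 import BeautifulSoup

html = """<html><body>
<p class='a'>
   a
   <a>anchor</a>
</p>
<p class='b'>b</p>
</body></html>"""

dom = BeautifulSoup(html, "html.parser")

# recursive=False on the document object inspects only its direct children.
# <p> is nested inside <html>/<body>, so this search finds nothing.
top_level_only = dom.find("p", {"class": "a"}, recursive=False)

p_a = dom.find("p", {"class": "a"})
p_b = dom.find("p", {"class": "b"})

# .string is None when a tag has more than one child (text plus <a> here).
mixed_string = p_a.string
# With a single text child, .string is that text.
single_string = p_b.string

print(top_level_only, mixed_string, single_string)
```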

I cannot rely on the contents attribute:

dom = BeautifulSoup(html, "html.parser")
tag = "p"
selector = "class"
selector_str = "a"
result = dom.find(tag, {selector: selector_str})
print(result.contents[0])  # prints a only when the text is the first child

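The text can appear in any order relative to the child tags, so indexing into contents is not reliable. Does anyone know how to get it without using contents?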
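One possible approach, offered as a minimal sketch rather than an accepted answer: walk the direct children of the tag and keep only the NavigableString nodes, which works regardless of where the text sits relative to the child tags. The helper name own_text and the two sample strings are hypothetical; the sample markup drops the inline "#(Text i want)" annotation so that the result is just the text itself.

```python
from bs4 import BeautifulSoup, NavigableString


def own_text(tag):
    """Return the text that sits directly inside `tag`, ignoring child tags."""
    pieces = []
    for child in tag.children:
        # Direct text nodes are NavigableString; child tags such as <a> are skipped.
        if isinstance(child, NavigableString):
            stripped = child.strip()
            if stripped:
                pieces.append(stripped)
    return " ".join(pieces)


TEXT_FIRST = """<p class='a'>
   a
   <a>anchor</a>
</p>"""

ANCHOR_FIRST = """<p class='a'>
   <a>anchor</a>
   a
</p>"""

for markup in (TEXT_FIRST, ANCHOR_FIRST):
    p = BeautifulSoup(markup, "html.parser").find("p", {"class": "a"})
    print(own_text(p))  # prints a for both orderings
```

Note that comments in the markup are also NavigableString subclasses, so a document containing HTML comments inside the tag may need an extra filter.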

0 answers:

No answers yet