Question

<div class="question_text_edit">
   <h3>This is a heading</h3>
   <p>This is a paragraph.</p>
</div>

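I want to extract the raw HTML code as a string in Python so that I can pass it to an HTMLTOTEXT function. I only need the children, not the wrapping div. I am using Selenium with Python.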

Answer 1

Based on your example, I assume you want a string like this:

html_string = '<h3>This is a heading</h3><p>This is a paragraph.</p>'

If you want to use pure Selenium, try the following:

""" Create your webdriver as 'driver' and then begin here """

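from selenium.webdriver.common.by import By

# Selenium 4 removed the find_element_by_* helpers; use find_element(By..., ...) instead
parent_el = driver.find_element(By.CLASS_NAME, 'question_text_edit')
# '*' matches every descendant, so nested elements would appear more than once;
# that is fine here because the children contain no further tags
children = parent_el.find_elements(By.CSS_SELECTOR, '*')

html_string = ''.join([child.get_attribute('outerHTML') for child in children])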

html_string should now contain your HTML.

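Note:

Selecting every element with the '*' CSS selector can be very slow, depending on the HTML you are parsing. There may be another way to achieve your overall goal that avoids doing it this way.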
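One such alternative, as a minimal sketch: read the parent's innerHTML in a single call instead of collecting every element. The helper name inner_html_of is hypothetical, and it assumes 'driver' is an already-created Selenium WebDriver. Note that innerHTML also returns the whitespace and text between the child tags, so the result is stripped here.

```python
def inner_html_of(driver, class_name):
    """Return the markup inside the first element that has the given class."""
    # "class name" is the locator strategy string behind selenium's By.CLASS_NAME
    parent_el = driver.find_element("class name", class_name)
    # innerHTML is everything between the parent's opening and closing tags
    return parent_el.get_attribute("innerHTML").strip()
```

With a live driver this would be called as html_string = inner_html_of(driver, 'question_text_edit').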

Answer 2

Try BeautifulSoup:

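from bs4 import BeautifulSoup

# Name the parser explicitly so the result does not depend on which parsers are installed
soup = BeautifulSoup('<h3>This is a heading</h3>', 'html.parser')
tagname = soup.h3
print(tagname.string)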

This prints: This is a heading
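To get the children of the div as one HTML string, which is what the question asks for, something along these lines might work. This is a sketch, not a definitive implementation: the helper name children_html is hypothetical, and with Selenium the page_source argument would presumably come from driver.page_source.

```python
from bs4 import BeautifulSoup

def children_html(page_source, class_name):
    """Return the outer HTML of the direct child tags of the first match."""
    soup = BeautifulSoup(page_source, "html.parser")
    parent = soup.find(class_=class_name)
    if parent is None:
        return ""
    # recursive=False restricts the search to direct children,
    # so nested tags are not emitted a second time
    return "".join(str(child) for child in parent.find_all(True, recursive=False))

sample = """<div class="question_text_edit">
   <h3>This is a heading</h3>
   <p>This is a paragraph.</p>
</div>"""
html_string = children_html(sample, "question_text_edit")
```

Because only tags are collected, the whitespace between the children is dropped and html_string matches the single-line form shown in Answer 1.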
