Question

Say I have this HTML:

html = """
<div non_class="first"></div>
<h2 style="some_style"> Text 1</h2>
<div non_class="second"></div>
<div non_class="first">Text 2</div>
"""

With this code:

from bs4 import BeautifulSoup as bs
soup = bs(html,'lxml')

I pass two arguments (a tag and an attribute/value pair) to soup.find_all():

first = soup.find_all('div',non_class='first')
for i in first:
    print(i)

which outputs:

<div non_class="first"></div>
<div non_class="first">Text 2</div>

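Simple enough. Now suppose that, instead of hard-coding these arguments, I want to pass them to find_all() as variables. Based on questions such as this, this, or this, I used this approach: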

my_tag = 'div'
my_att = {'non_class': 'first'}

second = soup.find_all(my_tag,my_att)
for i in second:
    print(i)

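It produces the correct output, but it is far from satisfying. My "target" tag is <div non_class="first">, and (if all goes well) it will be one entry in a list of targets that I intend to iterate over in a for loop. However, the approach proposed in those answers requires (unless someone has a better way!) that I break the target into its components: first a tag (div in this example), then the attribute/value pair (non_class="first" in this example), which I convert into a dictionary ({'non_class': 'first'}), and then feed both into find_all(). It works, but it is inelegant.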

So I tried passing the whole set of arguments in a single variable, but

target = '<div non_class="first">'

third = soup.find_all(target)

finds nothing. Using an f-string to fill in the target:

fourth = soup.find_all(f'{target}')

fails as well.

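Edit: To clarify, the purpose of this exercise is to feed the element to find_all() without first having to break it into its components, either manually or with a helper function. Conceptually, I suppose I don't understand why find_all() can take the element directly as a string argument, yet when that string is assigned to a variable, find_all() cannot take the variable and reconstitute it as a string argument...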

So is this doable, or do I have to resign myself to slicing and dicing the target? Alternatively, can it be done with Selenium?

Answer 1

There are many ways to extract the data. If I understand the use case correctly, the following options may help you.

html = """
<div non_class="first"></div>
<h2 style="some_style"> Text 1</h2>
<div non_class="second"></div>
<div non_class="first">Text 2</div>
"""
from bs4 import BeautifulSoup
soup = BeautifulSoup(html,'lxml')


print(soup.find_all(non_class="first"))

find_element = lambda target,soup : soup.find_all(target['tag'],{target['attribute']:target['value']})
target = {'tag':'div','attribute':'non_class','value':'first'}
print(find_element(target,soup))

target = {'non_class': 'first'}
print(soup.find_all(attrs=target))

print(soup.find_all(non_class="first"))
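If the goal is simply to carry the tag and the attribute dictionary around as one object, a rough sketch is to bundle them in a tuple and unpack it at the call site. The variable name `target` and the tuple layout are illustrative choices, and `html.parser` is used here only so the example does not depend on lxml being installed.

```python
from bs4 import BeautifulSoup

html = """
<div non_class="first"></div>
<h2 style="some_style"> Text 1</h2>
<div non_class="second"></div>
<div non_class="first">Text 2</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Hypothetical layout: one variable holding (tag name, attribute dict).
target = ("div", {"non_class": "first"})

# The * operator unpacks the tuple into find_all(name, attrs).
matches = soup.find_all(*target)
for tag in matches:
    print(tag)
```

A list of such tuples could then be looped over, with each entry passed to find_all() in the same way.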

You can even implement something like the following, which takes an HTML tag as a string and returns the matching elements.

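def get_element(selector_string,soup):
    # Parse the tag string on its own; lxml wraps it in <html><body>,
    # so the first node inside body is the tag we want to match.
    element = BeautifulSoup(selector_string,'lxml').body.next_element
    return soup.find_all(element.name,element.attrs)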

print(get_element('<div non_class="first">',soup))
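Another option worth considering, not part of the approaches above, is a CSS selector: a single string can then describe both the tag and the attribute. The raw string '<div non_class="first">' finds nothing because find_all() treats a string first argument as a tag name to match, not as markup to parse. The sketch below assumes a Beautiful Soup version that provides select() (it relies on the soupsieve package, which recent bs4 releases install by default); the name `selector` is illustrative.

```python
from bs4 import BeautifulSoup

html = """
<div non_class="first"></div>
<h2 style="some_style"> Text 1</h2>
<div non_class="second"></div>
<div non_class="first">Text 2</div>
"""
soup = BeautifulSoup(html, "html.parser")

# One string carries the whole target: tag name plus attribute/value pair.
selector = 'div[non_class="first"]'

selected = soup.select(selector)
for tag in selected:
    print(tag)

# For comparison: the raw tag text is looked up as a tag *name*, so nothing matches.
raw_attempt = soup.find_all('<div non_class="first">')
```

The selector form differs from the original tag text, so a list of targets would need to be written as selectors rather than as HTML snippets.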
