Question

I am using the following find_all() expression to get all NavigableStrings, in document order.

all_nav_strings = [x for x in node.find_all(text=True) if x.strip() != "" if not type(x) is bs4.Comment]

I would like to adjust the find_all() expression so that it also finds all images (in document order).

I tried find_all([text = True, img = True])

Answer 1

This should get every element in node that is either an image or contains any text:

all_nav_strings = [
    tag for tag in node.find_all()
    if (tag.text.strip() or tag.name == 'img') and type(tag) is not bs4.Comment
]

A more elegant solution, passing a lambda to find_all (if you like lambdas):

all_nav_strings = node.find_all(
    lambda tag: (tag.text.strip() or tag.name == 'img') and type(tag) is not bs4.Comment
)
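
Note that find_all() called without a string filter only yields Tag objects, so both snippets above return tags (including any ancestor whose combined text is non-empty) rather than the strings themselves. If the goal is the text nodes and the img tags interleaved in document order, one possible sketch is to walk node.descendants instead. The sample HTML and the name items below are made up for illustration; they are not from the question.

```python
from bs4 import BeautifulSoup, Comment, NavigableString, Tag

# Hypothetical sample markup, used only to demonstrate the ordering.
html = (
    '<div><p>Hello <img src="a.png"> world</p>'
    '<!-- note --><p>  </p><img src="b.png"></div>'
)
node = BeautifulSoup(html, "html.parser").div

# Walk every descendant in document order and keep:
#   - <img> tags
#   - non-blank strings that are not comments
items = [
    el for el in node.descendants
    if (isinstance(el, Tag) and el.name == "img")
    or (
        isinstance(el, NavigableString)
        and not isinstance(el, Comment)
        and el.strip()
    )
]

for el in items:
    print(repr(el))
```

Here the comment and the whitespace-only paragraph are skipped, and the two images appear at the positions where they occur in the markup.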

If you do not want find_all to return all nested descendant tags, pass recursive=False so that only the direct children are searched.
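
As a rough illustration of the difference recursive=False makes, the sketch below runs the same filter twice on a small made-up document. The helper name has_text_or_is_img and the sample HTML are assumptions for the example only.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: one image at the top level, one nested in <p>.
html = '<div><img src="a.png"><p>text <img src="b.png"></p><span></span></div>'
node = BeautifulSoup(html, "html.parser").div


def has_text_or_is_img(tag):
    """Same condition as the lambda above: an <img>, or a tag with some text."""
    return bool(tag.get_text().strip()) or tag.name == "img"


# Default (recursive=True): searches all descendants of node.
all_matches = node.find_all(has_text_or_is_img)

# recursive=False: only looks at the direct children of node.
direct_matches = node.find_all(has_text_or_is_img, recursive=False)

print([t.name for t in all_matches])
print([t.name for t in direct_matches])
```

The nested image inside the paragraph is only reported by the recursive search; the empty span is excluded in both cases.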