
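I wrote a function that parses all headings based on their tags (h1, h2, ...). I now want to extend it with a feature that parses text by font size, e.g. 20px or 1.5em, regardless of whether the text is a heading. In other words, I want a function that returns any text, anywhere on the page, that is written in a font size larger than X. The function takes a JSON file as input, consisting of arbitrary HTML (plus anything else a website may contain, i.e. CSS and so on).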

Based on the documentation at crummy, one possible option seems to be soup.fetch(), but I have not found many examples of its use.

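Since the font size may be defined in a CSS component rather than in the markup itself, I am not sure bs4 is suitable for this. I think the answer involves cssutils or tinycss, but I could not find the best way to use them for this task.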
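To make the goal concrete, here is a minimal sketch of the simplest case only: font sizes declared in inline style attributes. It uses the standard-library html.parser rather than bs4, cssutils, or tinycss. The names LargeTextParser, text_larger_than, and BASE_PX are hypothetical, and the sketch assumes well-formed markup, a 16px root font size, and em values relative to the parent element. It does not resolve rules from style blocks or external stylesheets, which is the part I am still unsure about.

```python
import re
from html.parser import HTMLParser

# Assumption: the root font size is 16px (a common browser default).
BASE_PX = 16.0
FONT_SIZE_RE = re.compile(r"font-size\s*:\s*(\d*\.?\d+)\s*(px|em)", re.IGNORECASE)
# Void elements never get a closing tag, so they must not touch the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class LargeTextParser(HTMLParser):
    """Collect text whose inline-declared font size exceeds min_px."""

    def __init__(self, min_px):
        super().__init__()
        self.min_px = min_px
        self.sizes = [BASE_PX]  # stack of effective font sizes, in px
        self.found = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        parent = self.sizes[-1]
        style = dict(attrs).get("style") or ""
        match = FONT_SIZE_RE.search(style)
        if match is None:
            size = parent  # no declaration: inherit from the parent
        else:
            value, unit = float(match.group(1)), match.group(2).lower()
            # em is taken relative to the parent's font size
            size = value * parent if unit == "em" else value
        self.sizes.append(size)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if len(self.sizes) > 1:
            self.sizes.pop()

    def handle_data(self, data):
        text = data.strip()
        if text and self.sizes[-1] > self.min_px:
            self.found.append(text)


def text_larger_than(html, min_px):
    """Return the text fragments rendered larger than min_px (inline styles only)."""
    parser = LargeTextParser(min_px)
    parser.feed(html)
    parser.close()
    return parser.found
```

For example, calling text_larger_than(html, 18) on a page where one paragraph is styled with font-size: 20px and a nested span with font-size: 1.5em would return the text of both, while unstyled text at the assumed 16px default would be skipped.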

For reference, I posted my heading-tag code for review: https://codereview.stackexchange.com/questions/166671/extract-html-content-based-on-tags-specifically-headers/166674?noredirect=1#comment317280_166674

Posts I have checked: What is the pythonic way to implement a css parser/replacer;
Find all the span styles with font size larger than the most common one via beautiful soup python;
Search in HTML page using Regex patterns with python;
How to parse a web page containing CSS and HTML using python;
how to extract text within font tag using beautifulsoup;
Extract text with bold content from css selector

Many thanks,

Extract text from CSS based on font-size

0 answers: