I have an HTML comment at the end of my source file.
<!-- FEO DEBUG OUTPUT [TextTransAttempted:RENAME_JAVASCRIPT(18), RENAME_IMAGE(7), MINIFY_JAVASCRIPT(25), (1), JAVASCRIPT_HTML5_CACHE(19), EMBED_JAVASCRIPT(1), RENAME_CSS(3), (1), IMAGE_COMPRESSION(7), RESPONSIVE_IMAGES(6), ASYNC_JAVASCRIPT(2);TextTransApplied:RENAME_JAVASCRIPT(18), RENAME_IMAGE(7), MINIFY_JAVASCRIPT(25), (1), JAVASCRIPT_HTML5_CACHE(19), EMBED_JAVASCRIPT(1), RENAME_CSS(3), (1), IMAGE_COMPRESSION(7), RESPONSIVE_IMAGES(6), ASYNC_JAVASCRIPT(2);TagTransAttempted:(8), ASYNC_JAVASCRIPT(61);TagTransFailed:ASYNC_JAVASCRIPT(42);TagTransApplied:(8), ASYNC_JAVASCRIPT(19); ] -->
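Now I want to check whether every value in parentheses is greater than zero. For example, I want to get the value 18 from RENAME_JAVASCRIPT and check that it is greater than zero, and likewise for the rest. Since this is a comment and not part of any HTML tag, is there a way to do this in BeautifulSoup?
Answer 0: (score: 0)
I would just use re: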
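import re
from bs4 import BeautifulSoup

with open("/sample_html.txt") as f:
    # No parser is named here, so bs4 picks the best one installed
    soup = BeautifulSoup(f.read())

# The comment is the node that follows the <html> element
tag = soup.find("html").next_sibling
# Collect every number in parentheses and check that all are positive
print(all(x > 0 for x in map(int, re.findall(r"\((\d+)\)", tag))))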
True
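Relying on next_sibling assumes the comment is the very next node after the html element, which can depend on the parser and on whitespace in the file. As a rough sketch of an alternative, BeautifulSoup exposes comments as Comment objects, so they can be looked up by type wherever they sit. The html string below is a shortened, made-up stand-in for the real page, and "html.parser" is chosen only so the example needs no extra parser installed.

```python
import re
from bs4 import BeautifulSoup, Comment

# Hypothetical, shortened stand-in for the real page
html = """<html><body><p>hi</p></body></html>
<!-- FEO DEBUG OUTPUT [TextTransAttempted:RENAME_JAVASCRIPT(18), RENAME_IMAGE(7);TagTransFailed:ASYNC_JAVASCRIPT(42); ] -->"""

soup = BeautifulSoup(html, "html.parser")

# Comment is a NavigableString subclass, so it can be matched by type
comments = soup.find_all(string=lambda text: isinstance(text, Comment))
# Keep only the comment that carries the debug output
debug = next(c for c in comments if "FEO DEBUG OUTPUT" in c)

# Extract every number in parentheses and check that all are positive
counts = [int(n) for n in re.findall(r"\((\d+)\)", debug)]
all_positive = all(n > 0 for n in counts)
print(counts, all_positive)
```

Filtering on the "FEO DEBUG OUTPUT" marker keeps the check from picking up numbers in unrelated comments elsewhere in the page.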
If you want to see the names:
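import re
from bs4 import BeautifulSoup

with open("/sample_html.txt") as f:
    soup = BeautifulSoup(f.read())

tag = soup.find("html").next_sibling
# \w+ requires a name, so bare entries such as "(1)" are skipped
for ele in re.findall(r"\w+\(\d+\)", tag):
    # Take the number between the parentheses and compare it with zero
    if int(ele.split("(")[1].rstrip(")")) > 0:
        print(ele)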
RENAME_JAVASCRIPT(18)
RENAME_IMAGE(7)
MINIFY_JAVASCRIPT(25)
JAVASCRIPT_HTML5_CACHE(19)
EMBED_JAVASCRIPT(1)
RENAME_CSS(3)
IMAGE_COMPRESSION(7)
RESPONSIVE_IMAGES(6)
ASYNC_JAVASCRIPT(2)
RENAME_JAVASCRIPT(18)
RENAME_IMAGE(7)
MINIFY_JAVASCRIPT(25)
JAVASCRIPT_HTML5_CACHE(19)
EMBED_JAVASCRIPT(1)
RENAME_CSS(3)
IMAGE_COMPRESSION(7)
RESPONSIVE_IMAGES(6)
ASYNC_JAVASCRIPT(2)
ASYNC_JAVASCRIPT(61)
ASYNC_JAVASCRIPT(42)
ASYNC_JAVASCRIPT(19)
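The flat list above does not show which section each value came from (for instance, ASYNC_JAVASCRIPT(42) belongs to TagTransFailed). A minimal sketch of one way to keep that context, using only re on the comment text: the helper name parse_sections is made up for illustration, and the comment string is a shortened copy of the one in the question.

```python
import re

# Shortened copy of the comment body from the question
comment = ("FEO DEBUG OUTPUT [TextTransAttempted:RENAME_JAVASCRIPT(18), (1), "
           "ASYNC_JAVASCRIPT(2);TagTransAttempted:(8), ASYNC_JAVASCRIPT(61);"
           "TagTransFailed:ASYNC_JAVASCRIPT(42); ]")

def parse_sections(text):
    """Map each section name to a list of (transform_name, count) pairs."""
    # Work only on the part between the square brackets
    body = text[text.index("[") + 1 : text.rindex("]")]
    sections = {}
    for chunk in body.split(";"):
        if ":" not in chunk:
            continue  # skip the empty piece after the last ';'
        section, entries = chunk.split(":", 1)
        # \w* lets the name be empty, which covers bare entries such as "(1)"
        sections[section.strip()] = [
            (name, int(count))
            for name, count in re.findall(r"(\w*)\((\d+)\)", entries)
        ]
    return sections

sections = parse_sections(comment)
for section, entries in sections.items():
    print(section, entries)
```

With the values grouped this way, the greater-than-zero check can be run per section, for example all(count > 0 for _, count in sections["TagTransFailed"]).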