import os
import requests
from bs4 import BeautifulSoup

url = raw_input('Please enter a URL to an HTML doc or a file path to a saved HTML doc for analysis: ')
if not os.path.isfile(url):
    page = requests.get(url)
    soup1 = BeautifulSoup(page.content, 'html.parser')
    # this is the line I can't get right -- isinstance() wants a class or
    # tuple of classes as its second argument, not a string
    comments = soup1.find_all(string=lambda text: isinstance(text, '/* '))
    for c in comments:
        print c
        print "==========="
        #c.decompose()
Given the URL of the intended website, this program should pull all of the comments out of the HTML file. I know Beautiful Soup only accepts tuples when deciding what to extract, but how exactly do I pull every instance of /* out of the file?
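
For what it's worth, my current understanding is that <!-- --> comments come back as bs4 Comment objects, while /* */ only ever lives inside <style> or <script> text, so something along these lines may be the direction (the HTML snippet below is just a made-up test case) -- but I'm not sure it's the right way to do it:

import re
from bs4 import BeautifulSoup, Comment

test_html = """<html><head>
<style>/* hide the banner */ .banner { display: none; }</style>
</head><body><!-- an HTML comment --><p>hello</p></body></html>"""

soup = BeautifulSoup(test_html, 'html.parser')

# <!-- --> comments are parsed as bs4.Comment nodes, so isinstance()
# gets the Comment class rather than a string
html_comments = soup.find_all(string=lambda text: isinstance(text, Comment))

# /* */ blocks only appear inside <style>/<script> text, so pull them
# out with a regex over those tags' contents
css_js_comments = []
for tag in soup.find_all(['style', 'script']):
    css_js_comments.extend(re.findall(r'/\*.*?\*/', tag.get_text(), re.DOTALL))

print html_comments
print css_js_comments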