How can I extract C comments from HTML using Beautiful Soup?

Time: 2017-06-12 17:14:53

Tags: python web-scraping beautifulsoup

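import os
import requests
from bs4 import BeautifulSoup

url = raw_input('Please enter a URL to an HTML doc or File Path to a saved HTML doc for analysis: ')
if not os.path.isfile(url):
    # Not a local file, so treat the input as a URL and download the page.
    page = requests.get(url)
    soup1 = BeautifulSoup(page.content, 'html.parser')
    # Problem line: isinstance() needs a type as its second argument, so passing '/* ' raises TypeError.
    comments = soup1.find_all(string=lambda text: isinstance(text, '/* '))
    for c in comments:
        print c
        print "==========="
        #c.decompose()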

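Given the URL of the target site, this program is supposed to extract all comments from the HTML file. I know that Beautiful Soup only works with tuples when deciding what to extract, but how exactly do I pull every instance of /* out of the file?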
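One possible approach, offered as a sketch rather than a definitive answer: Beautiful Soup's `Comment` class only covers HTML comments (`<!-- ... -->`), so C-style `/* ... */` comments are just ordinary text inside `<script>` and `<style>` elements. The example below assumes that is where the comments of interest live, uses Beautiful Soup to locate those elements, and then applies a regular expression to their text. The names `C_COMMENT`, `extract_c_comments`, and `sample` are made up for illustration, and the code is written for Python 3, unlike the Python 2 snippet above.

```python
import re

from bs4 import BeautifulSoup

# Matches /* ... */ including across newlines; non-greedy so that
# neighbouring comments are returned as separate matches.
C_COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)


def extract_c_comments(html):
    """Return every /* ... */ block found inside <script> and <style> tags."""
    soup = BeautifulSoup(html, "html.parser")
    found = []
    for tag in soup.find_all(["script", "style"]):
        # .string is None for empty tags such as <script src="..."></script>.
        text = tag.string or ""
        found.extend(C_COMMENT.findall(text))
    return found


# Hypothetical sample document standing in for a downloaded page.
sample = """<html><head>
<style>/* page styles */ body { margin: 0; }</style>
<script>/* first */ var x = 1; /* second
line */</script>
</head><body><!-- an HTML comment, not a C comment --></body></html>"""

for c in extract_c_comments(sample):
    print(c)
    print("===========")
```

To run this against a live page, the `sample` string could be replaced with `requests.get(url).text`. A regular expression is not a real JavaScript or CSS parser, so a `/*` that appears inside a string literal would also be reported as a comment.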

0 answers:

No answers