Question

I want to extract specific items that are shown in red from HTML files (10 files). For example, one HTML file contains this code:

Function A()
{
if ---- "Which is red color"
    {
        Print "Hello" 
    }
else-if
    {
        print "World"
    }
 } "End of function A"

Function B ()
{
    if
    {
        Print "Hello" 
    }
    else-if  ---- "Which is red color"
    {
        print "World"
    }
} "End of function B"

The HTML looks like this:

<html>
<!-- This file was generated by ApiDoc++ 2.0 -->
<!-- please do not modify this file -->
<head><meta content="text/html; charset=utf-8" http-equiv="content-type"/><title>Sample.html</title></head>
<body>
<br/>
Function <font color="#00A500"> A </font><br/>
<font color="#00A500">{</font><br/>
<br/>
<font color="#FF311D"><u>if</u></font>
<font color="#00A500">{</font><br/>
<font color="#00A500">Print Hello;</font><br/>
!
!
!
!

and so on...

The required output is:

Function A - if
Function B - else-if

I wrote a Python program:

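from bs4 import BeautifulSoup

def searchhtml(data):
    soup = BeautifulSoup(data, 'html.parser')
    # Print the text of every <font> tag rendered in red (#FF311D)
    for ran in soup.find_all('font', {'color': '#FF311D'}):
        print(ran.text)

if __name__ == '__main__':
    # Read the local HTML file and search it
    with open('Sample.html', encoding='utf-8') as f:
        page = f.read()
    searchhtml(page)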

The problem is that the output I get is:

if

else-if

But I need:

Function A - if

Function B - else-if

Please help me get the correct output format.
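
One possible way to get this output, as a rough sketch only: walk the <font> tags in document order, remember the most recent function name, and pair it with every red tag that follows. This assumes each function header looks like the sample above (the bare text "Function" followed by a <font> tag holding the name) and that the elided part of the file follows the same pattern. The names red_items_by_function and SAMPLE are hypothetical, and the Function B markup inside SAMPLE is invented for illustration.

```python
from bs4 import BeautifulSoup, NavigableString

RED = "#FF311D"  # colour of the items to extract, taken from the sample HTML

# Hypothetical test input: Function A mirrors the sample file,
# Function B is an assumed continuation of the same markup pattern.
SAMPLE = """<html><body>
<br/>
Function <font color="#00A500"> A </font><br/>
<font color="#00A500">{</font><br/>
<font color="#FF311D"><u>if</u></font>
<font color="#00A500">{</font><br/>
<font color="#00A500">Print Hello;</font><br/>
<font color="#00A500">}</font><br/>
Function <font color="#00A500"> B </font><br/>
<font color="#00A500">{</font><br/>
<font color="#00A500"><u>if</u></font>
<font color="#00A500">{</font><br/>
<font color="#00A500">}</font><br/>
<font color="#FF311D"><u>else-if</u></font>
</body></html>"""


def red_items_by_function(html):
    """Return 'Function <name> - <red text>' strings in document order."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    current = None  # name of the function we are currently inside
    for font in soup.find_all("font"):
        prev = font.previous_sibling
        # Assumption: a function name is a <font> tag that directly
        # follows the bare text "Function".
        if isinstance(prev, NavigableString) and prev.strip().endswith("Function"):
            current = font.get_text(strip=True)
        elif (font.get("color") or "").upper() == RED and current is not None:
            results.append(f"Function {current} - {font.get_text(strip=True)}")
    return results


for line in red_items_by_function(SAMPLE):
    print(line)
# Function A - if
# Function B - else-if
```

For the ten files, the same function could be called once per file on the contents read with open(), provided each file follows the layout assumed here.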

How do I extract specific lines from an HTML file in Python?

0 answers: