Question

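I'm trying to parse HTML that contains elements with class="link". My problem is that I want to read each line of the file into a variable and then parse it, but it looks like the parser needs a triple-quoted string. How can I create a string variable in triple-quoted style? Thanks.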

from HTMLParser import HTMLParser

# create a subclass and override the handler methods
class MyHTMLParser(HTMLParser):
    def handle_starttag(self, tag, attrs):
        print "Encountered a start tag:", tag
    def handle_endtag(self, tag):
        print "Encountered an end tag :", tag
    def handle_data(self, data):
        print "Encountered some data  :", data

# instantiate the parser and feed it some HTML
parser = MyHTMLParser()

var = open('./index.html','r')
strings = var.read()



parser.feed('<html><head><title>Test</title></head>'
        '<body><h1>Parse me!</h1></body></html>')

OK, if I read the content from a local file, how do I parse the string var?

index.html:

    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head>
    <meta http-equiv="Content-Type" content="text/html;charset=UTF-8">
    <title>Document</title>
</head>
<body>
    <div class="row">
        <h1>hello world</h1>
            <div class="row">
                <p>Lorem ipsum dolor sit amet, consectetur adipisicing elit. Id, excepturi, consequatur sed nobis facere veritatis tempore qui ipsum enim dignissimos!</p>
            </div>
    </div>
</body>
</html>

If I read this HTML as one big string, how can I parse it? I only want the content of the h1 tag. Thanks for your time.

Answer 1

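   from HTMLParser import HTMLParser

   class MyHTMLParser(HTMLParser):
       def __init__(self):
           HTMLParser.__init__(self)
           # True while the parser is inside an <h1> element
           self.in_h1 = False

       def handle_starttag(self, tag, attrs):
           if tag == 'h1':
               self.in_h1 = True

       def handle_endtag(self, tag):
           if tag == 'h1':
               self.in_h1 = False

       def handle_data(self, data):
           # Print only the text found between <h1> and </h1>
           if self.in_h1:
               print data

   # The string returned by read() can be passed to feed() directly;
   # no triple quotes are needed.
   parser = MyHTMLParser()
   parser.feed(open('./index.html', 'r').read())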
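
The same idea can be sketched for Python 3, where the module is html.parser and print is a function. This is only an illustration, not the original answer's code: the names H1Extractor, extract_h1, extract_h1_from_file and the sample string are hypothetical, and the file is assumed to be UTF-8 as its meta tag declares. It also shows that a triple-quoted literal and text read from a file are the same kind of string, so either can be passed to feed().

```python
from html.parser import HTMLParser


class H1Extractor(HTMLParser):
    """Collects the text found inside <h1> elements (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.in_h1 = False   # True while inside an <h1> element
        self.headings = []   # one entry per <h1> encountered

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self.in_h1 = True
            self.headings.append("")

    def handle_endtag(self, tag):
        if tag == "h1":
            self.in_h1 = False

    def handle_data(self, data):
        # Keep only text that appears between <h1> and </h1>
        if self.in_h1:
            self.headings[-1] += data


def extract_h1(html_text):
    """Return the text of every <h1> in an HTML string."""
    parser = H1Extractor()
    parser.feed(html_text)
    parser.close()
    return parser.headings


def extract_h1_from_file(path):
    """Read a whole file into one string and parse it (UTF-8 assumed)."""
    with open(path, "r", encoding="utf-8") as f:
        return extract_h1(f.read())


# Triple quotes are only a way to write a multi-line string in source code;
# the resulting value is an ordinary str, just like the result of f.read().
sample = """<html><body>
<h1>hello world</h1>
<p>Lorem ipsum</p>
</body></html>"""
print(extract_h1(sample))
```

For the file in the question, extract_h1_from_file('./index.html') would be the corresponding call.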
