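Getting all the tags from the document
Read the file one character at a time, ignoring everything up to the next "<" (the "<" itself is also discarded).
Then keep reading one character at a time, appending each character to a string, until you reach ">" or whitespace (the ">" itself is also discarded).
The expected output should be: [.... html, body, h1, /h1, /h2, /body, .....]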
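The two reading steps above could be sketched roughly like this. The function name `extract_tags` and the use of `io.StringIO` for the demo are my own choices for illustration, not part of the assignment. The sketch also assumes that a "<" followed directly by whitespace (as in "the < symbol") should not count as a tag.

```python
import io


def extract_tags(stream):
    """Return the tag names found in a text stream, reading one character at a time."""
    tags = []
    ch = stream.read(1)
    while ch:
        if ch == "<":
            name = []
            ch = stream.read(1)
            # Collect characters until ">" or whitespace ends the tag name.
            while ch and ch != ">" and not ch.isspace():
                name.append(ch)
                ch = stream.read(1)
            # Assumption: a "<" with nothing after it is not a tag.
            if name:
                tags.append("".join(name))
        # Everything else (text, attributes, the ">" itself) is skipped.
        ch = stream.read(1)
    return tags


sample = '<html>\n<body>\n<h1>Title</h1>\n<p class="x">text</p>\n</body>\n</html>'
print(extract_tags(io.StringIO(sample)))
# ['html', 'body', 'h1', '/h1', 'p', '/p', '/body', '/html']
```

Because the function only calls `read(1)`, an open file object should work the same way, for example `with open(file_name, 'r') as f: extract_tags(f)`.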
<html>
<head>
<title>Title</title>
</head>
<body>
<p><strong><em>Q2. HTML TAG CHECKER</em></strong></p>
<p></p>
<p>A <em>markup language</em> is a language that annotates text so that the
computer can manipulate the text. Most markup languages are human readable
because the annotations are written in a way to distinguish them from the
text. The most important feature of a markup language is that the
<em>tags</em> it uses to indicate annotations should be easy to distinguish
from the document <em>content</em>.</p>
<p>One of the most well-known markup languages is the one commonly used to
create web pages, called <strong>HTML</strong>, or "Hypertext Markup
Language". In HTML, tags appear in "angle brackets" such as in
"<html>". When you load a Web page in your browser, you do not see
the tags themselves: the browser interprets the tags as instructions on how
to format the text for display.</p>
<p>Most tags in HTML are used in pairs to indicate where an effect starts
and ends. For example:</p>
<p><p>
this is a paragraph of text written in HTML
</p></p>
<p>Here <p> represents the start of a paragraph, and </p>
indicates where that paragraph ends.</p>
<p>Other tags include <b> and </b> that are used to place the
enclosed text in <strong>bold</strong> font, and <i> and </i>
indicate that the enclosed text is <em>italic</em>.</p>
<p>Note that "end" tags look just like the "start" tags, except for the
addition of a forward slash ‘/’ after the < symbol.</p>
<p>Sets of tags are often nested inside other sets of tags. For example, an
<em>ordered list</em> is a list of numbered bullets. You specify the start
of an ordered list with the tag <ol>, and the end with </ol>.
Within the ordered list, you identify items to be numbered with the tags
<li> (for "list item") and </li>. For example, the following
specification:</p>
<p><ol></p>
<p><li>First item</li></p>
<p><li>Second item</li></p>
<p><li>Third item</li></p>
<p></ol></p>
<p>would result in the following:</p>
<ol>
<li>First item</li>
<li>Second item</li>
<li>Third item</li>
</ol>
Stack.py:
class Stack:
    def __init__(self):
        self.items = []

    def is_empty(self):
        return self.items == []

    def size(self):
        return len(self.items)

    def push(self, item):
        self.items.append(item)

    def pop(self):
        return self.items.pop()

    def peek(self):
        return self.items[-1]

    # Returns string representation of contents of stack
    def __str__(self):
        return str(self.items)
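For context, a stack like this is commonly used to check that opening and closing tags are properly nested. Below is a minimal sketch of that idea, assuming the tags have already been collected into a list such as ['html', 'body', '/body', '/html']. The function name `tags_are_balanced` is hypothetical, a plain Python list stands in for the Stack class so the snippet runs on its own, and tags that never get a closing tag (like `br`) are not handled.

```python
def tags_are_balanced(tags):
    """Check that every closing tag matches the most recently opened tag."""
    stack = []  # plain list used as a stack: append() is push, pop() is pop
    for tag in tags:
        if tag.startswith("/"):
            # A closing tag must match the tag on top of the stack.
            if not stack or stack.pop() != tag[1:]:
                return False
        else:
            stack.append(tag)
    # Anything left on the stack was opened but never closed.
    return not stack


print(tags_are_balanced(["html", "body", "p", "/p", "/body", "/html"]))  # True
print(tags_are_balanced(["html", "body", "/html", "/body"]))  # False
```

Swapping the list for the Stack class above should only require replacing `append` with `push` and `not stack` with `stack.is_empty()`.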
main.py:
from Stack import Stack

# Processes an HTML file and returns a list of HTML tag objects
def process_html_file(file_name):
    tag_list = []
    s = Stack()
    with open(file_name, 'r') as f:
        all_lines = []
        # loop through all lines using the f.readlines() method
        for line in f.readlines():
            new_line = []
            # this is how you would loop through each character of the line
            for chars in line:
                new_line.append(chars)
            all_lines.append(new_line)