Question

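I need to build a web page class that stores the page's path and defines some built-in methods such as __str__ and __repr__. The class will later be used to build a "search engine" that compares pages and returns the best match for a search. The "pages" are HTML files saved on my computer.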

This is what I have so far:

def remove_html_tags(s):
    tag = False
    quote = False
    out = ""

    for c in s:
        if c == '<' and not quote:
            tag = True
        elif c == '>' and not quote:
            tag = False
        elif (c == '"' or c == "'") and tag:
            quote = not quote
        elif not tag:
            out = out + c

    return out


class WebPage:
    def __init__(self, filename):

        self.filename = filename

    def process(self):

        f = open(self.filename,'r')
        LINE_lst_1 = f.readlines()
        n = len(LINE_lst_1)

        LINE_lst = LINE_lst_1[1:n-1]

        STRUCTURE = {}

        for i in range(len(LINE_lst)):
            LINE_lst[i] = LINE_lst[i].strip(' \n\t')
            LINE_lst[i] = remove_html_tags(LINE_lst[i])
        for k in range(n-1):
            for line in LINE_lst:
                if len(line) == 0:
                    LINE_lst.remove(line)
        STRUCTURE['body_lines'] = LINE_lst[1:]
        STRUCTURE['title'] = LINE_lst[0]        
        global STRUCTURE

    def __str__(self):
        return STRUCTURE['title']+'\n' +' '.join(STRUCTURE['body_lines'])
    def __repr__(self):
        return STRUCTURE['title']

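Everything basically works, but I want to do all of this without creating a global dictionary, which doesn't keep the information for long. I want to change the process method so that I no longer need the global STRUCTURE dictionary.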

Any ideas?

Answer 1

Use self.STRUCTURE instead.

def process(self):
    #...
    self.STRUCTURE = {}
    #...
    self.STRUCTURE['body_lines'] = LINE_lst[1:]
    self.STRUCTURE['title'] = LINE_lst[0]        

def __str__(self):
    return self.STRUCTURE['title']+'\n' +' '.join(self.STRUCTURE['body_lines'])
def __repr__(self):
    return self.STRUCTURE['title']

...though you may want to consider choosing a new variable name.
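For what it's worth, here is one way the whole class might look with per-instance attributes and no imports. This is only a sketch, not the asker's final code: the attribute names title and body_lines are my own choice (they are not in the question), and I'm assuming that the first and last lines of each file are wrapper tags to be dropped, as the original slicing suggests. remove_html_tags is copied from the question unchanged, including its simple quote handling.

```python
def remove_html_tags(s):
    # Same character-by-character approach as in the question.
    tag = False
    quote = False
    out = ""
    for c in s:
        if c == '<' and not quote:
            tag = True
        elif c == '>' and not quote:
            tag = False
        elif (c == '"' or c == "'") and tag:
            quote = not quote
        elif not tag:
            out = out + c
    return out


class WebPage:
    def __init__(self, filename):
        self.filename = filename
        # Hypothetical attribute names; they replace the global STRUCTURE dict.
        self.title = ""
        self.body_lines = []

    def process(self):
        f = open(self.filename, 'r')
        try:
            raw_lines = f.readlines()
        finally:
            f.close()

        # Assumption carried over from the question: skip the first and last lines.
        cleaned = []
        for line in raw_lines[1:-1]:
            text = remove_html_tags(line.strip(' \n\t'))
            if text:  # keep only non-empty lines
                cleaned.append(text)

        if cleaned:
            self.title = cleaned[0]
            self.body_lines = cleaned[1:]

    def __str__(self):
        return self.title + '\n' + ' '.join(self.body_lines)

    def __repr__(self):
        return self.title
```

Because the data lives on self, each WebPage keeps its own title and body, so processing a second page no longer overwrites the first. Building a new list of non-empty lines also avoids calling remove() on a list while looping over it, which is why the original needed the extra outer loop.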

How to define a web page class in Python 2.7 without imports
