Question

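I'm already familiar with the concept, most notably from watching Raymond Hettinger's excellent video and reading the accepted answer here, and I'd like to know what I'm doing wrong.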

from urllib2 import urlopen  # Python 2, matching the traceback below

class ReadHTML(object):

    def __init__(self, url):
        page = urlopen(url).read()
        self.page = page

    @classmethod
    def from_file(cls, path):
        page = open(path).read()
        return cls(page)

This works:

r = ReadHTML('http://example.com')
print r.page

This doesn't:

r = ReadHTML.from_file('example.html')
print r.page

It gives me an error, as if I were trying to "urlopen" a file:

File "/usr/lib/python2.7/urllib2.py", line 258, in get_type
    raise ValueError, "unknown url type: %s" % self.__original
ValueError: unknown url type: <!doctype html>

Can you see what's wrong?

Answer 1

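You are still calling the class initializer, ReadHTML.__init__(), when you call cls(page); that call is no different from calling ReadHTML(page), you are just using a different reference. This method only accepts a url parameter, and the code passes whatever it receives to urlopen() regardless.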
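To make that concrete, here is a minimal sketch using a hypothetical class named Demo with a hypothetical classmethod make (neither is from the question). It only illustrates that calling cls(...) inside a classmethod goes through the same __init__ as calling the class directly.

```python
class Demo(object):
    def __init__(self, value):
        # Record whatever the initializer was given.
        self.received = value

    @classmethod
    def make(cls, value):
        # cls is the class object itself, so this is the same call as Demo(value).
        return cls(value)


direct = Demo("some text")
via_classmethod = Demo.make("some text")

# Both instances were built by the same __init__ with the same argument.
print(type(via_classmethod) is Demo)
print(direct.received == via_classmethod.received)
```

In the question, the equivalent of "some text" is the file's HTML, which __init__ then treats as a URL.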

Adjust the ReadHTML.__init__() method to handle being passed a page instead of a URL:

class ReadHTML(object):
    def __init__(self, url=None, page=None):
        if url is not None:
            page = urlopen(url).read()
        self.page = page

    @classmethod
    def from_file(cls, path):
        page = open(path).read()
        return cls(page=page)

Now the code supports both paths to producing an instance.
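A quick way to try the file path without touching the network is sketched below. The temporary file and its contents are made up for illustration, the import fallback is an assumption so the snippet runs on both Python 2 and 3, and the with block is a small variation on the code above. Note that from_file has to pass page as a keyword, since a positional cls(page) would still bind to url.

```python
import os
import tempfile

try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib2 import urlopen  # Python 2, as in the question


class ReadHTML(object):
    def __init__(self, url=None, page=None):
        # Only fetch when a URL was actually given.
        if url is not None:
            page = urlopen(url).read()
        self.page = page

    @classmethod
    def from_file(cls, path):
        with open(path) as f:
            page = f.read()
        # Keyword argument, so __init__ skips urlopen entirely.
        return cls(page=page)


# Write a throwaway HTML file (hypothetical content) and load it back.
fd, path = tempfile.mkstemp(suffix=".html")
with os.fdopen(fd, "w") as f:
    f.write("<!doctype html><p>hello</p>")
try:
    r = ReadHTML.from_file(path)
finally:
    os.remove(path)

print(r.page)
```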

Answer 2

from_file opens the page, but so does your __init__() constructor, so if you do ReadHTML.from_file('example.html'), you are essentially doing:

page = urlopen(open('example.html').read()).read()

Personally, I prefer Martijn's solution for its semantic clarity, but here is an alternative:

class ReadHTML(object):
    def __init__(self, url, opener=urlopen):
        self.page = opener(url).read()

    @classmethod
    def from_file(cls, path):
        return cls(path, opener=open)

This solution is advantageous because it lets you define arbitrary openers (say, for opening files stored in a database).
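As a rough sketch of what such a custom opener might look like, the snippet below uses a plain dict as a stand-in for a database. FAKE_DB and db_opener are hypothetical names invented for this example. The only contract assumed is that the opener takes one argument and returns an object with a read() method.

```python
import io

try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib2 import urlopen  # Python 2, as in the question


class ReadHTML(object):
    def __init__(self, url, opener=urlopen):
        # Whatever opener is given decides how "url" is interpreted.
        self.page = opener(url).read()

    @classmethod
    def from_file(cls, path):
        return cls(path, opener=open)


# Hypothetical stand-in for a database: maps keys to stored HTML.
FAKE_DB = {"home": u"<!doctype html><p>home</p>"}


def db_opener(key):
    # Wrap the stored text in a file-like object so .read() works on it.
    return io.StringIO(FAKE_DB[key])


r = ReadHTML("home", opener=db_opener)
print(r.page)
```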

Answer 3

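I'm not a big fan of optional parameters. I would make the default constructor accept a string and add two separate alternate constructors to handle a filename and a URL.

I also modified the filename constructor to explicitly close the file.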

class ReadHTML(object):

    def __init__(self, page):
        self.page = page

    @classmethod
    def from_filename(cls, path):
        with open(path) as f:
            page = f.read()
        return cls(page)

    @classmethod
    def from_url(cls, url):
        page = urlopen(url).read()
        return cls(page)

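As a side note, I believe urllib / urllib2 support file://, so strictly speaking you wouldn't need the filename constructor (but I still think it is nice to have).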
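For what it's worth, here is a small sketch of the file:// route. It is written for Python 3's urllib.request rather than the urllib2 used in the question, and the temporary file is made up for the example. One difference to keep in mind is that urlopen returns bytes here, not text.

```python
import os
import tempfile
from pathlib import Path
from urllib.request import urlopen  # Python 3 only in this sketch

# Create a throwaway local file (hypothetical content).
fd, path = tempfile.mkstemp(suffix=".html")
with os.fdopen(fd, "w") as f:
    f.write("<p>local</p>")

try:
    # Turn the absolute path into a file:// URL and open it like any other URL.
    file_url = Path(path).as_uri()
    with urlopen(file_url) as response:
        page = response.read()
finally:
    os.remove(path)

print(file_url.startswith("file://"))
print(page)
```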

Creating a Python classmethod
