Question

I wrote a web scraper to save the HTML source of every page linked from a wiki page. The program terminates after writing a few files.

import urllib2

def fetch_code(link_list):
    for href in link_list:
        response = urllib2.urlopen("https://www.wikipedia.org/"+href)
        content = response.read()
        page = open("%s.html" % href, 'w')
        page.write(content.replace("[\/:?*<>|]", " "))
        page.close()

link_list is a list of links extracted from the seed page.

The error I get when running it is:

IOError: [Errno 2] No such file or directory: u'M/s.html'

Answer 1

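You cannot create a file with "/" in its name. "/" is the path separator, so open("M/s.html", 'w') tries to create s.html inside a directory named M, and that directory does not exist.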

You can escape the file name as M%2Fs.html, since "/" percent-encodes to %2F.

In Python 2, you can simply use urllib to escape the file name, for example:

import urllib

filePath = urllib.quote_plus('M/s.html')

print(filePath)
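
Here is a minimal sketch of how that escaping could be applied to the file name inside the scraper loop. It assumes Python 3, where quote_plus lives in urllib.parse rather than urllib. The helper safe_filename and the base parameter are hypothetical names introduced for illustration, not part of the original code.

```python
from urllib.parse import quote_plus
from urllib.request import urlopen


def safe_filename(href):
    """Percent-encode href so the result contains no path separator."""
    return quote_plus(href) + ".html"


def fetch_code(link_list, base="https://www.wikipedia.org/"):
    """Download each link and save it under an escaped file name."""
    for href in link_list:
        content = urlopen(base + href).read()  # raw bytes of the page
        # Binary mode, because urlopen().read() returns bytes in Python 3.
        with open(safe_filename(href), "wb") as page:
            page.write(content)


print(safe_filename("M/s"))  # M%2Fs.html
```

Only the file name is escaped here; the page content is written unchanged.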

Alternatively, you can save the HTTP responses in a directory hierarchy, so that M/s.html means the file s.html under a directory named "M".
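
The hierarchy approach might look roughly like this. It is a sketch in Python 3 that assumes the links are trusted relative paths (it does not guard against ".." components). The function save_page and its root parameter are hypothetical names.

```python
import os


def save_page(root, href, content):
    """Write content (bytes) to root/href.html, creating parent directories."""
    # Strip leading slashes so os.path.join does not discard root.
    path = os.path.join(root, href.lstrip("/") + ".html")
    parent = os.path.dirname(path)
    if parent:
        # exist_ok avoids an error when the directory was already created.
        os.makedirs(parent, exist_ok=True)
    with open(path, "wb") as page:
        page.write(content)
    return path
```

For example, save_page("pages", "M/s", content) would create the directory pages/M if needed and write pages/M/s.html.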
