Problem creating/writing a file in Python

Time: 2015-03-05 01:01:26

Tags: python

I am trying to create a file and write to it. I have the following code:

from urllib2 import urlopen

def crawler(seed_url):
    to_crawl = [seed_url]
    crawled=[]
    while to_crawl:
        page = to_crawl.pop()
        page_source = urlopen(page)
        s = page_source.read()
        with open(str(page)+".txt","a+") as f:
            f.write(s)
            f.close()
    return crawled

if __name__ == "__main__":
    crawler('http://www.yelp.com/')

However, it raises an error:

Traceback (most recent call last):
  File "/Users/adamg/PycharmProjects/NLP-HW1/scrape-test.py", line 29, in <module>
    crawler('http://www.yelp.com/')
  File "/Users/adamg/PycharmProjects/NLP-HW1/scrape-test.py", line 14, in crawler
    with open("./"+str(page)+".txt","a+") as f:
IOError: [Errno 2] No such file or directory: 'http://www.yelp.com/.txt'

I thought open(file, "a+") was supposed to create the file and write to it. What am I doing wrong?

1 Answer:

Answer 0: (score: 5)

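Mode "a+" creates a missing file, but it does not create missing directories. The slashes in 'http://www.yelp.com/.txt' are treated as path separators, so Python looks for a directory named 'http:' that does not exist, hence the IOError. If you want to use the URL as the basis for a file or directory name, you should encode the URL. That way, slashes (and other characters) are converted into character sequences that will not interfere with the file system or the shell.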

The urllib library can help with this.

So, for example:

>>> import urllib
>>> urllib.quote_plus('http://www.yelp.com/')
'http%3A%2F%2Fwww.yelp.com%2F'
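
Applied to the file-writing step in the question, this could look roughly like the sketch below. It is only an illustration under a few assumptions: it targets Python 3, where quote_plus lives in urllib.parse (on Python 2 it is urllib.quote_plus, as shown above), and the helper names url_to_filename and save_page are hypothetical, not part of any library. The page content is passed in as bytes so the sketch does not depend on network access.

```python
import os
import tempfile
from urllib.parse import quote_plus  # Python 3 location of quote_plus


def url_to_filename(url, suffix=".txt"):
    """Return a file name derived from url that contains no path separators."""
    # quote_plus percent-encodes ':' and '/', so the result is a single
    # path component instead of a nested (and nonexistent) directory path.
    return quote_plus(url) + suffix


def save_page(url, content, directory="."):
    """Append content (bytes) to a file named after url inside directory."""
    path = os.path.join(directory, url_to_filename(url))
    # "ab" creates the file if it is missing; the directory must already exist.
    with open(path, "ab") as f:
        f.write(content)
    return path


if __name__ == "__main__":
    # Hypothetical usage: write a placeholder page into a temporary directory.
    with tempfile.TemporaryDirectory() as tmp:
        print(save_page("http://www.yelp.com/", b"<html></html>", tmp))
```

In the crawler from the question, the bytes returned by page_source.read() would take the place of the placeholder content. The with block closes the file on exit, so no explicit f.close() is needed.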