I have this code that iterates through a txt file of URLs and searches for files to download:
# Imports inferred from the calls below (the original post omits them);
# downloadtools and the Percentage progress callback are assumed to be defined elsewhere.
import csv
import os
import urlparse
from re import compile
from urllib import urlopen, urlretrieve
from bs4 import BeautifulSoup as bs

URLS = open("urlfile.txt").readlines()

def downloader():
    with open('data.csv', 'w') as csvfile:
        writer = csv.writer(csvfile)
        for url in downloadtools.URLS:
            try:
                html_data = urlopen(url)
            except:
                print 'Error opening URL: ' + url
                pass
            #Creates a BS object out of the open URL.
            soup = bs(html_data)
            #Parsing the URL for later use
            urlinfo = urlparse.urlparse(url)
            domain = urlparse.urlunparse((urlinfo.scheme, urlinfo.netloc, '', '', '', ''))
            path = urlinfo.path.rsplit('/', 1)[0]
            FILETYPE = ['\.pdf$', '\.ppt$', '\.pptx$', '\.doc$', '\.docx$', '\.xls$', '\.xlsx$', '\.wmv$', '\.mp4$', '\.mp3$']
            #Loop iterates through list of file types for open URL.
            for types in FILETYPE:
                for link in soup.findAll(href=compile(types)):
                    urlfile = link.get('href')
                    filename = urlfile.split('/')[-1]
                    #Bumps a _N suffix until the filename is unused locally.
                    while os.path.exists(filename):
                        try:
                            fileprefix = filename.split('_')[0]
                            filetype = filename.split('.')[-1]
                            num = int(filename.split('.')[0].split('_')[1])
                            filename = fileprefix + '_' + str(num + 1) + '.' + filetype
                        except:
                            filetype = filename.split('.')[1]
                            fileprefix = filename.split('.')[0] + '_' + str(1)
                            filename = fileprefix + '.' + filetype
                    #Creates a full URL if needed.
                    if '://' not in urlfile and not urlfile.startswith('//'):
                        if not urlfile.startswith('/'):
                            urlfile = urlparse.urljoin(path, urlfile)
                        urlfile = urlparse.urljoin(domain, urlfile)
                    #Downloads the urlfile or returns error for manual inspection
                    try:
                        urlretrieve(urlfile, filename, Percentage)
                        writer.writerow(['SUCCESS', url, urlfile, filename])
                        print " SUCCESS"
                    except:
                        print " ERROR"
                        writer.writerow(['ERROR', url, urlfile, filename])
Everything works fine except that no data is written to the CSV. No directories are changed (as far as I know, at least...).

The script runs through the external list of URLs, finds the files, downloads them correctly, and prints "SUCCESS" or "ERROR" without a problem. The only thing it doesn't do is write the data to the CSV file. It runs all the way through without writing any CSV data.

I tried running it in a virtualenv to make sure there weren't any weird package issues.

Could my nested loops be preventing the CSV data from being written?
Answer 0 (score: 2)
Try with open('data.csv', 'wb') as csvfile: instead.

http://docs.python.org/2/tutorial/inputoutput.html#reading-and-writing-files
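Applied to the code in the question, only the mode passed to open() changes; a minimal sketch (the row contents here are just placeholders to show the writer is used the same way):

import csv

# The file mode changes from 'w' to 'wb', which is what the Python 2 csv docs
# recommend; the rest of downloader() would stay exactly as in the question.
with open('data.csv', 'wb') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(['SUCCESS', 'http://example.com', 'http://example.com/a.pdf', 'a.pdf'])  # placeholder row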
Alternatively, build an iterable of rows and use writerows instead of writerow. If you run the script in interactive mode, you can inspect the contents of that iterable of rows (i.e. [['SUCCESS', ...], ['SUCCESS', ...], ...]).
import csv

with open('some.csv', 'wb') as f:
    writer = csv.writer(f)
    writer.writerows(someiterable)
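For instance, a small stand-alone sketch (the rows are made-up placeholders for whatever the download loop would actually collect):

import csv

# Collect one row per download attempt in a list, then write them all at once.
rows = [
    ['SUCCESS', 'http://example.com/page', 'http://example.com/file.pdf', 'file.pdf'],
    ['ERROR', 'http://example.com/other', 'http://example.com/file.ppt', 'file.ppt'],
]

with open('some.csv', 'wb') as f:
    writer = csv.writer(f)
    writer.writerows(rows)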
Answer 1 (score: 0)
So I let the script run all the way through, and for some reason the data started being written to the CSV after it had been running for a while. I don't know how to explain that. Was the data sitting in memory somewhere and then written out at some arbitrary point? I don't know, but the data is accurate when compared against the log printed in the terminal.

Weird.
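For what it's worth, that behaviour matches ordinary file buffering: writerow() only places the row in Python's I/O buffer, and it reaches the file once the buffer fills up or the file is closed. A toy sketch of forcing rows out immediately after each write (flush() and os.fsync() are standard library calls; the row contents are placeholders):

import csv
import os

# Flushing after each writerow() makes rows show up in the file right away,
# instead of only when the buffer fills or the file is closed at the end.
with open('data.csv', 'wb') as csvfile:
    writer = csv.writer(csvfile)
    for i in range(3):
        writer.writerow(['SUCCESS', 'placeholder row %d' % i])
        csvfile.flush()              # empty Python's internal buffer
        os.fsync(csvfile.fileno())   # optionally ask the OS to commit it to disk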