Downloading remote .gz files residing in a tree-like directory structure

Date: 2017-09-14 02:02:10

Tags: python ftp

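I have been scratching my head for more than two days and still can't figure out how to do the following! I want to download all the GEO datasets on ftp://ftp.ncbi.nlm.nih.gov and then check each dataset for the keywords I am interested in. I was able to download one of the datasets manually and check whether the file contains the keywords I need. However, there are far too many datasets to do this by hand, so I want to write a program that does it for me. As a first step, I am just trying to see whether I can download them. The structure is as follows: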

host -> ftp.ncbi.nlm.nih.gov
  /geo/
    -> datasets/
      -> GDS1nnn/ ... all the way through GDS6nnn/; each of them contains
         more than 600 directories, ordered by number, e.g. GDS1001.
         Inside each of these directories:
         ---> soft/  this folder contains 2 files, named after the folder,
              e.g. GDS1001_full.soft.gz (folder name + "_full.soft.gz")

This is the file I think I need to download and then check for the keywords I am looking for.
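For the second step, checking a downloaded file for keywords, Python's gzip module can read the .gz file directly without unpacking it to disk first. A minimal sketch, assuming the SOFT file is UTF-8 text (undecodable bytes are replaced) and that "contains the keyword" means a case-insensitive substring match; find_keywords is a hypothetical helper name, not part of any library:

```python
import gzip


def find_keywords(path, keywords):
    """Return the keywords that occur (case-insensitively) in a .gz text file.

    `path` may be a file name or an open binary file object. The file is read
    line by line, so the decompressed text is never held in memory all at once.
    """
    # lower-cased keyword -> keyword as the caller spelled it
    pending = {k.lower(): k for k in keywords}
    found = set()
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as fh:
        for line in fh:
            lower = line.lower()
            # materialize the matches first, then pop them from `pending`
            for key in [k for k in pending if k in lower]:
                found.add(pending.pop(key))
            if not pending:  # every keyword already seen: stop reading early
                break
    return found
```

For example, find_keywords("GDS1001_full.soft.gz", ["insulin", "liver"]) returns the subset of those two words that appear in the file. A keyword split across two lines will not be matched, which is usually acceptable for single words.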

Here is my code:

from ftplib import FTP
import os

ftp = FTP('ftp.ncbi.nlm.nih.gov') # remember that you ONLY need to provide the host name not the complete address!
ftp.login()
#ftp.retrlines('LIST')
ftp.cwd("/geo/datasets/GDS1nnn/")
ftp.retrlines('LIST')
filenames = ftp.nlst() 
count = len(filenames)
curr = 0
print ("found {} files".format(count))
for filename in filenames:
    first_path=filename+"/soft/"
    second_path=first_path+filename+"_full.soft.gz"
    #print(second_path)  
    local_filename = os.path.join(r'full path to a folder that I created')
    file = open(local_filename, 'wb')
    ftp.retrbinary('RETR ' + second_path, file.write)
    file.close()
ftp.quit()

Output:

file = open(local_filename, 'wb')
PermissionError: [Errno 13] Permission denied: 'full path to a folder that I created'

However, I do have read and write permissions on this folder. Thanks for your help.

1 answer:

Answer 0 (score: 0)

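The PermissionError in the question happens because os.path.join() called with a single argument simply returns that argument, so open() is asked to open the folder itself for writing (on Windows this raises PermissionError: [Errno 13]). The local path has to name a file inside the folder. The following code shows how to create a folder for each dataset and save the dataset's file into that folder.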

import os
from ftplib import FTP

output_dir = 'output'  # local folder called "output"

ftp = FTP('ftp.ncbi.nlm.nih.gov')
ftp.login()
ftp.cwd("/geo/datasets/GDS1nnn/")
filenames = ftp.nlst()
print("found {} datasets".format(len(filenames)))
for filename in filenames:
    # Mirror the remote layout locally: output/<dataset>/soft/
    local_dir = os.path.join(output_dir, filename, 'soft')
    os.makedirs(local_dir, exist_ok=True)
    # Remote path relative to /geo/datasets/GDS1nnn/ (FTP paths always use '/')
    remote_path = filename + "/soft/" + filename + "_full.soft.gz"
    print(remote_path)
    # open() needs a file path inside the folder, not the folder itself
    local_filename = os.path.join(local_dir, filename + "_full.soft.gz")
    with open(local_filename, 'wb') as file:
        ftp.retrbinary('RETR ' + remote_path, file.write)
ftp.quit()
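Since the end goal is only to know which datasets mention certain keywords, the files do not necessarily have to be saved to disk. A minimal sketch, under two assumptions: that every bucket directory is the dataset id with its last three digits replaced by "nnn" (matching the GDS1nnn ... GDS6nnn layout described in the question), and that a dataset without a _full.soft.gz file makes the server answer with a 5xx reply, which ftplib raises as error_perm. The helper names full_soft_path and search_remote_gz are hypothetical:

```python
import gzip
import io
import posixpath
from ftplib import error_perm


def full_soft_path(dataset_id):
    """Absolute remote path of <id>_full.soft.gz, e.g. for 'GDS1001':
    /geo/datasets/GDS1nnn/GDS1001/soft/GDS1001_full.soft.gz

    Assumes the bucket is the id with its last three digits replaced by 'nnn'.
    """
    bucket = dataset_id[:-3] + "nnn"
    # posixpath, not os.path: FTP paths always use '/', even on Windows
    return posixpath.join("/geo/datasets", bucket, dataset_id, "soft",
                          dataset_id + "_full.soft.gz")


def search_remote_gz(ftp, remote_path, keywords):
    """Fetch a .gz file over an open ftplib.FTP connection into memory and
    return the keywords found (case-insensitively) in its decompressed text.

    Returns None if the server rejects the RETR with a 5xx reply (e.g. 550
    when a dataset has no such file), so the caller can skip that dataset.
    """
    buffer = io.BytesIO()
    try:
        ftp.retrbinary("RETR " + remote_path, buffer.write)
    except error_perm:
        return None
    text = gzip.decompress(buffer.getvalue()).decode("utf-8", errors="replace")
    text = text.lower()
    return {k for k in keywords if k.lower() in text}
```

With a logged-in connection and ftp.cwd() into a bucket as in the answer, calling search_remote_gz(ftp, full_soft_path(name), ["your keyword"]) for every name in ftp.nlst() checks each dataset without writing anything to disk; repeating this for GDS1nnn through GDS6nnn covers all buckets. The trade-off is that each compressed file is held in memory in full, so for very large files downloading to disk, as the answer does, is the safer choice.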