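I've been scratching my head for more than two days and still can't figure out how to do the following! I want to download all the GEO datasets from ftp://ftp.ncbi.nlm.nih.gov, and then check each dataset for the keywords I'm interested in. I can manually download one of the datasets and check whether the file contains the keywords I need, but there are far too many datasets to do this by hand, so I want to write a program to do it for me. As a first step, I'm just trying to see whether I can download them. The structure is as follows: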
host ->
  /geo/
    -> datasets/
      -> GDS1nnn/ ... all the way through GDS6nnn; each of them contains more than 600 directories, ordered by number, e.g. GDS1001. In each of these directories:
        -> soft/: inside this folder there are 2 files named like this: folder name (GDS1001) + _full.soft.gz
This is the file I think I need to download, and then I check whether the keywords I'm looking for are in it.
Here is my code:
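The layout above can be turned into remote paths mechanically. The sketch below assumes (inferred from the GDS1001 example, not confirmed by NCBI documentation) that a dataset's parent folder is its name with the last three digits replaced by "nnn"; the helper names gds_bucket, soft_path and BUCKETS are hypothetical.

```python
def gds_bucket(name):
    """Map a dataset name such as 'GDS1001' to its parent folder 'GDS1nnn'.

    Assumption: the parent folder is the name with its last three digits
    replaced by 'nnn', as in the GDS1001 -> GDS1nnn example above.
    """
    return name[:-3] + "nnn"


def soft_path(name):
    """Remote path of the full SOFT file for one dataset, e.g. 'GDS1001'."""
    bucket = gds_bucket(name)
    return "/geo/datasets/{}/{}/soft/{}_full.soft.gz".format(bucket, name, name)


# Parent folders listed in the question: GDS1nnn through GDS6nnn
BUCKETS = ["/geo/datasets/GDS{}nnn/".format(i) for i in range(1, 7)]
```

For example, soft_path("GDS1001") gives "/geo/datasets/GDS1nnn/GDS1001/soft/GDS1001_full.soft.gz", and looping over BUCKETS instead of hard-coding GDS1nnn would cover all six folders.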
import os
from ftplib import FTP

ftp = FTP('ftp.ncbi.nlm.nih.gov')  # remember that you ONLY need to provide the host name, not the complete address!
ftp.login()
#ftp.retrlines('LIST')
ftp.cwd("/geo/datasets/GDS1nnn/")
ftp.retrlines('LIST')
filenames = ftp.nlst()
count = len(filenames)
curr = 0
print("found {} files".format(count))
for filename in filenames:
    first_path = filename + "/soft/"
    second_path = first_path + filename + "_full.soft.gz"
    #print(second_path)
    local_filename = os.path.join(r'full path to a folder that I created')
    file = open(local_filename, 'wb')
    ftp.retrbinary('RETR ' + second_path, file.write)
    file.close()
ftp.quit()
Output:
file = open(local_filename, 'wb')
PermissionError: [Errno 13] Permission denied: 'full path to a folder that I created'
However, I do have read and write permissions on this folder. Thanks for your help.
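A likely cause: os.path.join is called with only the folder, so open(..., 'wb') is asked to write to the folder itself rather than to a file inside it. On Windows that fails with PermissionError [Errno 13]; on Linux and macOS it fails with IsADirectoryError (both are OSError subclasses). The sketch below demonstrates this on a throwaway temporary folder and builds a file path instead; local_target is a hypothetical helper name.

```python
import os
import tempfile


def local_target(folder, remote_path):
    """Build a local *file* path inside folder, named after the remote file."""
    return os.path.join(folder, os.path.basename(remote_path))


folder = tempfile.mkdtemp()  # stand-in for 'full path to a folder that I created'

# Opening the folder itself for writing fails; the exception class is platform-dependent
try:
    open(folder, "wb")
    opened_dir = True
except OSError as exc:  # PermissionError on Windows, IsADirectoryError elsewhere
    opened_dir = False
    print(type(exc).__name__)

# Joining the folder with a file name gives a path that can be opened for writing
target = local_target(folder, "GDS1001/soft/GDS1001_full.soft.gz")
with open(target, "wb") as fh:
    fh.write(b"ok")
```

In the loop above, that corresponds to passing both the folder and a file name (for example filename + "_full.soft.gz") to os.path.join.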
Answer (score: 0)
The following code shows how to create a folder for each dataset and save its contents into that folder.
import os
from ftplib import FTP, error_perm

OUTPUT_DIR = "output"  # local folder called "output"

ftp = FTP('ftp.ncbi.nlm.nih.gov')
ftp.login()
ftp.cwd("/geo/datasets/GDS1nnn/")
ftp.retrlines('LIST')
filenames = ftp.nlst()
print("found {} files".format(len(filenames)))

for filename in filenames:
    # Remote layout: GDS1001/soft/GDS1001_full.soft.gz
    first_path = filename + "/soft/"
    second_path = first_path + filename + "_full.soft.gz"
    print(second_path)

    # Recreate the same folders locally, e.g. output/GDS1001/soft/
    local_dir = os.path.join(OUTPUT_DIR, filename, "soft")
    os.makedirs(local_dir, exist_ok=True)

    # Write to a file inside the folder, never to the folder path itself
    local_filename = os.path.join(local_dir, filename + "_full.soft.gz")
    try:
        with open(local_filename, 'wb') as file:
            ftp.retrbinary('RETR ' + second_path, file.write)
    except error_perm as e:
        # e.g. a 550 reply when a dataset has no _full.soft.gz file
        os.remove(local_filename)
        print("skipped {}: {}".format(second_path, e))

ftp.quit()
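The answer stops at downloading; the question's second step is checking each file for keywords. A minimal sketch, assuming the files sit under a local "output" folder as above and that a case-insensitive substring match per line is good enough (a keyword split across two lines would be missed). find_keywords and scan_folder are hypothetical names, and the demo file at the end is made up, not real GEO data.

```python
import gzip
import os
import tempfile


def find_keywords(path, keywords):
    """Return the keywords (original spelling) found case-insensitively in a .gz text file."""
    wanted = {k.lower(): k for k in keywords}
    found = set()
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as fh:
        for line in fh:
            lower = line.lower()
            for key, original in wanted.items():
                if key in lower:
                    found.add(original)
            if len(found) == len(wanted):
                break  # every keyword already seen; skip the rest of the file
    return found


def scan_folder(root, keywords):
    """Walk root and map each *_full.soft.gz file name to the keywords it contains."""
    hits = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if name.endswith("_full.soft.gz"):
                found = find_keywords(os.path.join(dirpath, name), keywords)
                if found:
                    hits[name] = found
    return hits


# Tiny made-up sample to show the calls (hypothetical file name and content)
demo_dir = tempfile.mkdtemp()
sample = os.path.join(demo_dir, "GDS0000_full.soft.gz")
with gzip.open(sample, "wt", encoding="utf-8") as fh:
    fh.write("!dataset_title = Example liver study\n")
hits = scan_folder(demo_dir, ["Liver", "mouse"])
```

On real downloads, scan_folder("output", [...]) would be the call; reading through gzip.open avoids unpacking every archive to disk first.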