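On an FTP server

Time: 2015-07-16 21:55:36

Tags: python algorithm ftp os.walk

How can I make os.walk traverse the directory tree of an FTP database (located on a remote server)? This is how the code is structured at the moment (comments included):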

import fnmatch, os, ftplib

def find(pattern, startdir=os.curdir): #find function taking variables for both desired file and the starting directory
    for (thisDir, subsHere, filesHere) in os.walk(startdir): #each of the variables change as the directory tree is walked
        for name in subsHere + filesHere: #going through all of the files and subdirectories
            if fnmatch.fnmatch(name, pattern): #if the name of one of the files or subs is the same as the inputted name
                fullpath = os.path.join(thisDir, name) #fullpath equals the concatenation of the directory and the name
                yield fullpath #return fullpath but anew each time

def findlist(pattern, startdir = os.curdir, dosort=False):
    matches = list(find(pattern, startdir)) #find with arguments pattern and startdir put into a list data structure
    if dosort: matches.sort() #isn't dosort automatically False? Is this statement any different from the same thing but with a line in between
    return matches

#def ftp(
#specifying where to search.

if __name__ == '__main__':
    import sys
    namepattern, startdir = sys.argv[1], sys.argv[2]
    for name in find(namepattern, startdir): print (name)

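I was thinking that I need to define a new function (i.e. def ftp()) to add this functionality to the code above. However, I am afraid that os.walk will, by default, only walk the directory tree of the computer the code is run from.

Is there a way to extend the functionality of os.walk so that it can traverse a remote directory tree (via FTP)?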

4 answers:

Answer 0 (score: 2)

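All you need is Python's ftplib module. os.walk() works top-down: at each step it reports the directory names and file names of the current directory and then descends into the subdirectories, so an FTP version has to list the directories and files at every iteration and then continue the traversal from the first directory. About two years ago I implemented this algorithm as the core of FTPwalker, a package built for traversing extremely large directory trees over FTP.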

import posixpath  # FTP paths use '/' regardless of the local operating system


class FTPWalk:
    """
    Traverse the directory tree of an FTP server in the style of os.walk().
    """
    def __init__(self, connection):
        self.connection = connection

    def listdir(self, _path):
        """
        Return the directory names and file names within a path (directory).
        """

        file_list, dirs, nondirs = [], [], []
        try:
            self.connection.cwd(_path)
        except Exception as exp:
            print("the current path is : ", self.connection.pwd(), str(exp), _path)
            return [], []
        else:
            self.connection.retrlines('LIST', lambda x: file_list.append(x.split()))
            for info in file_list:
                if not info:
                    continue  # skip blank lines in the listing
                # Assumes Unix-style LIST output: the first field holds the
                # permissions and the last field the name. Names containing
                # spaces and symlinks ("name -> target") are not parsed correctly.
                ls_type, name = info[0], info[-1]
                if ls_type.startswith('d'):
                    dirs.append(name)
                else:
                    nondirs.append(name)
            return dirs, nondirs

    def walk(self, path='/'):
        """
        Walk through the FTP server's directory tree top-down (depth-first),
        yielding (path, dirs, nondirs) tuples. Pass an absolute path.
        """
        dirs, nondirs = self.listdir(path)
        yield path, dirs, nondirs
        for name in dirs:
            yield from self.walk(posixpath.join(path, name))
            # In python2 use:
            # for sub_path, sub_dirs, sub_nondirs in self.walk(posixpath.join(path, name)):
            #     yield sub_path, sub_dirs, sub_nondirs

With this class, you simply create a connection object using the ftplib module, pass that object to an FTPWalk object, and loop over the walk() function:

In [2]: from test import FTPWalk

In [3]: import ftplib

In [4]: connection = ftplib.FTP("ftp.uniprot.org")

In [5]: connection.login()
Out[5]: '230 Login successful.'

In [6]: ftpwalk = FTPWalk(connection)

In [7]: for i in ftpwalk.walk():
   ...:     print(i)
   ...:
('/', ['pub'], [])
('/pub', ['databases'], ['robots.txt'])
('/pub/databases', ['uniprot'], [])
('/pub/databases/uniprot', ['current_release', 'previous_releases'], ['LICENSE', 'current_release/README', 'current_release/knowledgebase/complete', 'previous_releases/', 'current_release/relnotes.txt', 'current_release/uniref'])
('/pub/databases/uniprot/current_release', ['decoy', 'knowledgebase', 'rdf', 'uniparc', 'uniref'], ['README', 'RELEASE.metalink', 'changes.html', 'news.html', 'relnotes.txt'])
...
...
...
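To tie this back to the question, the asker's find() can be pointed at any walker that yields (directory, subdirectories, files) tuples. The following is only a minimal sketch: find_ftp is a hypothetical name that does not appear in the thread, and the sample tuples are hand-written so that the example runs without a server. With a live connection you would pass FTPWalk(connection).walk('/') instead.

```python
import fnmatch
import posixpath


def find_ftp(pattern, walk_iter):
    """Yield full remote paths whose base name matches pattern.

    walk_iter is any iterable of (dirpath, dirnames, filenames) tuples,
    for example the generator returned by FTPWalk(connection).walk('/').
    find_ftp is a hypothetical helper, not part of ftplib.
    """
    for this_dir, subs_here, files_here in walk_iter:
        for name in subs_here + files_here:
            if fnmatch.fnmatch(name, pattern):
                # Remote paths always use '/', so join with posixpath
                yield posixpath.join(this_dir, name)


# Offline demo: hand-written tuples stand in for a real FTP walk
sample = [
    ('/', ['pub'], []),
    ('/pub', ['databases'], ['robots.txt']),
]
matches = list(find_ftp('*.txt', sample))
print(matches)
```

Because find_ftp only consumes tuples, the same function works with os.walk() for local trees, which keeps the matching logic separate from the transport.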

Answer 1 (score: 0)

I assume this is what you want... although I don't really know.

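import paramiko

# server, username and password are placeholders for your own connection details
ssh = paramiko.SSHClient()
ssh.load_system_host_keys()  # connect() rejects hosts whose key is unknown
ssh.connect(server, username=username, password=password)
ssh_stdin, ssh_stdout, ssh_stderr = ssh.exec_command("locate my_file.txt")
print(ssh_stdout.read().decode())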

This requires the remote server to have the mlocate package installed: `sudo apt-get install mlocate; sudo updatedb`

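Answer 2 (score: 0)

I needed something like os.walk on FTP and there wasn't anything available, so I thought it would be useful to write it. For future reference, you can find the latest version here

By the way, this is the code that does it: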

import os


def FTP_Walker(FTPpath, localpath):
    # Relies on a module-level `ftp` connection (created in the usage section)
    os.chdir(localpath)
    current_loc = os.getcwd()
    for item in ftp.nlst(FTPpath):
        if is_file(item):
            yield item
        else:
            # Anything we can cwd() into is a directory: recurse into it
            yield from FTP_Walker(item, current_loc)
    os.chdir(localpath)


def is_file(filename):
    # Treat an entry as a file if the server refuses to cwd() into it
    current = ftp.pwd()
    try:
        ftp.cwd(filename)
    except Exception:
        return True
    finally:
        ftp.cwd(current)  # always return to the directory we started in
    return False

Usage:

First, connect to your host:

from ftplib import FTP

host_address = "my host address"
user_name = "my username"
password = "my password"


ftp = FTP(host_address)
ftp.login(user=user_name, passwd=password)

Now you can call the function like this:

ftpwalk = FTP_Walker("FTP root path", "path to local")  # the local path is not really used yet (a future version will improve this), so you can just pass '/' for it

Then, to print and download the files, you can do something like this:

for item in ftpwalk:
    local_name = os.path.join(os.getcwd(), item.split('/')[-1])
    with open(local_name, "wb") as fh:
        ftp.retrbinary("RETR " + item, fh.write)  # download the file
    print(item)  # print the file's address

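(I will be writing more features for it soon, so if you need something specific or have any ideas that would be useful to users, I would be glad to hear them.)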
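The is_file() check above costs several cwd() round trips per entry. As a hedged alternative, on servers that support the MLSD command, ftplib.FTP.mlsd() reports a type fact for each entry, so files and directories can be told apart from a single listing. The sketch below is not the answer author's code: mlsd_walk and FakeFTP are hypothetical names, and FakeFTP only imitates the (name, facts) pairs that mlsd() yields so the example runs offline.

```python
import posixpath


def mlsd_walk(ftp, top='/'):
    """Yield (dirpath, dirnames, filenames) like os.walk(), using MLSD."""
    dirs, files = [], []
    for name, facts in ftp.mlsd(top):
        kind = facts.get('type', '').lower()
        if kind == 'dir':
            dirs.append(name)
        elif kind == 'file':
            files.append(name)
        # 'cdir' and 'pdir' (the current and parent directory) are skipped
    yield top, dirs, files
    for name in dirs:
        yield from mlsd_walk(ftp, posixpath.join(top, name))


class FakeFTP:
    """Hypothetical stand-in for ftplib.FTP so the sketch runs offline."""
    tree = {
        '/': [('.', {'type': 'cdir'}), ('pub', {'type': 'dir'})],
        '/pub': [('..', {'type': 'pdir'}), ('robots.txt', {'type': 'file'})],
    }

    def mlsd(self, path):
        return iter(self.tree[path])


walked = list(mlsd_walk(FakeFTP()))
print(walked)
```

With a real server you would pass a logged-in ftplib.FTP object instead of FakeFTP(). Servers without MLSD support will reject the command, in which case the cwd()-based check remains the fallback.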

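Answer 3 (score: 0)

I wrote a library: pip install walk-sftp. Even though it is named walk-sftp, I included a WalkFTP class that lets you filter files by a start_date and an end_date. You can even pass in a processing_function that returns True or False to check whether your process for cleaning and storing the data worked. It also has a log parameter (pass a file name) that uses pickle to keep track of progress, so you don't overwrite anything or have to keep track of dates yourself, which makes backfilling easier.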

https://pypi.org/project/walk-sftp/
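For readers who want date filtering without an extra dependency, the idea can be sketched with the standard library alone. This is not the walk-sftp API: modified_between is a hypothetical helper, and it assumes the server reports a modify fact (a YYYYMMDDHHMMSS timestamp) through MLSD, which not every server does.

```python
from datetime import datetime


def modified_between(facts, start_date, end_date):
    """Return True if the MLSD 'modify' fact lies within [start_date, end_date].

    facts is the dictionary that ftplib.FTP.mlsd() yields for one entry.
    Entries without a 'modify' fact are treated as not matching.
    """
    stamp = facts.get('modify')
    if not stamp:
        return False
    # Keep the first 14 characters to drop optional fractional seconds
    modified = datetime.strptime(stamp[:14], '%Y%m%d%H%M%S')
    return start_date <= modified <= end_date


# Offline demo with a hand-written facts dictionary
facts = {'type': 'file', 'modify': '20150716215536'}
in_range = modified_between(facts, datetime(2015, 7, 1), datetime(2015, 8, 1))
print(in_range)
```

A walker can apply this check to each entry before yielding it, which gives a start/end date filter similar in spirit to the one described above.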