Reading a single file from a tar archive on a subprocess's stdout

Time: 2012-12-02 09:01:28

Tags: python

How can I read the contents of a single file from a tar archive on a command's stdout without touching the disk?

I came up with something like this:

def get_files_from(sha, files):
    from subprocess import Popen, PIPE
    import tarfile
    p = Popen(["git", "archive", sha], bufsize=10240, stdin=PIPE, stdout=PIPE, stderr=PIPE)
    tar = tarfile.open(fileobj=p.stdout, mode='r|')
    p.communicate()
    members = tar.getmembers()
    names = tar.getnames()
    contents = {}
    for fname in files:
        if fname not in names:
            contents[fname] = None
            continue
        else:
            idx = names.index(fname)
            contents[fname] = members[idx].tobuf()
            contents[fname] = tar.extractfile(members[idx]) #<--- HERE

    tar.close()
    return contents

The problem is that adding a .read() call to the line marked #<--- HERE, i.e. changing it to

            contents[fname] = tar.extractfile(members[idx]).read() #<--- HERE

gives the error:

tarfile.StreamError: seeking backwards is not allowed

So how do I get the contents of the files?

2 answers:

Answer 0 (score: 3)

You misspelled the mode= argument; you wrote more= instead. It should be:

tar = tarfile.open(fileobj=p.stdout, mode='r|')
If you specify the mode correctly, .tell() will not be called. :-)

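Then you have to loop over the tarfile object to extract the members; in stream mode you cannot read arbitrary files from the tarfile in random order: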

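for entry in tar:
    # Test whether this is a file you want, and read its data right away:
    # once the loop advances, the stream has moved past this member.
    if entry.name in files:
        data = tar.extractfile(entry).read()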
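As a minimal sketch of that loop, assuming Python 3, the block below simulates a subprocess's stdout with an OS pipe (which, like `p.stdout`, cannot seek) fed by a writer thread. The helper names `make_tar_bytes` and `read_selected` are hypothetical, made up for this illustration; only `tarfile`, `os.pipe` and `threading` come from the standard library.

```python
import io
import os
import tarfile
import threading


def make_tar_bytes(files):
    """Hypothetical helper: build an uncompressed tar archive from {name: bytes}."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def read_selected(stream, wanted):
    """Hypothetical helper: return {name: bytes} for wanted members, in stream order."""
    wanted = set(wanted)
    found = {}
    with tarfile.open(fileobj=stream, mode="r|") as tar:
        for entry in tar:
            if entry.isfile() and entry.name in wanted:
                # Read now: once the loop advances, this member's data is gone.
                found[entry.name] = tar.extractfile(entry).read()
    return found


# Stand-in for a subprocess's stdout: a non-seekable OS pipe.
archive = make_tar_bytes({"README.MD": b"hello\n", "src/a.py": b"print(1)\n"})
read_fd, write_fd = os.pipe()


def writer():
    # Write the whole archive, then close the write end so the reader sees EOF.
    with os.fdopen(write_fd, "wb") as w:
        w.write(archive)


t = threading.Thread(target=writer)
t.start()
with os.fdopen(read_fd, "rb") as r:
    result = read_selected(r, ["README.MD", "missing.txt"])
t.join()
print(result)  # {'README.MD': b'hello\n'}
```

Each member's data is read inside the loop; keeping the object returned by `extractfile()` and reading it after the loop has moved on would run into the same backward-seek problem.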

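You cannot use any of the .getnames(), .getmember() or .getmembers() methods, because they need a full scan of the file: that leaves the file pointer at the end and makes it impossible to read the entry data itself.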
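The failure can be reproduced without git. In this sketch (Python 3 assumed; `build_archive` is a hypothetical helper), a one-file archive is built in memory and opened in stream mode `r|`: calling `getmembers()` first and then reading a member raises the same `StreamError`, while iterating and reading each member on the spot works.

```python
import io
import tarfile


def build_archive():
    """Hypothetical helper: a one-file uncompressed tar archive in memory."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        data = b"hello\n"
        info = tarfile.TarInfo("README.MD")
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    buf.seek(0)
    return buf


# Stream mode: getmembers() consumes the whole stream to list the members...
tar = tarfile.open(fileobj=build_archive(), mode="r|")
member = tar.getmembers()[0]
try:
    # ...so reading the member's data afterwards needs a backward seek.
    tar.extractfile(member).read()
    error = None
except tarfile.StreamError as exc:
    error = str(exc)
tar.close()
print(error)  # seeking backwards is not allowed

# Iterating reads each member's data while the stream is positioned on it.
with tarfile.open(fileobj=build_archive(), mode="r|") as tar:
    contents = {m.name: tar.extractfile(m).read() for m in tar if m.isfile()}
print(contents)  # {'README.MD': b'hello\n'}
```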

Answer 1 (score: 0)

For anyone interested:

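def get_files_from(sha, files):
    from subprocess import Popen, PIPE
    import tarfile
    # Stream the output of "git archive" straight into tarfile; nothing is written to disk.
    p = Popen(["git", "archive", sha], bufsize=10240, stdout=PIPE, stderr=PIPE)
    tar = tarfile.open(fileobj=p.stdout, mode='r|')
    contents = {}
    doall = files == '*'
    if not doall:
        files = set(files)
    # Members can only be visited in stream order; read each one as it passes.
    for entry in tar:
        if not entry.isfile():
            continue  # directories have no data; extractfile() would return None
        if doall or entry.name in files:
            contents[entry.name] = tar.extractfile(entry).read()
            if not doall:
                files.discard(entry.name)

    # Requested names that never appeared in the archive map to None.
    if not doall:
        for fname in files:
            contents[fname] = None

    tar.close()
    # Reap git only after the archive has been consumed: calling communicate()
    # before the loop would drain stdout before tarfile could read it.
    p.communicate()
    return contents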
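If the whole archive fits comfortably in memory, another option (not from either answer, and an assumption about your use case) is to buffer it in an `io.BytesIO`, which is seekable, and open it in random-access mode `r`; then `getmember()` followed by `extractfile()` works and names can be looked up in any order. `read_files_from_archive` below is a hypothetical helper, and the archive is built in memory to stand in for `git archive` output.

```python
import io
import tarfile


def read_files_from_archive(archive_bytes, names):
    """Hypothetical helper: look up members of an in-memory tar archive.

    Returns {name: bytes or None}; None marks a name that is missing or is
    not a regular file.
    """
    contents = {}
    # BytesIO is seekable, so random-access mode "r" works and getmember()
    # can be followed by extractfile() without a StreamError.
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r") as tar:
        for name in names:
            try:
                member = tar.getmember(name)
            except KeyError:
                contents[name] = None
                continue
            if member.isfile():
                contents[name] = tar.extractfile(member).read()
            else:
                contents[name] = None
    return contents


# Build a small archive in memory to stand in for `git archive` output.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as out:
    for name, data in [("README.MD", b"hello\n"), ("docs/x.txt", b"x\n")]:
        info = tarfile.TarInfo(name)
        info.size = len(data)
        out.addfile(info, io.BytesIO(data))

# Names can be looked up in any order, unlike in stream mode "r|".
print(read_files_from_archive(buf.getvalue(), ["docs/x.txt", "README.MD", "foo"]))
# {'docs/x.txt': b'x\n', 'README.MD': b'hello\n', 'foo': None}
```

With git, the bytes could come from `subprocess.run(["git", "archive", sha], stdout=subprocess.PIPE, check=True).stdout` (Python 3.5+). The trade-off is memory: the entire archive is held at once, whereas the stream-mode loop above reads it member by member.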

print(get_files_from("a8c11fcee68881dfb86095aa36290fb304047cf1", ['README.MD', 'foo']))
print(get_files_from("a8c11fcee68881dfb86095aa36290fb304047cf1", '*'))

Patches welcome!