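Evaluation time of `git describe` in a smudge filter

Time: 2018-04-05 07:54:53

Tags: git git-filter

After successfully converting an old SVN repository to Git using svn2git, I have been tasked with reproducing the $Revision$ keyword expansion (or a close approximation of it).

So I ...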

  • added an svn-r annotated tag for SVN's rev0

  • added to .git/info/attributes

    * filter=revsion

  • added to .git/config

    [filter "revsion"]
        smudge = /bin/sed -e 's/\\$Revision\\$/$Revision: '$(GIT_EXEC_PATH=/usr/lib/git-core/ /usr/bin/git describe --match svn-r)'$/g'
        clean = /bin/sed -e 's/\\$Revision: [^$]*\\$/$Revision$/g'

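... and it works, but it is doing the wrong thing.

Whenever I do a checkout, it expands $Revision$ to the git describe of the previous HEAD (the one from before the checkout). So when I am on master~1 and run git checkout master, I get the expansion for master~1 and not for master.

To make sure that the early evaluation was not the fault of the $(...) in .git/config, I also tried moving this code into its own script, but to no avail.

Hence my question: is there a way to make the git describe that is run by a smudge filter describe the commit after the checkout?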
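For readers less fluent in sed, the two filter expressions above can be approximated in Python. This is only a sketch: the function names smudge and clean are chosen here for illustration, and the description is passed in as an argument rather than computed by git describe, which is exactly the part whose timing the question is about.

```python
import re

# Matches only the bare, unexpanded keyword (what the smudge filter looks for).
_BARE = re.compile(rb'\$Revision\$')
# Matches an already expanded keyword (what the clean filter looks for).
_EXPANDED = re.compile(rb'\$Revision: [^$]*\$')


def smudge(data, description):
    """Expand $Revision$ in data using a precomputed description (bytes)."""
    replacement = b'$Revision: ' + description + b'$'
    # A function replacement avoids treating backslashes in the
    # description as regex template escapes.
    return _BARE.sub(lambda _match: replacement, data)


def clean(data):
    """Collapse any expanded keyword back to the bare $Revision$ form."""
    return _EXPANDED.sub(b'$Revision$', data)
```

A description such as b'svn-r-5-gabc1234' is a made-up value that only illustrates the usual shape of git describe output. Running clean on the result of smudge returns the original bytes, which is the round trip Git relies on.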

1 answer:

Answer 0: (score: 1)

TL;DR: a (tested) solution

Try this post-checkout hook (now tested, albeit lightly; I also put it in my scripts repository on GitHub):

#! /usr/bin/env python

"""
post-checkout hook to re-smudge files
"""

from __future__ import print_function

import collections
import os
import subprocess
import sys

def run(cmd):
    """
    Run command and collect its stdout.  If it exits nonzero, die
    a la subprocess.check_call().  Its stderr is not captured.
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    stdout, _ = proc.communicate()
    status = proc.wait()
    if status != 0:
        raise subprocess.CalledProcessError(status, cmd)
    return stdout

def git_ls_files(*args):
    """
    Run git ls-files with given arguments, plus -z; break up
    returned byte string into list of files.  Note, in Py3k this
    will be a list of byte-strings!
    """
    output = run(['git', 'ls-files', '-z'] + list(args))
    # -z produces NUL termination, not NUL separation: discard last entry
    return output.split(b'\0')[:-1]

def recheckout(files):
    """
    Force Git to re-extract the given files from the index.
    Since Git insists on doing nothing when the files exist
    in the work-tree, we first *remove* them.

    To avoid blowing up on very long argument lists, do these
    1000 files at a time or up to 10k bytes of argument at a
    time, whichever occurs first.  Note that we may go over
    the 10k limit by the length of whatever file is long, so
    it's a sloppy limit and we don't need to be very accurate.
    """
    files = collections.deque(files)
    while files:
        todo = [b'git', b'checkout', b'--']
        # should add 1 to account for b'\0' between arguments in os exec:
        # argbytes = reduce(lambda x, y: x + len(y) + 1, todo, 0)
        # but let's just be very sloppy here
        argbytes = 0
        while files and len(todo) < 1000 and argbytes < 10000:
            path = files.popleft()
            todo.append(path)
            argbytes += len(path) + 1
            os.remove(path)
        # files is now empty, or todo has reached its limit:
        # run the git checkout command
        run(todo)

def warn_about(files):
    """
    Make a note to the user that some file(s) have not been
    re-checked-out as they are modified in the work-tree.
    """
    if len(files) == 0:
        return
    print("Note: the following files have been carried over and may")
    print("not match what you would expect for a clean checkout:")
    # If this is py3k, each path is a bytes and we need a string.
    if type(b'') == type(''):
        printable = lambda path: path
    else:
        printable = lambda path: path.decode('unicode_escape')
    for path in files:
        print('\t{}'.format(printable(path)))

def main():
    """
    Run, as called by git post-checkout hook.  We get three arguments
    that are very simple, so no need for argparse.

    We only want to do something when:
     - the flag argument, arg 3, is 1
     - the two other arguments differ

    What we do is re-checkout the *unmodified* files, to
    force them to re-run through any defined .gitattributes
    filter.
    """
    argv = sys.argv[1:]
    if len(argv) != 3:
        return 'error: hook must be called with three arguments'
    if argv[2] != '1':
        return 0
    if argv[0] == argv[1]:
        return 0
    allfiles = git_ls_files()
    modfiles = git_ls_files('-m')
    unmodified = set(allfiles) - set(modfiles)
    recheckout(unmodified)
    warn_about(modfiles)
    return 0

if __name__ == '__main__':
    try:
        sys.exit(main())
    except KeyboardInterrupt:
        sys.exit('\nInterrupted')

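To improve performance, you could modify it to operate only on files that are likely to use $Revision$ (your attribute defines this as "all files", so that is what I used here).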
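One way to narrow the file list is sketched below, under the assumption that any file worth re-smudging contains the literal bytes $Revision in its current work-tree copy. The helper names are hypothetical and are not part of the hook above.

```python
KEYWORD = b'$Revision'


def uses_revision_keyword(path, keyword=KEYWORD):
    """Return True if the file at path contains the keyword marker."""
    try:
        # Binary mode: paths from git ls-files -z are bytes and the
        # file contents may not be valid text.
        with open(path, 'rb') as stream:
            return keyword in stream.read()
    except (IOError, OSError):
        # Missing, unreadable, or a directory: leave it out rather
        # than let the hook delete it blindly.
        return False


def only_keyword_files(paths):
    """Keep only the paths whose contents mention the keyword."""
    return [path for path in paths if uses_revision_keyword(path)]
```

In main(), the call would then become recheckout(only_keyword_files(unmodified)). Reading every file has its own cost, so this is likely to pay off mainly when running the filter is the expensive part.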

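I thought about this problem a bit this morning. As you have observed, git checkout simply has not yet updated the HEAD reference at the time it populates the index and work-tree while changing commits. Ultimately, it seems too annoying to attempt to compute what git checkout is about to set HEAD to. You can use a post-checkout hook instead.

It is not yet clear to me whether this should be used instead of the smudge filter or in addition to it, but I think "in addition to" is right. You almost certainly still want the clean filter to run as usual.

In any case, a post-checkout hook gets: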

  

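"... three parameters: the ref of the previous HEAD, the ref of the new HEAD (which may or may not have changed), and a flag indicating whether the checkout was a branch checkout (changing branches, flag=1) or a file checkout (retrieving a file from the index, flag=0). This hook cannot affect the outcome of git checkout."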

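(There is a bug in git checkout and/or in the documentation here. The last sentence says "cannot affect the outcome", but that is untrue in two ways:

  • The exit status of the hook becomes the exit status of git checkout. If the hook exits nonzero, the checkout appears to have failed.
  • The hook can overwrite work-tree files.

It is the last of these that I intend to use here.)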

  

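"It is also run after git clone, unless the --no-checkout (-n) option is used. The first parameter given to the hook is the null-ref, the second the ref of the new HEAD, and the flag is always 1. Likewise for git worktree add, unless --no-checkout is used."

"This hook can be used to perform repository validity checks, auto-display differences from the previous HEAD if different, or set working dir metadata properties."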

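Your goal is to make the smudge filter run once HEAD has been updated. Looking at the source code for builtin/checkout.c, we find that for the "change commits" case, git checkout first populates the index and work-tree, then updates the HEAD ref (first highlighted line), then runs the post-checkout hook with the two hash IDs (the first one will be the special null hash in some cases) and the flag set to 1.

File checkouts, which by definition do not change commits, run the hook with the flag set to 0. In that case the two hash IDs always match, which is why the flag test is almost certainly unnecessary.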
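The decision described above can be isolated into a small function. This is a sketch rather than the hook's actual code: should_resmudge and NULL_SHA1 are names invented here, and the 40-zero null ID assumes a SHA-1 repository.

```python
# Assumption: a SHA-1 repository, where the null ID is 40 zeros.
NULL_SHA1 = '0' * 40


def should_resmudge(argv):
    """
    Decide whether the post-checkout hook should re-checkout files.

    argv is the hook's argument list without the program name:
    [previous HEAD, new HEAD, flag].
    """
    if len(argv) != 3:
        raise ValueError('hook must be called with three arguments')
    old_head, new_head, flag = argv
    # Kept for clarity; per the reasoning above, a file checkout
    # passes two equal IDs anyway, so the next test would catch it.
    if flag != '1':
        return False
    # HEAD did not move, so any existing expansion is still current.
    return old_head != new_head
```

For a fresh clone the first argument is the null ID, so should_resmudge([NULL_SHA1, new_id, '1']) is true for any real commit ID.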

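Doing file checkouts re-runs the smudge filter. Since HEAD has now been updated, $Revision$ expands the way you want. The obvious drawback is that every work-tree file must be updated twice! There is another issue, which the Python code above works around by removing the supposedly unmodified files, forcing git checkout to re-extract them from the index into the work-tree.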