Automate "git move" so that history is preserved

Time: 2018-12-03 03:25:52

Tags: bash git

I am facing the task of merging several repositories into one, with miscellaneous files being moved all over the place.

Based on some research on SO (SO, how to merge repositories), I came up with the following sketch:

user=some_user
new_superproj=new_proj # new repository, will include old repositories 
hosting=bitbucket.org # github etc
r1=repo1 # repo 1 to merge
r2=repo2
...
# clone to the new place. These are throw-away (!!!) directories
git clone git@${hosting}:${user}/${r1}.git
git clone git@${hosting}:${user}/${r2}.git
...
mkdir ${new_superproj} && cd ${new_superproj}

# dummy commit so we can merge
git init
dir > deleteme.txt
git add .
git commit -m "Initial dummy commit"
git rm ./deleteme.txt
git commit -m "Clean up initial file"

# repeat for all source repositories
repo=${r1}

pushd .
cd ../${repo}

# In the throw-away repository, move to the subfolder and rewrite log
git filter-branch --index-filter '
    git ls-files -s |
    sed "s,\t,&'"${repo}"'/," |
    GIT_INDEX_FILE=$GIT_INDEX_FILE.new git update-index --index-info &&
    mv $GIT_INDEX_FILE.new $GIT_INDEX_FILE
' HEAD
popd

# now bring data in to the new repository
git remote add -f ${repo} ../${repo}
git merge --allow-unrelated-histories  ${repo}/master -m "Merging repo ${repo} in"
# remove remote to throw-away repo
git remote rm ${repo}

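So far so good, except when we want to move files while preserving the log. Git is bad at moves/renames, and the log-rewriting snippet is not very adaptable: it rewrites everything the same way, recursively for the whole directory.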

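The idea is that when files are moved, we know there are no other changes in the repository, only renames and moves. So how can the following part be rewritten so that it works canonically for each individual file? It is taken from git filter-branch, official documentation: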

git filter-branch --index-filter \
    'git ls-files -s | sed "s-\t\"*-&newsubdir/-" |
        GIT_INDEX_FILE=$GIT_INDEX_FILE.new \
            git update-index --index-info &&
     mv "$GIT_INDEX_FILE.new" "$GIT_INDEX_FILE"' HEAD

I have a hard time understanding what comes after 'sed' and how it applies to git filter-branch.
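For reference, here is a minimal sketch of what the sed stage does, run outside of git on one hand-written line. The helper name prefix_paths, the shortened object id "abc" and the sample path are illustrative assumptions. `git ls-files -s` prints one line per index entry in the form `<mode> <object id> <stage><TAB><path>`. The sed expression matches that tab plus an optional opening quote (git quotes unusual path names) and re-emits the match (`&`) followed by the new directory, so only the path part changes. Inside filter-branch the rewritten listing is fed to `git update-index --index-info`, which builds a new index file that then replaces the old one, once per rewritten commit.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prefix every path in `git ls-files -s` style input
# with the directory given as $1. A literal tab is used here; with GNU sed
# this is equivalent to the "\t" in the documentation snippet.
# Assumes the directory name does not contain the "-" delimiter.
prefix_paths() {
    local tab
    tab=$(printf '\t')
    sed "s-${tab}\"*-&$1/-"
}

# One hand-written index entry (object id shortened for readability).
printf '100644 abc 0\tsrc/main.c\n' | prefix_paths newsubdir
```

The path after the tab comes out as newsubdir/src/main.c, while the mode, object id and stage stay untouched.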

I would like to run a script (bash, python, etc.) along these lines:

for each file in repository get moved/renamed
    ...
    # in the loop, moved/renamed file found
    old_file="..." # e.g. a/b/c/old_name.txt
    new_file="..." # e.g. a/b/f/g/new_name.txt, at this point it is known, old_file and new_file is the same file
    update_log_paths(old_file, new_file) # <--- this part is needed

Any ideas?

1 answer:

Answer 0: (score: 1)

It turns out that, taking a hint from Move file-by-file in git, it is as simple as (pseudocode):

move_files
cd repo_root
git add . # so changes detected as moves, vs added/deleted
repo_moves=collect_moves_data()
git reset HEAD && git checkout . && git clean -df . # undo all moves
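One possible way to fill in the collect_moves_data() step, as a sketch rather than the method actually used here: once the moves are staged, `git diff --cached -M --name-status` reports each detected rename as `R<score><TAB><old><TAB><new>`. The function name collect_moves is an assumption for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "old<TAB>new" for every staged rename.
# Must run inside the repository after the moved files have been staged.
collect_moves() {
    # --cached          : compare the index against HEAD
    # -M                : enable rename detection
    # --diff-filter=R   : keep only entries classified as renames
    # cut -f2,3         : drop the leading "R100" status column
    git diff --cached -M --diff-filter=R --name-status | cut -f2,3
}
```

Called as `repo_moves=$(collect_moves)` before the reset. Paths containing unusual characters may come out quoted by git, so this sketch assumes plain file names.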

The biggest misconception I found in many related SO questions is that "git log --follow" or other "stronger" options would help. They do not:

git log --follow <file>

does not show the log from before the move, even though the file was committed unchanged.

for each_move in repo_moves
    old_file, new_file=deduct_old_new_name(each_move)

    new_dir=${new_file%/*}
    filter="$filter                            \n\
      if [ -e \"${old_file}\" ]; then               \n\
          echo                                      \n\
          if [ ! -e \"${new_dir}\" ]; then          \n\
            mkdir --parents \"${new_dir}\" && echo  \n\
          fi                                        \n\
          mv \"${old_file}\" \"${new_file}\"        \n\
        fi                                          \n\
      "

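# mv acts on checked-out files, so this needs --tree-filter
# (an --index-filter runs without a checked-out working tree)
git filter-branch -f --tree-filter "$(echo -e "$filter")"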
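The loop above can also be sketched as a small runnable bash function. It assumes the moves are available as "old<TAB>new" lines and that the result is passed to --tree-filter, since mv needs checked-out files. The name build_filter and the file moves.tsv are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "old<TAB>new" pairs on stdin and print one
# guarded move per pair, suitable as the body of a tree filter.
build_filter() {
    local old new
    while IFS=$'\t' read -r old new; do
        # Skip blank or incomplete lines.
        [ -n "$old" ] && [ -n "$new" ] || continue
        # %q shell-quotes each path so spaces survive the later eval.
        printf 'if [ -e %q ]; then mkdir -p %q && mv %q %q; fi\n' \
            "$old" "$(dirname "$new")" "$old" "$new"
    done
}
```

A possible invocation is `git filter-branch -f --tree-filter "$(build_filter < moves.tsv)" HEAD`. The `[ -e ... ]` guard matters because the filter runs on every commit, including those where the old path does not exist yet.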

If you need to go back:

git pull # with merge
git reset --hard <hash> # get the hash of your origin/master (origin/HEAD), which will be HEAD~2, but I'd check it manually and copy/paste the hash