Suppose you have a repository:
myCode/megaProject/moduleA
myCode/megaProject/moduleB
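Over time (months) you reorganise the project, refactoring the code to make the modules independent. Files in the megaProject directory get moved into their own directories. The emphasis is on move: the history of these files is preserved.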
myCode/megaProject
myCode/moduleA
myCode/moduleB
Now you want to move these modules into their own Git repos, leaving only megaProject in the original.
myCode/megaProject
newRepoA/moduleA
newRepoB/moduleB
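The filter-branch command is documented to do this, but it does not follow history once files have moved outside the target directory. The history therefore begins at the point where the files were moved into their new directory, and omits the history they had while they lived in the old megaProject directory.
How do you split a Git history based on a target directory and follow the history outside of that path, keeping only the commit history related to these files and nothing else?
The many other answers on SO focus on splitting a repo in general, but none of them mention splitting while following the move history.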
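The behaviour described above can be reproduced in a throwaway repository. This is a minimal sketch, assuming git is installed; the file a.txt, the demo identity and the commit messages are made up for illustration:

```shell
#!/usr/bin/env bash
# Reproduction sketch: file names and commit messages are illustrative only.
repo=$(mktemp -d)
cd "$repo"
git init -q .
# The identity is passed per command so no global git config is needed.
commit() { git -c user.name=demo -c user.email=demo@example.com commit -q -m "$1"; }

mkdir -p megaProject/moduleA
echo 'module A source' > megaProject/moduleA/a.txt
git add .
commit 'add moduleA inside megaProject'

mkdir moduleA
git mv megaProject/moduleA/a.txt moduleA/a.txt
commit 'move moduleA to the top level'

# A path-limited log only sees commits that touch the new path.
plain=$(git log --oneline -- moduleA/a.txt | wc -l | tr -d ' ')
# --follow continues across the rename (it works for a single file only).
followed=$(git log --oneline --follow -- moduleA/a.txt | wc -l | tr -d ' ')
echo "without --follow: $plain commit(s), with --follow: $followed commit(s)"

cd / && rm -rf "$repo"
```

A path-based filter behaves like the first log: it keeps only what lives under the new path, so the commit made before the move is lost.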
Answer 0 (score: 3)
This is a version based on @rksawyer's script, but it uses git-filter-repo instead. I found it much easier to use and much faster than git-filter-branch.
# This script should be run from the folder that contains the project folder.
# This script uses git-filter-repo (https://github.com/newren/git-filter-repo).
# The list of files and folders that you want to keep should be named <your_repo_folder_name>_KEEP.txt. It should end with a line break after the last line, otherwise the last file/folder will be skipped.
# The result will be the folder called <your_repo_folder_name>_REWRITE_CLONE. Your original repo won't be changed.
# Tags are not preserved, see line below to preserve tags.
# Running subsequent times will backup the last run in <your_repo_folder_name>_REWRITE_CLONE_BKP.
# Define here the name of the folder containing the repo:
GIT_REPO="git-test-orig"
clone="$GIT_REPO"_REWRITE_CLONE
temp=/tmp/git_rewrite_temp
rm -Rf "$clone"_BKP
mv "$clone" "$clone"_BKP
rm -Rf "$temp"
mkdir "$temp"
git clone "$GIT_REPO" "$clone"
cd "$clone"
git remote remove origin
# macOS only: show the clone and the temp folder in Finder
open .
open "$temp"
# Comment line below to preserve tags
git tag | xargs git tag -d
echo 'Start logging file history...'
printf '# git log results:\n\n' > "$temp"/log.txt
while read -r p
do
    shopt -s dotglob
    find "$p" -type f > "$temp"/temp
    while read -r f
    do
        echo "## " "$f" >> "$temp"/log.txt
        # print every file and follow to get any previous renames
        # Then remove blank lines. Then remove every other line to end up with the list of filenames
        git log --pretty=format:'%H' --name-only --follow -- "$f" | awk 'NF > 0' | awk 'NR%2==0' | tee -a "$temp"/log.txt
        printf '\n\n\n' >> "$temp"/log.txt
    done < "$temp"/temp
done < ../"$GIT_REPO"_KEEP.txt > "$temp"/PRESERVE
mv "$temp"/PRESERVE "$temp"/PRESERVE_full
awk '!a[$0]++' "$temp"/PRESERVE_full > "$temp"/PRESERVE
sort -o "$temp"/PRESERVE "$temp"/PRESERVE
echo 'Starting filter-repo --------------------------'
git filter-repo --paths-from-file "$temp"/PRESERVE --force --replace-refs delete-no-add
echo 'Finished filter-repo --------------------------'
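The script logs the results of git log to a file (log.txt) in the temporary folder, so if you do not need log.txt and want it to run faster, you can remove those lines.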
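The path-extraction pipeline in the script can be tried on its own without touching a repository. This is a sketch on canned input; the hashes and paths are made up, and it relies on each commit listing exactly one path, which is the case when --follow is used on a single file:

```shell
#!/usr/bin/env bash
# Canned stand-in for: git log --pretty=format:'%H' --name-only --follow -- FILE
# (hashes and paths are invented for illustration).
sample_log() {
  printf '%s\n' \
    'c3c3c3c' 'moduleA/a.txt' '' \
    'b2b2b2b' 'moduleA/a.txt' '' \
    'a1a1a1a' 'megaProject/moduleA/a.txt'
}

# 1. drop the blank separator lines
# 2. keep every second line, which is the path (the odd lines are hashes)
# 3. drop repeated paths while keeping first-seen order
paths=$(sample_log | awk 'NF > 0' | awk 'NR % 2 == 0' | awk '!seen[$0]++')
printf '%s\n' "$paths"
```

The result is the list of every name the file has had, which is the kind of list the script passes to git filter-repo through --paths-from-file.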
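Answer 1 (score: 2)
Running git filter-branch --subdirectory-filter in your cloned repository removes every commit that does not affect content in that subdirectory, and that includes the commits that touched the files before they were moved.
Instead, use the --index-filter flag with a script that deletes all the files you are not interested in, together with the --prune-empty flag to drop any commits that only affect other content.
A blog post from Kevin Deldycke has a good example (note that it uses --tree-filter rather than --index-filter):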
git filter-branch --prune-empty --tree-filter 'find ./ -maxdepth 1 -not -path "./e107*" -and -not -path "./wordpress-e107*" -and -not -path "./.git" -and -not -path "./" -print -exec rm -rf "{}" \;' -- --all
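This command effectively checks out each commit in turn, deletes all the uninteresting files from the working directory and, if anything has changed since the previous commit, checks the result in (rewriting history as it goes). You would need to adapt the command so that it deletes everything except /moduleA, /megaProject/moduleA and the specific files you want to keep from /megaProject.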
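One way the adapted filter could look for the layout in the question is sketched below. It is an assumption on my part, not the blog post's command: it deletes individual files rather than top-level entries, so that megaProject/moduleA can be kept while the rest of megaProject goes. The scratch directory and file names are invented so the filter can be dry-run outside a real checkout:

```shell
#!/usr/bin/env bash
# Sketch of a tree-filter body; the kept paths are assumptions and should be
# adjusted to the real repository.
prune_tree() {
  find . -type f \
    -not -path './.git/*' \
    -not -path './moduleA/*' \
    -not -path './megaProject/moduleA/*' \
    -exec rm -f {} +
}

# Dry run on a scratch directory instead of a real checkout.
scratch=$(mktemp -d)
cd "$scratch"
mkdir -p moduleA megaProject/moduleA megaProject/core moduleB
touch moduleA/new.c megaProject/moduleA/old.c megaProject/core/main.c moduleB/b.c
prune_tree
remaining=$(find . -type f | LC_ALL=C sort)
printf '%s\n' "$remaining"
cd / && rm -rf "$scratch"
```

Only the files under moduleA and megaProject/moduleA survive. Empty directories left behind do not matter because git does not track them. The find command inside prune_tree is the part that would go into the --tree-filter argument.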
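Answer 2 (score: 2)
I know of no simple way to do this, but it can be done.
The problem with filter-branch is that it works by "applying custom filters on each revision".
If you can create a filter that does not delete your files, they will be tracked between directories. Of course, this is likely to be non-trivial for any repository that is not itself trivial.
To start: assume it is a trivial repository. You have never renamed a file, and you have never had files with the same name in two modules. All you need to do is get a list of the files in your module with find megaProject/moduleA -type f -printf "%f\n" > preserve and then run your filter using those file names and your directories:
preserve.sh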
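# Build a find command that matches every file NOT named in the preserve list
cmd="find . -type f ! -name d1"
while read -r f; do
    cmd="$cmd ! -name $f"
done < /path/to/myCode/preserve
# Delete the matches (word splitting means file names with spaces are not handled)
for i in $($cmd)
do
    rm "$i"
done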
git filter-branch --prune-empty --tree-filter '/path/to/myCode/preserve.sh' HEAD
Of course, renames are what make this difficult. One nice thing git filter-branch does is give you the $GIT_COMMIT environment variable. You can then use something like:
for f in megaProject/moduleA
do
    # Join each commit hash and the paths that follow it into one "hash:path:" record
    git log --pretty=format:'%H' --name-only --follow -- "$f" | awk '{ if ($0 != "") { printf "%s:", $0; next } print }'
done > preserve
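to build a file name history, with commits, that could be used in place of the simple preserve file from the trivial example. The onus would then be on you to keep track of which files should be present at each commit. This should not actually be too hard to code, but I have not seen anyone who has done it yet.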
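A rough sketch of how such records could be consumed, assuming the "hash:path:" shape the loop above emits and paths that contain no colon. The records and the helper name paths_for_commit are made up. It only answers "which paths did this commit touch", not "which files should exist at this commit", so the full bookkeeping is still left open:

```shell
#!/usr/bin/env bash
# Invented records in the "hash:path:" shape; not taken from a real repository.
records() {
  printf '%s\n' \
    'c3c3c3c:moduleA/a.txt:' \
    'b2b2b2b:moduleA/a.txt:' \
    'a1a1a1a:megaProject/moduleA/a.txt:'
}

# Paths recorded for one commit (a filter could pass "$GIT_COMMIT" here).
paths_for_commit() {
  records | awk -F: -v c="$1" \
    '$1 == c { for (i = 2; i <= NF; i++) if ($i != "") print $i }'
}

# Every name the files ever had, ignoring which commit used which name.
all_paths=$(records \
  | awk -F: '{ for (i = 2; i <= NF; i++) if ($i != "") print $i }' \
  | LC_ALL=C sort -u)

at_root=$(paths_for_commit a1a1a1a)
printf 'a1a1a1a -> %s\n' "$at_root"
printf '%s\n' "$all_paths"
```

The flattened list in all_paths trades per-commit precision for simplicity: a filter that keeps every historic name never has to know which commit it is looking at.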
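Answer 3 (score: 1)
Following on from the answers above: first use git log --follow to iterate over all the files in the directory, which picks up the old paths/names from earlier moves and renames. Then use filter-branch to iterate over every revision and remove all files that are not in the list created in step 1.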
#!/bin/bash
DIRNAME=dirD
# Catch all files including hidden files
shopt -s dotglob
for f in "$DIRNAME"/*
do
    # print every file and follow to get any previous renames
    # Then remove blank lines. Then remove every other line to end up with the list of filenames
    git log --pretty=format:'%H' --name-only --follow -- "$f" | awk 'NF > 0' | awk 'NR%2==0'
done > /tmp/PRESERVE
sort -o /tmp/PRESERVE /tmp/PRESERVE
cat /tmp/PRESERVE
Then create a script (preserve.sh) that filter-branch will call for each revision.
#!/bin/bash
DIRNAME=dirD
# Delete everything that's not in the PRESERVE list
echo 'delete these files:'
# Double quotes around the last pattern so that $DIRNAME is expanded
cmd=`find . -type f -not -path './.git/*' -not -path "./$DIRNAME/*"`
echo $cmd > /tmp/ALL
# Convert to one filename per line and remove the lead ./
cat /tmp/ALL | awk '{NF++;while(NF-->1)print $NF}' | cut -c3- > /tmp/ALL2
sort -o /tmp/ALL2 /tmp/ALL2
#echo 'before:'
#cat /tmp/ALL2
comm -23 /tmp/ALL2 /tmp/PRESERVE > /tmp/DELETE_THESE
echo 'delete these:'
cat /tmp/DELETE_THESE
#exit 0
while read -r f; do
    rm "$f"
done < /tmp/DELETE_THESE
Now use filter-branch. If every file is removed in a revision, that commit and its message are pruned.
git filter-branch --prune-empty --tree-filter '/FULL_PATH/preserve.sh' master
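The core of preserve.sh is a set difference: everything in the tree minus everything on the preserve list. A small sketch with toy lists shows that step in isolation. The file names are invented, and both lists are sorted in the same locale because comm expects sorted input:

```shell
#!/usr/bin/env bash
# Toy lists standing in for /tmp/ALL2 and /tmp/PRESERVE; names are illustrative.
work=$(mktemp -d)
printf '%s\n' dirA/x.c dirD/keep.c old/keep.c README | LC_ALL=C sort > "$work/ALL"
printf '%s\n' dirD/keep.c old/keep.c | LC_ALL=C sort > "$work/PRESERVE"

# comm -23 prints only the lines unique to the first file:
# files present in the tree that are not on the preserve list.
to_delete=$(LC_ALL=C comm -23 "$work/ALL" "$work/PRESERVE")
printf '%s\n' "$to_delete"
rm -rf "$work"
```

Here old/keep.c stands for a pre-rename path: because it is on the preserve list, it is not scheduled for deletion in the older revisions where it still exists.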
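Answer 4 (score: 0)
We painted ourselves into a much worse corner: dozens of projects across dozens of branches, each project depending on 1-4 others, with 56k commits in total. filter-branch was taking up to 24 hours just to split off a single directory.
I ended up writing a tool in .NET, using libgit2sharp and raw file system access, that splits an arbitrary number of directories per project and preserves only the relevant commits/branches/tags for each project in the new repos. Instead of modifying the source repo, it writes out N other repos containing only the configured paths/refs.
You are welcome to see whether it suits your needs, modify it, and so on. https://github.com/CurseStaff/GitSplit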
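Answer 5 (score: 0)
This is my version of the script posted by @Roberto, written for linux/wsl. If you do not supply a "myrepo_KEEP.txt", it creates one based on the current file structure. Pass in the repo to process:
prune.sh MyRepo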
# This script should run one level up from the git repo folder (i.e. the containing folder)
# This script uses git-filter-repo (github.com/newren/git-filter-repo).
# The result will be the folder called <your_repo_folder_name>_REWRITE_CLONE. Your original repo won't be changed.
# Tags are not preserved, see line below to preserve tags.
# Running subsequent times will backup the last run in <your_repo_folder_name>_REWRITE_CLONE_BKP.
# Optionally, list the files and folders that you want to keep in the KEEP_FILE (<your_repo_folder_name>_KEEP.txt)
## It should end with a line break after the last line, otherwise the last file/folder will be skipped.
## If this file is missing it will be created by this script with all current folders listed.
echo "Prune git repo"
# User needs to pass in the repo name
GIT_REPO=$1
if [ -z "$GIT_REPO" ]; then
    echo "Pass in the directory to prune"
else
    KEEP_FILE="${GIT_REPO}"_KEEP.txt
    # Build up a list of current directories in the repo, if one hasn't been supplied
    if [ ! -f "${KEEP_FILE}" ]; then
        echo "Keeping all current files in repo (generating keep file)"
        cd "$GIT_REPO"
        find . -type d -not -path '*/\.*' > "../${KEEP_FILE}"
        cd ..
    fi
    echo "Pruning $GIT_REPO"
    clone="${GIT_REPO}_REWRITE_CLONE"
    # Shift backup
    bkp="${clone}_BKP"
    temp=/tmp/git_rewrite_temp
    echo "$clone"
    rm -Rf "$bkp"
    mv "$clone" "$bkp"
    # Setup temp
    rm -Rf "$temp"
    mkdir "$temp"
    # Clone
    echo "Cloning repo...from $GIT_REPO to $clone"
    if git clone "$GIT_REPO" "$clone"; then
        cd "$clone"
        git remote remove origin
        # Comment line below to preserve tags
        git tag | xargs git tag -d
        echo 'Start logging file history...'
        printf '# git log results:\n\n' > "$temp"/log.txt
        # Follow the renames
        while read -r p
        do
            shopt -s dotglob
            find "$p" -type f > "$temp"/temp
            while read -r f
            do
                echo "## " "$f" >> "$temp"/log.txt
                # print every file and follow to get any previous renames
                # Then remove blank lines. Then remove every other line to end up with the list of filenames
                git log --pretty=format:'%H' --name-only --follow -- "$f" | awk 'NF > 0' | awk 'NR%2==0' | tee -a "$temp"/log.txt
                printf '\n\n\n' >> "$temp"/log.txt
            done < "$temp"/temp
        done < ../"${KEEP_FILE}" > "$temp"/PRESERVE
        mv "$temp"/PRESERVE "$temp"/PRESERVE_full
        awk '!a[$0]++' "$temp"/PRESERVE_full > "$temp"/PRESERVE
        sort -o "$temp"/PRESERVE "$temp"/PRESERVE
        echo 'Starting filter-repo --------------------------'
        git filter-repo --paths-from-file "$temp"/PRESERVE --force --replace-refs delete-no-add
        echo 'Finished filter-repo --------------------------'
        cd ..
    fi
fi
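The warning about the last line of the KEEP file comes from how a read loop treats an unterminated final line. A small sketch of the effect and of a common guard follows; the keep file contents are made up, and the guard is a suggestion of mine rather than part of the script above:

```shell
#!/usr/bin/env bash
# A keep file whose last line has no trailing line break (paths are invented).
keep_file=$(mktemp)
printf 'moduleA\nmegaProject/moduleA' > "$keep_file"

count_plain=0
while read -r p; do
  count_plain=$((count_plain + 1))
done < "$keep_file"

count_safe=0
# read returns non-zero on the unterminated last line but still fills $p,
# so the extra test lets the loop body run for that line as well.
while IFS= read -r p || [ -n "$p" ]; do
  count_safe=$((count_safe + 1))
done < "$keep_file"

echo "plain loop saw $count_plain line(s), guarded loop saw $count_safe"
rm -f "$keep_file"
```

Using the guarded form for the outer loop would make the script tolerant of KEEP files saved without a final line break.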
Thanks to @rksawyer and @Roberto.