Question

Let's say I've got a setup that looks something like

phd/code/
phd/figures/
phd/thesis/

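For historical reasons, these all have their own git repositories. But I'd like to combine them into a single one to simplify things a little. For example, right now I might make two sets of changes and have to do something like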

cd phd/code
git commit 
cd ../figures
git commit

It would (now) be nice to just perform

cd phd
git commit

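There seem to be a couple of ways of doing this using submodules or pulling from my sub-repositories, but that's a little more complex than I'm looking for.

At the very least, I'd be happy with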

cd phd
git init
git add [[everything that's already in my other repositories]]

but that doesn't seem like a one-liner. Is there anything in git that can help me out?

Answer 1

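Here's a solution I gave here:

First do a complete backup of your phd directory: I don't want to be held responsible for your losing years of hard work! ;-)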
```
$ cp -r phd phd-backup
```

Move the content of phd/code to phd/code/code, and fix the history so that it looks like it has always been there (this uses git's filter-branch command):

$ cd phd/code
$ git filter-branch --index-filter \
    'git ls-files -s | sed "s#\t#&code/#" |
     GIT_INDEX_FILE=$GIT_INDEX_FILE.new \
     git update-index --index-info &&
     mv $GIT_INDEX_FILE.new $GIT_INDEX_FILE' HEAD

Do the same for the content of phd/figures and phd/thesis (just replace code with figures and thesis).

Now your directory structure should look like this:
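To see what the index filter does, the sed expression can be tried on a single line shaped like `git ls-files -s` output. This is only an illustration under assumptions: the object id and the path `src/main.c` are made-up placeholders, and a literal tab produced by `printf` stands in for `\t` so that the command behaves the same on sed implementations that do not understand `\t`.

```shell
# Build a literal tab; not every sed understands "\t".
tab=$(printf '\t')

# One entry shaped like `git ls-files -s` output: <mode> <object id> <stage><TAB><path>.
# The object id and the path are placeholders, not taken from a real repository.
entry="100644 0123456789abcdef0123456789abcdef01234567 0${tab}src/main.c"

# Insert "code/" right after the tab, the same rewrite the index filter applies
# to every entry before feeding it back to `git update-index --index-info`.
rewritten=$(printf '%s\n' "$entry" | sed "s#${tab}#&code/#")
printf '%s\n' "$rewritten"
```

Only the path after the tab changes; mode, object id and stage are passed through untouched, which is why file contents stay the same while every path gains the `code/` prefix.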

phd
  |_code
  |    |_.git
  |    |_code
  |         |_(your code...)
  |_figures
  |    |_.git
  |    |_figures
  |         |_(your figures...)
  |_thesis
       |_.git
       |_thesis
            |_(your thesis...)

Then create a git repository in the root directory, pull everything into it and remove the old repositories:

$ cd phd
$ git init

$ git pull code
$ rm -rf code/code
$ rm -rf code/.git

$ git pull figures --allow-unrelated-histories
$ rm -rf figures/figures
$ rm -rf figures/.git

$ git pull thesis --allow-unrelated-histories
$ rm -rf thesis/thesis
$ rm -rf thesis/.git

Finally, you should now have what you wanted:

phd
  |_.git
  |_code
  |    |_(your code...)
  |_figures
  |    |_(your figures...)
  |_thesis
       |_(your thesis...)

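One nice side of this procedure is that it leaves non-versioned files and directories in place.

Hope this helps.

Just one word of warning though: if your code directory already has a code subdirectory or file, things might go very wrong (the same goes for figures and thesis, of course). If that's the case, just rename that directory or file before going through this whole procedure: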

$ cd phd/code
$ git mv code code-repository-migration
$ git commit -m "preparing the code directory for migration"

And when the procedure is finished, add this final step:

$ cd phd
$ git mv code/code-repository-migration code/code
$ git commit -m "final step for code directory migration"

Of course, if the code subdirectory or file is not versioned, just use mv instead of git mv, and forget about the git commits.
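A quick pre-flight check for this situation could look like the sketch below. It only inspects the file system, so it catches versioned and non-versioned entries alike. The function name `find_name_clashes` and the temporary demo layout are assumptions made for illustration, not part of the procedure above.

```shell
# Report every sub-repository whose top level already holds an entry
# named like the repository itself (for example phd/code/code).
# Arguments: the root directory, then the sub-repository names.
find_name_clashes() {
    root=$1
    shift
    for name in "$@"; do
        # -e matches existing files and directories; -L also catches dangling symlinks.
        if [ -e "$root/$name/$name" ] || [ -L "$root/$name/$name" ]; then
            printf '%s\n' "$name"
        fi
    done
}

# Throwaway layout for demonstration: only "code" has a clashing entry.
demo=$(mktemp -d)
mkdir -p "$demo/code/code" "$demo/figures" "$demo/thesis"
clashes=$(find_name_clashes "$demo" code figures thesis)
printf 'needs renaming first: %s\n' "$clashes"
rm -rf "$demo"
```

On a real setup the call would be something like `find_name_clashes phd code figures thesis`, run before the history rewrite.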

Answer 2

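git-stitch-repo will process the output of git-fast-export --all --date-order on the git repositories given on the command line, and create a stream suitable for git-fast-import that will create a new repository containing all the commits in a new commit tree that respects the history of all the source repositories.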

Answer 3

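Perhaps simply (similarly to the previous answer, but using simpler commands) make a commit in each of the separate old repositories that moves the content into a suitably named subdirectory, e.g.: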

$ cd phd/code
$ mkdir code
# This won't work literally, because * would also match the new code/ subdir, but you understand what I mean:
$ git mv * code/
$ git commit -m "preparing the code directory for migration"
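Since `git mv * code/` does not work literally, one possible way to express the same move is to take the list of top-level entries from the last commit instead of from a shell glob. The sketch below runs in a throwaway repository; the file names and the identity settings are placeholders, and `xargs -0` is assumed to be available (it is in GNU and BSD xargs).

```shell
# Throwaway repository standing in for phd/code.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity for the demo
git config user.email "user@example.com"
printf 'int main(void) { return 0; }\n' > main.c
mkdir lib
printf 'helper\n' > lib/util.txt
git add .
git commit -q -m "initial import"

# Move every tracked top-level entry into code/. The list comes from the last
# commit, so the freshly created, still untracked code/ directory is not in it.
mkdir code
git ls-tree --name-only -z HEAD | xargs -0 -I{} git mv {} code/
git commit -q -m "preparing the code directory for migration"

git ls-files
```

This still fails if the repository already tracks a top-level entry called `code`, so that case needs a rename first.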

and then merge the three separate repos into one new one, by doing something like:

$ cd ../..
$ mkdir phd.all
$ cd phd.all
$ git init
$ git pull ../phd/code
...

Then you'll keep your histories, but will go on with a single repo.
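The merge step can be tried end to end on throwaway repositories. The sketch below assumes Git 2.9 or newer, where pulling a second, unrelated history needs `--allow-unrelated-histories`; the sample files and the identity settings are placeholders, and only two of the three repositories are created to keep it short.

```shell
work=$(mktemp -d)
cd "$work"

# Two throwaway repositories whose content already lives in a same-named
# subdirectory, i.e. the state after the preparation commit.
for name in code figures; do
    mkdir -p "phd/$name/$name"
    git init -q "phd/$name"
    git -C "phd/$name" config user.name "Example User"
    git -C "phd/$name" config user.email "user@example.com"
    printf 'sample %s file\n' "$name" > "phd/$name/$name/sample.txt"
    git -C "phd/$name" add .
    git -C "phd/$name" commit -q -m "content of $name"
done

mkdir phd.all
git init -q phd.all
cd phd.all
git config user.name "Example User"
git config user.email "user@example.com"

# Pull each old repository in turn. --no-rebase and --no-edit keep the pull a
# plain merge that does not open an editor for the merge message.
for name in code figures; do
    git pull -q --no-rebase --no-edit --allow-unrelated-histories "../phd/$name" HEAD
done

git ls-files
```

The combined repository ends up with both original commits plus one merge commit, which can be inspected with `git log --graph --oneline`.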

Answer 4

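You could try the subtree merge strategy. It will let you merge repo B into repo A. The advantage over git-filter-branch is that it doesn't require you to rewrite the history of repo A (which would break its SHA1 sums).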
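A minimal sketch of such a subtree merge, following the pattern from Git's subtree-merge how-to, is shown below on two throwaway repositories. The names `A`, `B`, `B_remote` and `dir-B` are placeholders chosen for this example, and `--allow-unrelated-histories` assumes Git 2.9 or newer.

```shell
work=$(mktemp -d)
cd "$work"

# Two unrelated throwaway repositories, A and B.
for name in A B; do
    mkdir "$name"
    git init -q "$name"
    git -C "$name" config user.name "Example User"
    git -C "$name" config user.email "user@example.com"
    printf 'content of %s\n' "$name" > "$name/$name.txt"
    git -C "$name" add .
    git -C "$name" commit -q -m "initial commit in $name"
done

cd A
before=$(git rev-parse HEAD)                      # A's commit before the merge
branch=$(git -C ../B symbolic-ref --short HEAD)   # B's current branch name
git remote add B_remote ../B
git fetch -q B_remote

# Start a merge that keeps A's files as they are, then read B's tree into dir-B/.
git merge -q -s ours --no-commit --allow-unrelated-histories "B_remote/$branch"
git read-tree --prefix=dir-B/ -u "B_remote/$branch"
git commit -q -m "Merge B as subdirectory dir-B"

git ls-files
```

A's earlier commit keeps its id and simply becomes the first parent of the merge commit, which is the point made above about not rewriting history.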

Answer 5

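The solution by @MiniQuark helped me a lot, but unfortunately it doesn't take into account the tags in the source repositories (at least in my case). Below is my improvement to @MiniQuark's answer.

First create the directory that will contain the combined repo and the merged repos, with one subdirectory for each merged repo.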


$ mkdir new_phd
$ mkdir new_phd/code
$ mkdir new_phd/figures
$ mkdir new_phd/thesis

Do a pull of each repository and fetch all tags. (Instructions are shown only for the code subdirectory.)


$ cd new_phd/code
$ git init
$ git pull ../../original_phd/code master
$ git fetch ../../original_phd/code refs/tags/*:refs/tags/*

(This improves on the filter-branch step in MiniQuark's answer.) Move the content of new_phd/code to new_phd/code/code and add a code_ prefix to each tag:

$ git filter-branch --index-filter 'git ls-files -s | sed "s-\t\"*-&code/-" | GIT_INDEX_FILE=$GIT_INDEX_FILE.new git update-index --index-info && mv $GIT_INDEX_FILE.new $GIT_INDEX_FILE' --tag-name-filter 'sed "s-.*-code_&-"' HEAD

After doing so there will be twice as many tags as before the filter-branch. The old tags remain in the repo, and new tags with the code_ prefix are added.


$ git tag
mytag1
code_mytag1

Remove the old tags manually:


$ ls .git/refs/tags/* | grep -v "/code_" | xargs rm
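The renaming and the selection of old tags can be checked on plain text before touching a repository. The sketch below uses hypothetical tag names (`mytag1`, `v1.0`) and needs no git at all: it applies the same sed expression as the tag name filter, then picks the names that lack the code_ prefix.

```shell
# Hypothetical tag names as they might appear in one sub-repository.
old_tags=$(printf '%s\n' mytag1 v1.0)

# What the tag name filter does to each name: prepend "code_".
new_tags=$(printf '%s\n' "$old_tags" | sed 's-.*-code_&-')

# After filter-branch both sets exist; keep only the names lacking the prefix
# to get the list of old tags that should be deleted.
to_delete=$(printf '%s\n%s\n' "$old_tags" "$new_tags" | grep -v '^code_')
printf '%s\n' "$to_delete"
```

In a real repository, a list like this could be passed to `git tag -d` instead of removing files under .git/refs/tags, which would also cover tags that git has stored in packed-refs.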

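Repeat the pull, move, and tag-cleanup steps above for the other subdirectories.

Now we have the directory structure shown in @MiniQuark's answer.

Proceed as in the final merging step of MiniQuark's answer, but after doing each pull and before removing the .git directory, fetch the tags: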

$ git fetch catalog refs/tags/*:refs/tags/*

Continue..

This is just another solution. Hope it helps someone; it helped me :).

Answer 6

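git-stitch-repo from Aristotle Pagaltzis' answer only works for repositories with simple, linear history.

MiniQuark's answer works for all repositories, but it does not handle tags and branches.

I created a program that works the same way as MiniQuark describes, but it uses one merge commit (with N parents) and also recreates all tags and branches to point to these merge commits.

See the git-merge-repos repository for examples of how to use it.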

Answer 7

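I have created a tool that does this task. The method used is similar (internally it does things like --filter-branch) but it is friendlier. It is licensed under GPL 2.0.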

http://github.com/geppo12/GitCombineRepo

Answer 8

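Actually, git-stitch-repo now supports branches and tags, including annotated tags (I found a bug, reported it, and it got fixed). Where I found it useful is with tags. Since tags are attached to commits, some of the solutions (like Eric Lee's approach) fail to deal with them. If you try to create a branch off an imported tag, it will undo any git merges/moves and send you back to a state where the consolidated repository is nearly identical to the repository the tag came from. There are also issues if you use the same tag across multiple repositories that you merged/consolidated. For example, say you have repos A and B, both with the tag rel_1.0. You merge repo A and repo B into repo AB. Since the rel_1.0 tags are on two different commits (one for A and one for B), which tag will be visible in AB? Either the tag from the imported repo A or the one from the imported repo B, but not both.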

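git-stitch-repo helps to address that problem by creating rel_1.0-A and rel_1.0-B tags. You may not be able to check out the rel_1.0 tag and expect both, but at least you can see both, and in theory you can merge them into a common local branch and then create a rel_1.0 tag on that merged branch (assuming you only merge and do not change source code). It's better to work with branches, as you can merge matching branches from each repo into local branches (dev-a and dev-b can be merged into a local dev branch, which can then be pushed to origin).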
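Before combining repositories it can help to know which tag names will clash. The sketch below works on plain lists of names, so it is independent of any particular tool; the tag lists are hypothetical and would in practice come from running `git tag` in each repository. It assumes no name is repeated within a single list.

```shell
# Hypothetical tag lists of two repositories, one name per line.
tags_a=$(printf '%s\n' rel_1.0 rel_1.1 alpha)
tags_b=$(printf '%s\n' rel_1.0 beta)

# A name that occurs in both lists will clash once the repositories are combined.
clashing=$(printf '%s\n%s\n' "$tags_a" "$tags_b" | sort | uniq -d)
printf 'tags present in both: %s\n' "$clashing"

# Suffix each clashing name per repository, mirroring the rel_1.0-A / rel_1.0-B naming.
for tag in $clashing; do
    printf '%s-A %s-B\n' "$tag" "$tag"
done
```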

Answer 9

The sequence you suggested

git init
git add *
git commit -a -m "import everything"

will work, but you will lose your commit history.

Answer 10

To merge a secondProject within a mainProject:

A) In the secondProject

git fast-export --all --date-order > /tmp/secondProjectExport

B) In the mainProject:

git checkout -b secondProject
git fast-import --force < /tmp/secondProjectExport

In this branch, do all the heavy transformations you need and commit them.

C) Then back to the master and a classical merge between the two branches:

git checkout master
git merge secondProject

Answer 11

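I'll throw my solution in here too. It's basically a fairly simple bash script wrapper around git filter-branch. Like other solutions, it only migrates master branches and doesn't migrate tags. But the full master commit histories are migrated, and it is a short bash script, so it should be relatively easy for users to review or tweak.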


Answer 12

This bash script works around the sed tab character issue (on MacOS, for example) and the issue of missing files.

export SUBREPO="subrepo"; # <= your subrepository name here
export TABULATOR=`printf '\t'`;
FILTER='git ls-files -s | sed "s#${TABULATOR}#&${SUBREPO}/#" |
  GIT_INDEX_FILE=$GIT_INDEX_FILE.new \
  git update-index --index-info &&
  if [ -f "$GIT_INDEX_FILE.new" ]; then mv $GIT_INDEX_FILE.new $GIT_INDEX_FILE; else echo "git filter skipped missing file: $GIT_INDEX_FILE.new"; fi'

git filter-branch --index-filter "$FILTER" HEAD

This is a combination of miniquark's, marius-butuc's and ryan's posts. Cheers to them!

Combining multiple git repositories
