Truncating Git commit history with LibGit2Sharp

Time: 2013-12-11 21:14:05

Tags: git libgit2 libgit2sharp

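I plan to use LibGit2/LibGit2Sharp, and Git, in an unorthodox way, and I am asking anyone familiar with the API to confirm that what I propose will work in theory. :)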

Scenario

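Only the master branch will exist in the repository. A large number of directories containing large binary and non-binary files will be tracked and committed, and most of the binary files will change between commits. Because of disk space limitations, the repository on disk should contain no more than 10 commits (the disk currently fills up frequently).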
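Keeping at most N commits means cutting the history at the N-th commit counted from HEAD: that commit keeps its tree but loses its parents, and everything older is discarded. A minimal sketch of picking the cut point, assuming the commit SHAs are already available as a list ordered newest first (the `TruncationPoint` class and `FindNewRoot` method are hypothetical names, not LibGit2Sharp API):

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of LibGit2Sharp.
public static class TruncationPoint
{
    // newestFirst: commit SHAs ordered from HEAD back to the initial commit.
    // Returns the SHA that should become the new initial commit so that at most
    // maxCommits commits remain, or null if the history is already short enough.
    public static string FindNewRoot(IReadOnlyList<string> newestFirst, int maxCommits)
    {
        if (maxCommits < 1)
            throw new ArgumentOutOfRangeException(nameof(maxCommits));

        // Nothing to cut: there are no commits beyond the limit.
        if (newestFirst.Count <= maxCommits)
            return null;

        // The maxCommits-th commit from HEAD (index maxCommits - 1) is kept
        // and loses its parents; every older commit is discarded.
        return newestFirst[maxCommits - 1];
    }
}
```

With the 10-commit limit from the scenario, `FindNewRoot(shas, 10)` returns the tenth SHA from HEAD, or `null` when there are ten commits or fewer.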

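What the API does not provide is the ability to truncate the commit history from a specified CommitId back to the initial commit of the master branch, and to delete any Git objects left dangling as a result.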

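I have tested the ReferenceCollection.RewriteHistory method, and I can use it to remove the parents from a commit. This creates a new commit history starting at the CommitId and going back to HEAD. However, it still leaves all the old commits, plus any references or blobs unique to those commits. My plan now is to clean up these dangling Git objects myself. Does anyone see any problems with this approach, or know of a better one?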
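What the question describes is a reachability problem: once the parents are cut, an old commit, tree or blob may be deleted only if nothing that is kept still reaches it. A minimal mark-and-sweep sketch over a hypothetical in-memory object graph (the `DanglingObjects` class and the dictionary layout are assumptions for illustration; filling the dictionary from a real object database, for example by walking commits and trees with LibGit2Sharp, is left out):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical model: each object id maps to the ids it references
// (commit -> tree and parents, tree -> subtrees and blobs, blob -> nothing).
public static class DanglingObjects
{
    // Returns every id in the graph that cannot be reached from the roots.
    public static HashSet<string> Find(
        IReadOnlyDictionary<string, string[]> graph,
        IEnumerable<string> roots)
    {
        // Mark phase: depth-first walk from the roots.
        var reachable = new HashSet<string>();
        var pending = new Stack<string>(roots);
        while (pending.Count > 0)
        {
            string id = pending.Pop();
            // Already marked: shared trees and blobs are visited only once.
            if (!reachable.Add(id))
                continue;
            if (graph.TryGetValue(id, out var children))
                foreach (string child in children)
                    pending.Push(child);
        }

        // Sweep phase: anything left unmarked is dangling.
        return new HashSet<string>(graph.Keys.Where(id => !reachable.Contains(id)));
    }
}
```

In a real repository the roots would be every reference that survives the rewrite.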

2 answers:

Answer 0 (score: 3)

  

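> However, it still leaves all the old commits, plus any references or blobs unique to those commits. My plan now is to clean up these dangling Git objects myself.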

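When rewriting the history of a repository, LibGit2Sharp takes care not to discard the rewritten references. By default, they are stored under the refs/original namespace. This can be changed through the RewriteHistoryOptions parameter.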

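In order to remove the old commits, trees and blobs, you first have to remove those references. This can be achieved with the following code: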

foreach (var reference in repo.Refs.FromGlob("refs/original/*"))
{
    repo.Refs.Remove(reference);
}

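The next step is to purge the now-dangling Git objects. However, this cannot be done through LibGit2Sharp (yet). One option is to shell out to Git and run the following command: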

git gc --aggressive

This will reduce the size of the repository in a very effective, destructive and unrecoverable way.
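One caveat worth checking: by default `git gc` only prunes loose unreachable objects older than `gc.pruneExpire` (two weeks), and anything still referenced from a reflog counts as reachable, so commits orphaned by the rewrite can survive a plain `git gc --aggressive`. A minimal sketch of shelling out from C# that expires the reflogs first and prunes immediately, assuming the `git` executable is on the PATH and that losing the reflogs is acceptable; `GitCleanup`, `Commands`, `RunGit` and `PruneUnreachable` are hypothetical names:

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper that drives the git command-line tool.
public static class GitCleanup
{
    // Argument lists for the git invocations, in the order they must run.
    // The reflogs are expired first because their entries still reference
    // the pre-rewrite commits and would otherwise keep them alive.
    public static readonly string[] Commands =
    {
        "reflog expire --expire=now --all",
        "gc --prune=now --aggressive",
    };

    // Runs one git command in the repository's working directory and
    // throws if git exits with a non-zero status.
    public static void RunGit(string repositoryPath, string arguments)
    {
        var startInfo = new ProcessStartInfo("git", arguments)
        {
            WorkingDirectory = repositoryPath,
            UseShellExecute = false,
            RedirectStandardError = true,
        };
        using (Process process = Process.Start(startInfo))
        {
            // Read stderr to the end before waiting so a full pipe cannot block git.
            string errors = process.StandardError.ReadToEnd();
            process.WaitForExit();
            if (process.ExitCode != 0)
                throw new InvalidOperationException(
                    $"git {arguments} failed with exit code {process.ExitCode}: {errors}");
        }
    }

    // Runs every cleanup command in order.
    public static void PruneUnreachable(string repositoryPath)
    {
        foreach (string arguments in Commands)
            RunGit(repositoryPath, arguments);
    }
}
```

Calling `GitCleanup.PruneUnreachable(repositoryPath)` after the `refs/original` references have been removed runs both commands in order; once it has run, the pre-rewrite history cannot be recovered.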

  

> Does anyone see any problems with this approach, or know of a better one?

Your approach looks valid.

Update

  

> Does anyone see any problems with this approach, or know of a better one?

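If the constraint is disk size, another option is to use a tool such as git-annex or git-bin to store the large binary files outside the Git repository. See this SO question for some differing views on the subject and the potential drawbacks (deployment, lock-in, etc.).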

  

> I'll try the RewriteHistoryOptions and foreach code you provided. For now, though, it looks like it will be File.Delete on the dangling Git objects for me.

Be aware that this may be a bumpy road:

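  • Git stores objects in two formats: loose (one file on disk per object) or packed (one file on disk containing many objects). Removing objects from a packfile tends to be more complex, because it requires rewriting the packfile.
  • On Windows, the entries in the .git\objects folder are usually read-only files, and File.Delete cannot remove them in that state. You first have to clear the read-only attribute, for example by calling File.SetAttributes(path, FileAttributes.Normal);.
  • Although you may be able to work out which commits have been rewritten, determining which Tree and Blob objects are dangling/unreachable can turn into a very complex task.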
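The first two points can be handled together: look for the loose file at `objects/<first two hex digits>/<remaining 38>` under the Git directory, clear the read-only flag before deleting, and treat a missing file as a hint that the object may be packed. A minimal sketch; `LooseObjects`, `PathFor` and `TryDelete` are hypothetical names, and `gitDirectory` is assumed to be the `.git` folder of a non-bare repository:

```csharp
using System;
using System.IO;

// Hypothetical helper for deleting loose objects only; packfiles are not touched.
public static class LooseObjects
{
    // A loose object lives at <git-dir>/objects/<2 hex chars>/<38 hex chars>.
    public static string PathFor(string gitDirectory, string sha)
    {
        if (sha == null || sha.Length != 40)
            throw new ArgumentException("Expected a 40-character SHA-1.", nameof(sha));
        sha = sha.ToLowerInvariant(); // object file names are lowercase hex
        return Path.Combine(gitDirectory, "objects", sha.Substring(0, 2), sha.Substring(2));
    }

    // Deletes the loose file for sha and returns true, or returns false when no
    // loose file exists (the object may be packed, or may not exist at all).
    public static bool TryDelete(string gitDirectory, string sha)
    {
        string file = PathFor(gitDirectory, sha);
        if (!File.Exists(file))
            return false;

        // Git writes object files read-only; File.Delete fails on them on Windows.
        File.SetAttributes(file, FileAttributes.Normal);
        File.Delete(file);

        // Remove the two-character fan-out directory once it is empty.
        string directory = Path.GetDirectoryName(file);
        if (Directory.GetFileSystemEntries(directory).Length == 0)
            Directory.Delete(directory);
        return true;
    }
}
```

A `false` result does not remove anything from a packfile, so objects that have already been packed still need a repack, for example through `git gc`.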

Answer 1 (score: 0)

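Based on the suggestions above, here is the preliminary (lightly tested) C# code I came up with. It truncates the master branch at a specific SHA, which becomes the new initial commit, and it also deletes all the dangling references and blobs.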

using System;
using System.Collections.Generic;
using System.IO;
using LibGit2Sharp;

public class RepositoryUtility
{
    public RepositoryUtility()
    {
    }
    public String[] GetPaths(Commit commit)
    {
        List<String> paths = new List<string>();
        RecursivelyGetPaths(paths, commit.Tree);
        return paths.ToArray();
    }
    private void RecursivelyGetPaths(List<String> paths, Tree tree)
    {
        foreach (TreeEntry te in tree)
        {
            paths.Add(te.Path);
            if (te.TargetType == TreeEntryTargetType.Tree)
            {
                RecursivelyGetPaths(paths, te.Target as Tree);
            }
        }
    }
    public void TruncateCommits(String repositoryPath, Int32 maximumCommitCount)
    {
        if (maximumCommitCount < 1)
        {
            throw new ArgumentOutOfRangeException("maximumCommitCount");
        }
        // Dispose the repository so the native libgit2 handles and file locks are released.
        using (IRepository repository = new Repository(repositoryPath))
        {
            Int32 count = 0;
            string newInitialCommitSHA = null;
            // Walk the history reachable from HEAD, newest first.
            foreach (Commit masterCommit in repository.Head.Commits)
            {
                count++;
                if (count == maximumCommitCount)
                {
                    newInitialCommitSHA = masterCommit.Sha;
                }
            }
            // Only truncate if there are commits older than the one that becomes the new initial commit.
            if (count > maximumCommitCount)
            {
                TruncateCommits(repository, repositoryPath, newInitialCommitSHA);
            }
        }
    }
    private void RecursivelyCheckTreeItems(Tree tree, HashSet<String> keptShas, Dictionary<String, GitObject> gitObjectDeleteList)
    {
        foreach (TreeEntry treeEntry in tree)
        {
            String sha = treeEntry.Target.Sha;
            // Skip objects used by a kept commit (an identical tree SHA means identical contents)
            // and objects already scheduled for deletion (their contents were visited then).
            if (keptShas.Contains(sha) || gitObjectDeleteList.ContainsKey(sha))
            {
                continue;
            }
            // Only reachable from discarded commits: add it to the deletion list.
            gitObjectDeleteList.Add(sha, treeEntry.Target);
            if (treeEntry.TargetType == TreeEntryTargetType.Tree)
            {
                RecursivelyCheckTreeItems((Tree)treeEntry.Target, keptShas, gitObjectDeleteList);
            }
        }
    }
    private void RecursivelyAddTreeItems(HashSet<String> keptShas, Tree tree)
    {
        foreach (TreeEntry treeEntry in tree)
        {
            // Add returns false if the SHA is already recorded (e.g. a renamed file or an unchanged
            // subtree); its contents were recorded at that time, so there is no need to recurse again.
            if (keptShas.Add(treeEntry.Target.Sha) && treeEntry.TargetType == TreeEntryTargetType.Tree)
            {
                RecursivelyAddTreeItems(keptShas, (Tree)treeEntry.Target);
            }
        }
    }
    private void TruncateCommits(IRepository repository, String repositoryPath, string newInitialCommitSHA)
    {
        // SHAs of every tree and blob used by a commit that is kept.
        HashSet<String> keptShas = new HashSet<String>();
        Commit selectedCommit = null;
        Dictionary<String, GitObject> gitObjectDeleteList = new Dictionary<String, GitObject>();
        // Loop through the commits, starting at HEAD and moving towards the initial commit.
        foreach (Commit masterCommit in repository.Head.Commits)
        {
            // If non-null, we have already passed the commit where the truncation occurs.
            if (selectedCommit != null)
            {
                // This commit is older than the truncation point, so add it to the deletion list.
                gitObjectDeleteList.Add(masterCommit.Sha, masterCommit);
                // The root tree is not a TreeEntry, so check it explicitly, then its contents.
                Tree root = masterCommit.Tree;
                if (!keptShas.Contains(root.Sha) && !gitObjectDeleteList.ContainsKey(root.Sha))
                {
                    gitObjectDeleteList.Add(root.Sha, root);
                    RecursivelyCheckTreeItems(root, keptShas, gitObjectDeleteList);
                }
            }
            else
            {
                // Is this the commit that should become the new initial commit?
                if (String.Equals(masterCommit.Sha, newInitialCommitSHA, StringComparison.OrdinalIgnoreCase))
                {
                    selectedCommit = masterCommit;
                }
                // This commit is kept, so record its root tree and everything it contains.
                if (keptShas.Add(masterCommit.Tree.Sha))
                {
                    RecursivelyAddTreeItems(keptShas, masterCommit.Tree);
                }
            }
        }
        if (selectedCommit == null)
        {
            throw new ArgumentException("Commit not found in the history of HEAD: " + newInitialCommitSHA);
        }

        // This rewriter simply clears the parents of the new initial commit.
        Func<Commit, IEnumerable<Commit>> rewriter = c => new Commit[0];
        // Perform the rewrite; commits descending from selectedCommit are recreated with new SHAs.
        repository.Refs.RewriteHistory(new RewriteHistoryOptions() { CommitParentsRewriter = rewriter }, selectedCommit);

        // Remove the backup references under refs/original and delete the objects they point to.
        foreach (var reference in repository.Refs.FromGlob("refs/original/*"))
        {
            repository.Refs.Remove(reference);
            // Skip the master branch backup when deleting files.
            if (reference.CanonicalName.IndexOf("master", 0, StringComparison.OrdinalIgnoreCase) == -1)
            {
                // Delete the loose object file from the file system.
                DeleteGitBlob(repositoryPath, reference.TargetIdentifier);
            }
        }
        // Remove tags that point at commits about to be deleted. This only matches lightweight
        // tags: an annotated tag's reference points at a tag object, not directly at the commit.
        foreach (var reference in repository.Refs.FromGlob("refs/tags/*"))
        {
            if (gitObjectDeleteList.ContainsKey(reference.TargetIdentifier))
            {
                repository.Refs.Remove(reference);
            }
        }
        // Remove the collected commits, trees and blobs from the Git object database.
        foreach (KeyValuePair<String, GitObject> kvp in gitObjectDeleteList)
        {
            // Delete the loose object file; objects stored in a packfile are not affected.
            DeleteGitBlob(repositoryPath, kvp.Key);
        }
    }

    // Deletes the loose object file for the given SHA (commit, tree or blob).
    // Objects stored in a packfile have no such file and are left untouched.
    private void DeleteGitBlob(String repositoryPath, String blobSHA)
    {
        String sha = blobSHA.ToLowerInvariant();
        String shaDirName = Path.Combine(repositoryPath, ".git", "objects", sha.Substring(0, 2));
        String shaFileName = Path.Combine(shaDirName, sha.Substring(2));
        if (!File.Exists(shaFileName))
        {
            return;
        }
        // Object files are read-only; clear the flag so File.Delete can remove them on Windows.
        FileInfo fi = new FileInfo(shaFileName);
        if (fi.IsReadOnly)
        {
            fi.IsReadOnly = false;
        }
        File.Delete(shaFileName);
        // Remove the two-character fan-out directory once it is empty.
        if (Directory.GetFileSystemEntries(shaDirName).Length == 0)
        {
            Directory.Delete(shaDirName);
        }
    }
}

Thanks for all the help, it is sincerely appreciated. Note that I edited the original code because it did not account for directories.