Question

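I am trying to use the GitHub v3 API to get the full list of commits between two SHAs, using the comparison API (/repos/:owner/:repo/compare/:base...:head), but it only returns the first 250 commits and I need all of them.

I found the API pagination docs, but the compare API does not appear to support the page or per_page parameters, with either counts or SHAs (edit: the last_sha parameter does not work either). Unlike the commits API, the compare API also does not seem to return a Link HTTP header.

Is there any way to raise the commit count limit on the compare API, or to fetch a second page of commits?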

Answer 1

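Try using the sha parameter, for example:

https://api.github.com/repos/junit-team/junit/commits?sha=XXX, where XXX is the SHA of the last commit returned by the current round of the query. Then repeat this process until you reach the ending SHA.

Sample Python code: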

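import requests

startSHA = ''  # SHA to start from; each page lists commits newest first
endSHA = ''    # SHA at which to stop
while True:
    url = 'https://api.github.com/repos/junit-team/junit/commits?sha=' + startSHA
    r = requests.get(url)
    r.raise_for_status()
    data = r.json()
    for item in data:
        if item['sha'] == endSHA:
            # reached the ending SHA, stop here
            break
    else:
        # endSHA is not on this page: stop if no progress was made (end of
        # history), otherwise continue from the last commit returned
        if not data or data[-1]['sha'] == startSHA:
            break
        startSHA = data[-1]['sha']
        continue
    break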

Answer 2

This is relatively easy. Here is an example:

import requests
next_url = 'https://api.github.com/repos/pydanny/django-admin2/commits'
while next_url:
    response = requests.get(next_url)
    # DO something with response
    # ...
    # ...
    if 'next' in response.links:
        next_url = response.links['next']['url']
    else:
        next_url = ''

Update:

Keep in mind that the next URL has a different form from the initial one. Initial URL:

https://api.github.com/repos/pydanny/django-admin2/commits

Next URL:

https://api.github.com/repositories/10054295/commits?top=develop&last_sha=eb204104bd40d2eaaf983a5a556e38dc9134f74e

So it is a completely new URL structure.
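The next URL in the example above comes from the Link response header, which requests exposes as response.links. As a rough sketch of what that lookup involves, the following parses a Link header value by hand. The function name parse_link_header and the rel="first" entry in the sample header are made up for illustration; only the "next" URL is taken from this answer.

```python
import re

def parse_link_header(value):
    """Map each rel name in a Link header value to its URL (illustrative helper)."""
    links = {}
    # Each entry looks like: <URL>; rel="name"
    for url, rel in re.findall(r'<([^>]*)>\s*;\s*rel="([^"]*)"', value):
        links[rel] = url
    return links

# Sample header: the "next" URL is the one quoted above, the "first" entry is hypothetical.
header = (
    '<https://api.github.com/repositories/10054295/commits'
    '?top=develop&last_sha=eb204104bd40d2eaaf983a5a556e38dc9134f74e>; rel="next", '
    '<https://api.github.com/repositories/10054295/commits>; rel="first"'
)
links = parse_link_header(header)
next_url = links.get("next", "")  # empty string when there is no further page
```

Because the next URL is opaque, it is probably safest to follow it as given rather than trying to build it yourself.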

Answer 3

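I took another run at this problem. My notes:

The compare (or pull request commits) list only shows 250 entries. For pull requests you can paginate, but you will get at most 250 commits no matter what you do.
The commit list API can traverse the entire commit chain, paginating all the way back to the beginning of the repository.
For a pull request, the "base" commit is not necessarily in the history reachable from the pull request's "head" commit. The same holds for compare: the "base_commit" is not necessarily part of the history of the current head.
The "merge_base_commit", however, is part of that history, so the correct approach is to start from the "head" commit and iterate the commit list queries until you reach the "merge_base_commit". For a pull request, this means you must run a separate compare between the "head" and "base" of the pull request.
An alternative approach is to use the "total_commits" value returned by compare and iterate backwards until you have the desired number of commits. This seems to work, but I am not 100% sure it is correct in all corner cases, such as merges.

So the commit list API, pagination, and the "merge_base_commit" together solve this puzzle.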
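The approach in these notes could be sketched roughly as follows. This is a minimal illustration, not a tested client: the function names fetch_commit_pages and commits_until are made up here, requests are unauthenticated, and it assumes the commit list API returns every wanted commit before the merge base, which (as noted above) may not hold in all merge corner cases.

```python
import json
import urllib.request

API = "https://api.github.com"

def fetch_commit_pages(owner, repo, head_sha, per_page=100):
    """Yield pages (lists of commit dicts) from the commit list API, starting at head_sha."""
    page = 1
    while True:
        url = (f"{API}/repos/{owner}/{repo}/commits"
               f"?sha={head_sha}&per_page={per_page}&page={page}")
        with urllib.request.urlopen(url) as resp:
            commits = json.load(resp)
        if not commits:
            return  # an empty page means the history is exhausted
        yield commits
        page += 1

def commits_until(pages, stop_sha):
    """Collect SHAs page by page until stop_sha is seen; stop_sha itself is excluded."""
    shas = []
    for page in pages:
        for commit in page:
            if commit["sha"] == stop_sha:
                return shas
            shas.append(commit["sha"])
    return shas  # stop_sha never appeared; everything listed is returned
```

A caller would first run the compare to read "merge_base_commit", then call something like commits_until(fetch_commit_pages(owner, repo, head_sha), merge_base_sha), where owner, repo, head_sha and merge_base_sha are placeholders for your own values.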

Answer 4

Try using the last_sha parameter. The commits API seems to use it for pagination instead of page.

Answer 5

Here is an example that gets all the commits for a pull request using Octokit.NET (https://github.com/octokit/octokit.net)

request

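I originally tried setting moreToGet to true only if a commit with the base sha was found, but that commit is never included in the list of commits (not sure why), so I simply assume there is more to get whenever the compare hits the limit of 250.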

Answer 6

/commits?per_page=* will give you all commits

Answer 7

Here is my solution using Octokit.Net

private async Task<IReadOnlyList<GitHubCommit>> GetCommits(string branch, string baseBranch)
{
    // compare branches and get all commits returned
    var result = await this.gitHub.Repository.Commit.Compare(this.repoSettings.Owner, this.repoSettings.Name, baseBranch, branch);
    var commits = result.Commits.ToList();

    // the commits property on the result only has the first 250 commits
    if (result.TotalCommits > 250)
    {
        var baseCommitId = result.MergeBaseCommit.Sha;
        var lastCommitLoadedId = commits.First().Sha;
        var allCommitsLoaded = false;
        var page = 1;

        while (!allCommitsLoaded)
        {
            var missingCommits = await this.gitHub.Repository.Commit.GetAll(this.repoSettings.Owner, this.repoSettings.Name, new CommitRequest
            {
                Sha = lastCommitLoadedId // start from the oldest commit returned by compare
            },
            new ApiOptions
            {
                PageCount = 1,
                PageSize = 100, // arbitrary page size - not sure what the limit is here so set it to a reasonably large number
                StartPage = page
            });

            foreach (var missingCommit in missingCommits)
            {
                if (missingCommit.Sha == lastCommitLoadedId)
                {
                    // this is the oldest commit in the compare result so we already have it
                    continue; 
                }

                if (missingCommit.Sha == baseCommitId)
                {
                    // we don't want to include this commit - it's the most recent one on the base branch
                    // we've found all the commits now we can break out of both loops
                    allCommitsLoaded = true;
                    break;
                }

                commits.Add(missingCommit);
            }

            page++;
        }
    }

    return commits;
}

Answer 8

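I have a solution for this, but it is not a pleasant one. It amounts to building the graph yourself. The general strategy is to recursively request more compare objects between BASE and BRANCH until you have found the right number of commits. Without optimization, this is untenable for large comparisons. With optimization, I have found that it takes roughly one compare call per 50 unique commits in the comparison.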

from github import Github  # PyGithub

# MY_PAT and MY_REPO are placeholders for your token and "owner/name" repo string
repo = Github(MY_PAT).get_repo(MY_REPO)

def compare(base_commit, branch_commit):
  comparison = repo.compare(base_commit, branch_commit)
  commits = set()
  unexplored_commits = set()
  for commit in comparison.commits:
    commits.add(commit.sha)
    unexplored_commits.add(commit.sha)
    for parent in commit.parents:
      # It's possible that we'll need to explore a commit's parents directly. E.g., if it's
      # a merge of a large (> 250 commits) recent branch with an older branch.
      unexplored_commits.add(parent.sha)
  # Keep comparing against other candidate commits until every commit is accounted for
  while len(commits) < comparison.total_commits:
    commit_to_explore = unexplored_commits.pop()
    commits.update(compare(base_commit, commit_to_explore))
  return commits

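If you really do want to implement this, the optimizations I found useful all revolve around choosing which commit to explore. For example:

Pick the commit to explore at random rather than with .pop(). This avoids some classes of worst-case behavior. I list it first mainly because it is trivial to do.
Track the commits whose complete list of ancestors you already have, so you know not to explore them unnecessarily. This is the "building the graph yourself" part.
If you find ancestors of base_commit in the range, use them as bisection points.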
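As a rough sketch of the first two ideas, here is an iterative variant that picks candidates at random and never explores the same commit twice. It is a simplification of the optimizations listed above, not the author's actual code: it only remembers which commits were already used as a compare head, and it does not attempt the bisection idea. compare_fn is a hypothetical callable standing in for the compare API; it is assumed to return (total_commits, listed), where listed is a possibly truncated list of (sha, parent_shas) pairs.

```python
import random

def collect_commits(compare_fn, base, head, rng=random):
    """Collect every SHA between base and head using a truncated compare call."""
    total, listed = compare_fn(base, head)
    found = set()      # SHAs known to be part of the comparison
    frontier = set()   # SHAs worth using as the head of another compare call
    explored = {head}  # SHAs already used as a head, so never repeated

    def absorb(entries):
        for sha, parents in entries:
            found.add(sha)
            frontier.add(sha)
            frontier.update(parents)

    absorb(listed)
    while len(found) < total:
        candidates = sorted(frontier - explored)
        if not candidates:
            break  # nothing left to explore; return what was found
        pick = rng.choice(candidates)  # random pick instead of set.pop()
        explored.add(pick)
        _, more = compare_fn(base, pick)
        absorb(more)
    return found
```

With PyGithub, compare_fn could wrap repo.compare and return comparison.total_commits together with the (sha, parent SHAs) pairs read from comparison.commits.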

Answer 9

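From: https://developer.github.com/v3/repos/commits/#working-with-large-comparisons

Working with large comparisons

The response will include a comparison of up to 250 commits. If you are working with a larger commit range, you can use the Commit List API to enumerate all commits in the range.

For comparisons with extremely large diffs, you may receive an error response indicating that the diff took too long to generate. You can typically resolve this error by using a smaller commit range.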

GitHub v3 API: Get full commit list for large comparison
