Finding the oldest commit in a GitHub repository via the API

Time: 2014-08-04 05:16:06

Tags: github github-api

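What is the most efficient way to determine when the initial commit in a GitHub repository was made? Repositories have a created_at property, but for repositories that contain imported history, the oldest commit may be significantly older than that.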

On the command line, something like this would work:

git rev-list --max-parents=0 HEAD

However, I don't see an equivalent in the GitHub API.

5 answers:

Answer 0: (score: 4)

With the GraphQL API, there is a workaround for getting the oldest commit (the initial commit) on a specific branch.

First, fetch the latest commit and return totalCount and endCursor:

{
  repository(name: "linux", owner: "torvalds") {
    ref(qualifiedName: "master") {
      target {
        ... on Commit {
          history(first: 1) {
            nodes {
              message
              committedDate
              authoredDate
              oid
              author {
                email
                name
              }
            }
            totalCount
            pageInfo {
              endCursor
            }
          }
        }
      }
    }
  }
}

This returns something like the following for totalCount and the pageInfo object:

"totalCount": 931886,
"pageInfo": {
  "endCursor": "b961f8dc8976c091180839f4483d67b7c2ca2578 0"
}

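I don't have any source documenting the cursor string format b961f8dc8976c091180839f4483d67b7c2ca2578 0, but I have tested it against several other repositories with more than 1000 commits, and it always seems to be formatted like this: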

<static hash> <incremented_number>

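So you just subtract 2 from totalCount (if totalCount > 1) and use the result as the cursor offset to get the oldest commit (or the initial commit, if you prefer):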

{
  repository(name: "linux", owner: "torvalds") {
    ref(qualifiedName: "master") {
      target {
        ... on Commit {
          history(first: 1, after: "b961f8dc8976c091180839f4483d67b7c2ca2578 931884") {
            nodes {
              message
              committedDate
              authoredDate
              oid
              author {
                email
                name
              }
            }
            totalCount
            pageInfo {
              endCursor
            }
          }
        }
      }
    }
  }
}

This gives the following output (the initial commit by Linus Torvalds):

{
  "data": {
    "repository": {
      "ref": {
        "target": {
          "history": {
            "nodes": [
              {
                "message": "Linux-2.6.12-rc2\n\nInitial git repository build. I'm not bothering with the full history,\neven though we have it. We can create a separate \"historical\" git\narchive of that later if we want to, and in the meantime it's about\n3.2GB when imported into git - space that would just make the early\ngit days unnecessarily complicated, when we don't have a lot of good\ninfrastructure for it.\n\nLet it rip!",
                "committedDate": "2005-04-16T22:20:36Z",
                "authoredDate": "2005-04-16T22:20:36Z",
                "oid": "1da177e4c3f41524e886b7f1b8a0c1fc7321cac2",
                "author": {
                  "email": "torvalds@ppc970.osdl.org",
                  "name": "Linus Torvalds"
                }
              }
            ],
            "totalCount": 931886,
            "pageInfo": {
              "endCursor": "b961f8dc8976c091180839f4483d67b7c2ca2578 931885"
            }
          }
        }
      }
    }
  }
}
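
The cursor arithmetic above can be isolated in a small helper. This is only a minimal sketch: it assumes the undocumented "<hash> <offset>" cursor format observed in this answer, and the function name oldest_commit_cursor is hypothetical, not part of any GitHub API.

```python
from typing import Optional


def oldest_commit_cursor(end_cursor: str, total_count: int) -> Optional[str]:
    """Build the `after` cursor that sits just before the oldest commit.

    Assumes the cursor looks like "<hash> <offset>", which is an observation
    from this answer and not a documented GitHub contract.
    """
    if total_count <= 1:
        # The first page already holds the only commit; no cursor is needed.
        return None
    commit_hash, _offset = end_cursor.rsplit(" ", 1)
    # history(first: 1, after: "<hash> N") returns the commit at index N + 1,
    # so the last commit (index total_count - 1) needs N = total_count - 2.
    return f"{commit_hash} {total_count - 2}"


print(oldest_commit_cursor("b961f8dc8976c091180839f4483d67b7c2ca2578 0", 931886))
```

With the values from the first response, this yields the same cursor that is passed to after in the second query.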

A simple implementation in Python that gets the first commit using this method:

import requests

token = "YOUR_TOKEN"

name = "linux"
owner = "torvalds"
branch = "master"

query = """
query ($name: String!, $owner: String!, $branch: String!){
  repository(name: $name, owner: $owner) {
    ref(qualifiedName: $branch) {
      target {
        ... on Commit {
          history(first: 1, after: %s) {
            nodes {
              message
              committedDate
              authoredDate
              oid
              author {
                email
                name
              }
            }
            totalCount
            pageInfo {
              endCursor
            }
          }
        }
      }
    }
  }
}
"""

def getHistory(cursor):
    r = requests.post("https://api.github.com/graphql",
        headers = {
            "Authorization": f"Bearer {token}"
        },
        json = {
            "query": query % cursor,
            "variables": {
                "name": name,
                "owner": owner,
                "branch": branch
            }
        })
    return r.json()["data"]["repository"]["ref"]["target"]["history"]

#in the first request, cursor is null
history = getHistory("null")
totalCount = history["totalCount"]
if (totalCount > 1):
    cursor = history["pageInfo"]["endCursor"].split(" ")
    cursor[1] = str(totalCount - 2)
    history = getHistory(f"\"{' '.join(cursor)}\"")
    print(history["nodes"][0])
else:
    print("got oldest commit (initial commit)")
    print(history["nodes"][0])

You can also find an example in this post.

Answer 1: (score: 2)

If the data is already cached (on GitHub's side), and depending on how much precision you need, this can be done in two requests.

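First, check whether there actually are commits before the creation time: call GET /repos/:owner/:repo/commits with the until parameter set to the creation time (as suggested in VonC's answer) and limit the number of results to 1 commit (via the per_page parameter).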

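If there are commits before the creation time, you can call the contributors statistics endpoint (/repos/:owner/:repo/stats/contributors). The response for each contributor has a weeks list, and the earliest w value is the same week as the oldest commit.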

If you need a precise timestamp, you can call the commits list endpoint again with until set to 7 days after the earliest w value.

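Note that the statistics endpoint may return 202, indicating that the statistics are not available yet, in which case you need to retry after a few seconds.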
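
The retry-on-202 step and the search for the earliest week could be sketched as follows. The names here are assumptions for illustration: fetch is a hypothetical caller-supplied callable that returns a (status_code, parsed_json) pair, so the sketch is independent of any HTTP library, and the w (week start as a Unix timestamp) and c (commit count) fields follow the contributor statistics response shape as I understand it. Filtering on c > 0 is an extra precaution that the answer does not mention.

```python
import time
from typing import Callable, List, Optional, Tuple


def fetch_stats_with_retry(
    fetch: Callable[[], Tuple[int, Optional[list]]],
    retries: int = 5,
    delay_seconds: float = 2.0,
) -> list:
    """Call `fetch` until it stops answering 202 (statistics still being built)."""
    for _ in range(retries):
        status, body = fetch()
        if status == 200 and body is not None:
            return body
        if status != 202:
            raise RuntimeError(f"unexpected status {status}")
        time.sleep(delay_seconds)
    raise TimeoutError("statistics were not ready after all retries")


def earliest_commit_week(contributors: List[dict]) -> Optional[int]:
    """Return the smallest `w` value among weeks that contain commits."""
    weeks = [
        week["w"]
        for contributor in contributors
        for week in contributor["weeks"]
        if week["c"] > 0  # assumption: skip weeks without commits
    ]
    return min(weeks) if weeks else None


# Fake transport for illustration: the first answer is 202, the second is data.
responses = iter([
    (202, None),
    (200, [{"weeks": [{"w": 1113091200, "c": 0}, {"w": 1113696000, "c": 3}]}]),
])
stats = fetch_stats_with_retry(lambda: next(responses), delay_seconds=0)
print(earliest_commit_week(stats))
```

In real use, fetch would wrap an authenticated GET to /repos/:owner/:repo/stats/contributors.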

Answer 2: (score: 1)

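One suggestion is to list the commits on the repo (see the GitHub API v3 section), with the until parameter set to the repo's creation date (plus one day, for instance).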

GET /repos/:owner/:repo/commits

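That way, you list only the commits created at or before the repo's creation: this limits the list by excluding all commits created after the repo was created.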
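
As a rough sketch, the request URL for this approach could be built like this. The helper name commits_until_url and the example values are placeholders, and the timestamp format is assumed to be the ISO 8601 form "YYYY-MM-DDTHH:MM:SSZ" used by created_at.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode


def commits_until_url(owner: str, repo: str, created_at: str, per_page: int = 1) -> str:
    """Build a commits-list URL limited to commits up to one day after `created_at`."""
    # created_at is expected in the form "YYYY-MM-DDTHH:MM:SSZ" (UTC).
    created = datetime.strptime(created_at, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    until = (created + timedelta(days=1)).strftime("%Y-%m-%dT%H:%M:%SZ")
    query = urlencode({"until": until, "per_page": per_page})
    return f"https://api.github.com/repos/{owner}/{repo}/commits?{query}"


# Placeholder owner, repo, and timestamp; substitute the repository's real values.
print(commits_until_url("OWNER", "REPO", "2014-08-04T05:16:06Z"))
```

If the response to this request is non-empty, the repository contains history that predates its creation on GitHub.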

Answer 3: (score: -1)

Use trial and error on the page number:

https://github.com/fatfreecrm/fat_free_crm/commits/master?page=126

Looking at the git history, for example with gitk, can help make your trial and error more efficient.
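
One way to make the trial and error systematic is a binary search over page numbers. This is a different technique from the manual guessing above, shown only as a sketch: page_has_commits is a hypothetical caller-supplied check (for example, one that requests the page and looks for commit entries), and the code assumes that page 1 is non-empty and that every page after the last one is empty.

```python
from typing import Callable


def last_non_empty_page(page_has_commits: Callable[[int], bool], upper_bound: int = 1 << 20) -> int:
    """Binary-search the highest page number for which `page_has_commits` is true."""
    low, high = 1, upper_bound  # invariant: the answer lies in [low, high]
    while low < high:
        mid = (low + high + 1) // 2  # round up so the loop always makes progress
        if page_has_commits(mid):
            low = mid
        else:
            high = mid - 1
    return low


# Stand-in predicate: pretend the history spans exactly 126 pages.
print(last_non_empty_page(lambda page: page <= 126))
```

With the default upper bound, this needs about 20 checks instead of an open-ended number of guesses.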

Answer 4: (score: -1)

This is not via the API, but on GitHub.com: if you have the latest commit SHA and the commit count, you can build a URL that jumps to the oldest commit:

https://github.com/USER/REPO/commits?after=LAST_COMMIT_SHA+COMMIT_COUNT_MINUS_2

# Example. Commit count in this case was 1573
https://github.com/sindresorhus/refined-github/commits/master
  ?after=a76ed868a84cd0078d8423999faaba7380b0df1b+1571
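
The URL pattern above can be assembled with a few lines of Python. This is a sketch only; the function name oldest_commit_page_url is hypothetical, and the after parameter is the website behavior described in this answer rather than a documented API.

```python
def oldest_commit_page_url(user: str, repo: str, branch: str, last_sha: str, commit_count: int) -> str:
    """Build the github.com commits URL that lands on the oldest commit."""
    base = f"https://github.com/{user}/{repo}/commits/{branch}"
    if commit_count <= 1:
        # A single commit is already visible on the first page.
        return base
    return f"{base}?after={last_sha}+{commit_count - 2}"


# Values taken from the example above (commit count 1573).
print(oldest_commit_page_url(
    "sindresorhus", "refined-github", "master",
    "a76ed868a84cd0078d8423999faaba7380b0df1b", 1573,
))
```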