Fetching a specific remote commit (and its depth) in a shallow clone, using the GitHub API?

Time: 2018-01-24 14:46:25

Tags: git github

Say I need to get https://github.com/mozilla/gecko-dev at the revision/commit hash 042b84a. Right now, the whole repository is (See the size of a github repo before cloning it?):

wget -qO- https://api.github.com/repos/mozilla/gecko-dev | grep size
#  "size": 3891062,  # this in kB
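The size field can also be extracted and converted instead of read by eye. This is only a minimal sketch: it assumes the response keeps the pretty-printed `"size": N,` layout shown above and that 1 kB means 1024 bytes. The helper names `repo_size_kb` and `kb_to_mib` are made up for illustration.

```shell
# Hypothetical helpers; not part of git or the GitHub API.

# Read a GitHub repository JSON document on stdin and print the value of
# the first line that looks like   "size": <number>,
repo_size_kb() {
    sed -n 's/^ *"size": *\([0-9][0-9]*\),.*/\1/p' | head -n 1
}

# Convert kB to whole MiB (integer division, rounds down).
kb_to_mib() {
    echo $(( $1 / 1024 ))
}

# Usage (needs network access):
#   wget -qO- https://api.github.com/repos/mozilla/gecko-dev | repo_size_kb
```

For the value above, `kb_to_mib 3891062` gives 3799, i.e. roughly 3.7 GiB.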

...which is too much for me. So I thought I'd get a shallow clone - and that alone already downloads nearly 400 MB:

git clone --depth 1 https://github.com/mozilla/gecko-dev
# remote: Counting objects: 231302, done.
# Receiving objects: 100% (231302/231302), 392.95 MiB

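Now, this cloned HEAD, and I cannot get to 042b84a from here, especially since the git client I'm using is version 1.9.1 (How to shallow clone a specific commit with depth 1?; How do fetch remote branch after I cloned repo with git clone --depth 1; Git: get a particular revision of a git repository with depth 1). Apparently, the best I can do is to avoid fetching the whole repo (which would download as much as a full clone anyway) and slowly increase the depth instead.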

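I'm not sure whether "depth" simply corresponds to the number of commits between HEAD and the given revision - Get git sha depth on a remote branch notes that, for a full clone, you can do: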

git rev-list HEAD ^042b84a --count

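...so that would mean "depth" is indeed the number of commits between HEAD and the given revision - but there is no obvious way to query it from a remote repository in git.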

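So it would be cool to find out the depth that 042b84a needs relative to the current HEAD before doing the clone/depth increase; I thought maybe the GitHub API, used from the command line, could help, since the repository is hosted on GitHub. So I tried: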

cd gecko-dev

wget -qO- https://api.github.com/repos/mozilla/gecko-dev/commits/042b84a | grep date
#      "date": "2017-04-27T07:18:07Z"

curl -sI 'https://api.github.com/repos/mozilla/gecko-dev/commits?sha=042b84a' | grep last
# Link: <https://api.github.com/repositories/13509108/commits?sha=042b84a&page=2>; rel="next", <https://api.github.com/repositories/13509108/commits?sha=042b84a&page=17756>; rel="last"

wget -qO- 'https://api.github.com/repos/mozilla/gecko-dev/commits?sha=042b84a&page=17756' | grep '^    "sha"' | wc -l
# 5

Since the sha parameter is the "SHA or branch to start listing commits from", and, per the GitHub API, "a call to list GitHub's public repositories provides paginated items in sets of 30", here we have 17756 pages, of which page 17756 has 5 results - so we have 17755 * 30 + 5 = 532655 commits between 042b84a and HEAD?
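The page arithmetic can be scripted so it does not have to be redone by hand for each query. A minimal sketch, assuming the Link header has the shape shown above and the default page size of 30; the helper names `last_page` and `count_from_pages` are hypothetical:

```shell
# Hypothetical helpers for the page arithmetic above.

# Read HTTP response headers on stdin and print the page number found in
# the rel="last" entry of GitHub's Link header.
last_page() {
    sed -n 's/.*[?&]page=\([0-9][0-9]*\)>; rel="last".*/\1/p'
}

# total = (pages - 1) * per_page + entries on the last page
count_from_pages() {
    pages=$1; on_last_page=$2; per_page=${3:-30}
    echo $(( (pages - 1) * per_page + on_last_page ))
}

# Usage (needs network access):
#   curl -sI 'https://api.github.com/repos/mozilla/gecko-dev/commits?sha=042b84a' | last_page
count_from_pages 17756 5    # prints 532655
```

Note that if the listing starts at 042b84a and walks back through its history, as the quoted parameter description suggests, this total would count the ancestors of 042b84a rather than its distance from HEAD.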

So then I do this - but:

git fetch --progress --depth=532655
# error: RPC failed; result=18, HTTP code = 200
# fatal: The remote end hung up unexpectedly

...the call fails.

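Is it possible, with git client 1.9, to somehow extend this shallow clone so that it includes revision 042b84a without cloning all 4 GB of data - using some of the repository data that the GitHub API provides?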

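EDIT: Got somewhere with this, but still no definitive answer. First, a depth of 532655 is suspicious for the distance between now (January 2018) and a commit from April 2017. So I tried looking up the commits since the commit date: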

curl -sI 'https://api.github.com/repos/mozilla/gecko-dev/commits?since=2017-04-27T07:18:07Z' | grep last
# Link: <https://api.github.com/repositories/13509108/commits?since=2017-04-27T07%3A18%3A07Z&page=2>; rel="next", <https://api.github.com/repositories/13509108/commits?since=2017-04-27T07%3A18%3A07Z&page=1267>; rel="last"
wget -qO- 'https://api.github.com/repos/mozilla/gecko-dev/commits?since=2017-04-27T07:18:07Z&page=1267' | grep '^    "sha"' | wc -l
# 18
wcalc 1266*30+18
# = 37998
git fetch -v --progress --depth=37998
# POST git-upload-pack (419 bytes)
# error: RPC failed; result=18, HTTP code = 200
# fatal: The remote end hung up unexpectedly

So, going by date, we get 37998 commits, or depth, but even that call fails to fetch.

So, knowing that the number of commits is at least in the thousands, I tried increasing slowly:

git fetch -vvvv --progress --depth=1000 origin
# remote: Counting objects: 53595, done.
# remote: Compressing objects: 100% (24434/24434), done.
# remote: Total 53595 (delta 43532), reused 36280 (delta 28120), pack-reused 0
# Receiving objects: 100% (53595/53595), 16.14 MiB | 409.00 KiB/s, done.
# Resolving deltas: 100% (43532/43532), completed with 10563 local objects.
# From https://github.com/mozilla/gecko-dev
#  = [up to date]      master     -> origin/master
git log --oneline | wc -l
# 7492

git fetch -vvvv --progress --depth=2000 origin
# remote: Counting objects: 140804, done.
# remote: Compressing objects: 100% (54300/54300), done.
# Receiving objects: 100% (140804/140804), 57.13 MiB | 404.00 KiB/s, done.
# remote: Total 140804 (delta 114158), reused 106827 (delta 84436), pack-reused 0
# Resolving deltas: 100% (114158/114158), completed with 20700 local objects.
# From https://github.com/mozilla/gecko-dev
#  = [up to date]      master     -> origin/master
git log --oneline | wc -l
# 18137

...and finally a loop:

i=2000; until git show 042b84a; do i=$((i+1000)); echo "depth $i"; git fetch --depth=$i ; done
# fatal: ambiguous argument '042b84a': unknown revision or path not in the working tree.
# Use '--' to separate paths from revisions, like this:
# 'git <command> [<revision>...] -- [<file>...]'
# depth 3000
# remote: Counting objects: 136434, done.
# remote: Compressing objects: 100% (47014/47014), done.
# remote: Total 136434 (delta 108858), reused 110481 (delta 86139), pack-reused 0
# Receiving objects: 100% (136434/136434), 71.36 MiB | 403.00 KiB/s, done.
# Resolving deltas: 100% (108858/108858), completed with 13997 local objects.
# fatal: ambiguous argument '042b84a': unknown revision or path not in the working tree.
# Use '--' to separate paths from revisions, like this:
# 'git <command> [<revision>...] -- [<file>...]'
# depth 4000
# remote: Counting objects: 240103, done.
# remote: Compressing objects: 100% (77811/77811), done.
# remote: Total 240103 (delta 196215), reused 195977 (delta 157920), pack-reused 0
# Receiving objects: 100% (240103/240103), 117.71 MiB | 404.00 KiB/s, done.
# Resolving deltas: 100% (196215/196215), completed with 23725 local objects.
# commit 042b84af6020b1f2d8029a0dc36ac5955b7f325f [...]
git log --oneline | wc -l
# 50871
git rev-list HEAD ^042b84a --count
# 45283
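The loop above can be wrapped in a function that checks for the commit quietly and gives up at a maximum depth. This is a sketch only: the name `deepen_until` and its step and limit defaults are made up for illustration, and it assumes the remote is called origin. It uses `git cat-file -e`, which only tests whether the object exists, in place of `git show`, which prints the whole commit.

```shell
# Hypothetical helper: deepen the shallow clone in the current directory
# until <commit> is present, or stop once the depth limit is exceeded.
#   deepen_until <commit> [step] [max_depth]
deepen_until() {
    commit=$1; step=${2:-1000}; max=${3:-50000}
    depth=$step
    # cat-file -e exits non-zero while the commit object is still missing
    until git cat-file -e "${commit}^{commit}" 2>/dev/null; do
        if [ "$depth" -gt "$max" ]; then
            echo "giving up: $commit not found within depth $max" >&2
            return 1
        fi
        echo "fetching depth $depth" >&2
        git fetch --depth="$depth" origin || return 1
        depth=$((depth + step))
    done
    echo "$commit is present"
}

# Usage (inside the shallow clone, needs network access):
#   deepen_until 042b84a 1000 10000
```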

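(Judging by the increase in object counts, download sizes and so on, it looks like having already fetched --depth=1000 doesn't matter in this case - when fetch --depth=2000 is issued, are all the previous objects downloaded again?)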

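So the commit 042b84a finally shows up when we git fetch --depth 4000 - so apparently the depth of this commit is 3000 < depth <= 4000? At that depth, we can count 50871 log entries (commits?), while git rev-list HEAD ^042b84a --count reports 45283 (also commits?). So what is "depth", then, if not a number of commits?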
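One possible reading, offered as a sketch and not as a confirmed answer for gecko-dev: depth limits how many generations back from the tip are fetched along each parent chain, so a history with merges can contain more commits than the depth number. The toy repository below is entirely hypothetical (empty commits A, B, C and a merge M in a temporary directory) and only shows that the two numbers can differ.

```shell
# Toy history (hypothetical, not gecko-dev):
#   A -- B -- M      first branch
#    \-- C --/       branch "side", merged into M
work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

git init -q "$work/full"
cd "$work/full"
git commit -q --allow-empty -m A
main=$(git symbolic-ref --short HEAD)   # whatever the default branch is called
git checkout -q -b side
git commit -q --allow-empty -m C
git checkout -q "$main"
git commit -q --allow-empty -m B
git merge -q --no-ff -m M side
full_count=$(git rev-list --count HEAD)

# file:// makes git use the normal transport, so --depth is honoured locally
git clone -q --depth 2 "file://$work/full" "$work/shallow"
cd "$work/shallow"
shallow_count=$(git rev-list --count HEAD)

echo "full history: $full_count commits; depth-2 clone: $shallow_count commits"
```

In this toy case a depth of 2 brings in M plus both of its parents, which is more commits than the depth value itself.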

0 answers:

No answers yet