有时候,我正在使用的某个软件(例如linkchecker)的原始GitHub存储库几乎没有开发,也没有开发,但是创建了很多fork(在本例中为142,撰写本文时)。
对于每个叉子,我想知道:
,并针对每个此类分支:
GitHub has a web interface for comparing forks,但我不想为每个分叉手动执行此操作,我只想要一个包含所有分叉结果的CSV文件。如何编写脚本? The GitHub API can list the forks,但我看不到如何将叉子与之比较。依次克隆每个fork并在本地进行比较似乎有些粗糙。
答案 0 :(得分:2)
聚会迟到了 - 我想这是我第二次在这个 SO 帖子上结束,所以我将分享我的基于 js 的解决方案(我最终通过获取和搜索 html 页面制作了一个书签) . 您可以从中创建一个 bookmarklet,也可以简单地将整个内容粘贴到控制台中。适用于基于铬的和 Firefox:
编辑:如果页面上有超过 10 个左右的分叉,您可能会因抓取速度过快而被锁定(网络中 429 个请求过多)。改用异步/等待:
javascript:(async () => {
/* while on the forks page, collect all the hrefs and pop off the first one (original repo) */
const forks = [...document.querySelectorAll('div.repo a:last-of-type')].map(x => x.href).slice(1);
for (const fork of forks) {
/* fetch the forked repo as html, search for the "This branch is [n commits ahead,] [m commits behind]", print it to console */
await fetch(fork)
.then(x => x.text())
.then(html => console.log(`${fork}: ${html.match(/This branch is.*/).pop().replace('This branch is ', '')}`))
.catch(console.error);
}
})();
或者你可以批量处理,但很容易被锁定
javascript:(async () => {
/* while on the forks page, collect all the hrefs and pop off the first one (original repo) */
const forks = [...document.querySelectorAll('div.repo a:last-of-type')].map(x => x.href).slice(1);
getfork = (fork) => {
return fetch(fork)
.then(x => x.text())
.then(html => console.log(`${fork}: ${html.match(/This branch is.*/).pop().replace('This branch is ', '')}`))
.catch(console.error);
}
while (forks.length) {
await Promise.all(forks.splice(0, 2).map(getfork));
}
})();
原始(这会立即触发所有请求,如果请求数超过 github 允许的数量,则可能会将您锁定)
javascript:(() => {
/* while on the forks page, collect all the hrefs and pop off the first one (original repo) */
const forks = [...document.querySelectorAll('div.repo a:last-of-type')].map(x => x.href).slice(1);
for (const fork of forks) {
/* fetch the forked repo as html, search for the "This branch is [n commits ahead,] [m commits behind]", print it to console */
fetch(fork)
.then(x => x.text())
.then(html => console.log(`${fork}: ${html.match(/This branch is.*/).pop().replace('This branch is ', '')}`))
.catch(console.error);
}
})();
将打印如下内容:
https://github.com/user1/repo: 289 commits behind original:master.
https://github.com/user2/repo: 489 commits behind original:master.
https://github.com/user2/repo: 1 commit ahead, 501 commits behind original:master.
...
到控制台。
编辑:用块注释替换注释以提高粘贴能力
答案 1 :(得分:1)
以下书签将信息直接打印到网页上,如下所示:
作为书签添加(或粘贴到控制台)的代码:
javascript:(async () => {
/* while on the forks page, collect all the hrefs and pop off the first one (original repo) */
const aTags = [...document.querySelectorAll('div.repo a:last-of-type')].slice(1);
for (const aTag of aTags) {
/* fetch the forked repo as html, search for the "This branch is [n commits ahead,] [m commits behind]", print it directly onto the web page */
await fetch(aTag.href)
.then(x => x.text())
.then(html => aTag.outerHTML += `${html.match(/This branch is.*/).pop().replace('This branch is', '').replace(/([0-9]+ commits? ahead)/, '<font color="#0c0">$1</font>').replace(/([0-9]+ commits? behind)/, '<font color="red">$1</font>')}`)
.catch(console.error);
}
})();
已从 this answer 修改。
以下书签也打印 ZIP 文件的链接:
作为书签添加(或粘贴到控制台)的代码:
javascript:(async () => {
/* while on the forks page, collect all the hrefs and pop off the first one (original repo) */
const aTags = [...document.querySelectorAll('div.repo a:last-of-type')].slice(1);
for (const aTag of aTags) {
/* fetch the forked repo as html, search for the "This branch is [n commits ahead,] [m commits behind]", print it directly onto the web page */
await fetch(aTag.href)
.then(x => x.text())
.then(html => aTag.outerHTML += `${html.match(/This branch is.*/).pop().replace('This branch is', '').replace(/([0-9]+ commits? ahead)/, '<font color="#0c0">$1</font>').replace(/([0-9]+ commits? behind)/, '<font color="red">$1</font>')}` + " <a " + `${html.match(/href="[^"]*\.zip">/).pop() + "Download ZIP</a>"}`)
.catch(console.error);
}
})();
答案 2 :(得分:0)
痒得完全一样,并编写了一个抓取器,将在渲染的HTML中打印的信息用于分叉:https://github.com/hbbio/forkizard
绝对不是完美的,而是一个临时解决方案。
答案 3 :(得分:0)
active-forks并没有完全满足我的要求,但是它非常接近并且易于使用。
答案 4 :(得分:0)
这是一个 Python 脚本,用于列出和克隆前面的所有分支。
它不使用 API。所以它不受速率限制的影响,也不需要身份验证。但如果 GitHub 网站设计发生变化,可能需要进行调整。
与其他答案中显示 ZIP 文件链接的书签不同,此脚本还保存有关提交的信息,因为它使用 git clone
并且还创建了一个带有概述的 commits.htm
文件。
import requests, re, os, sys
def content_from_url(url):
# TODO handle internet being off and stuff
text = requests.get(url).content
return text
def clone_ahead_forks(forklist_url):
forklist_htm = content_from_url(forklist_url)
with open("forklist.htm", "w") as text_file:
text_file.write(forklist_htm)
is_root = True
for match in re.finditer('<a class="(Link--secondary)?" href="(/([^/"]*)/[^/"]*)">', forklist_htm):
fork_url = 'https://github.com'+match.group(2)
fork_owner_login = match.group(3)
fork_htm = content_from_url(fork_url)
match2 = re.search('<div class="d-flex flex-auto">[^<]*?([0-9]+ commits? ahead(, [0-9]+ commits? behind)?)', fork_htm)
# TODO if website design changes, fallback onto checking whether 'ahead'/'behind'/'even with' appear only once on the entire page - in that case they are not part of the username etc.
sys.stdout.write('.')
if match2 or is_root:
if match2:
aheadness = match2.group(1) # for example '1 commit ahead, 2 commits behind'
else:
aheadness = 'root repo'
is_root = False # for subsequent iterations
dir = fork_owner_login+' ('+aheadness+')'
print dir
os.mkdir(dir)
os.chdir(dir)
# save commits.htm
commits_htm = content_from_url(fork_url+'/commits')
with open("commits.htm", "w") as text_file:
text_file.write(commits_htm)
# git clone
os.system('git clone '+fork_url+'.git')
print
# no need to recurse into forks of forks because they are all listed on the initial page and being traversed already
os.chdir('..')
forklist_url = 'https://github.com/cifkao/tonnetz-viz/network/members'
clone_ahead_forks(forklist_url)
这会创建例如以下目录,每个目录都包含各自的克隆存储库:
bakaiadam (2 commits ahead)
chumo (2 commits ahead, 4 commits behind)
cifkao (root repo)
codedot (76 commits ahead, 27 commits behind)
k-hatano (41 commits ahead)
shimafuri (11 commits ahead, 8 commits behind)
答案 5 :(得分:0)
这是一个 Python 脚本,用于列出和克隆前面的分叉。此脚本部分使用了 API,因此它会触发速率限制(您可以通过在脚本中添加 GitHub API authentication 来扩展速率限制(不是无限),请编辑或发布)。
一开始我尝试完全使用 API,但触发速率限制太快,所以现在我使用 is_fork_ahead_HTML
而不是 is_fork_ahead_API
。如果 GitHub 网站设计发生变化,这可能需要进行调整。
由于速率限制,我更喜欢我在此处发布的其他答案。
import requests, json, os, re
def obj_from_json_from_url(url):
# TODO handle internet being off and stuff
text = requests.get(url).content
obj = json.loads(text)
return obj, text
def is_fork_ahead_API(fork, default_branch_of_parent):
""" Use the GitHub API to check whether `fork` is ahead.
This triggers the rate limit, so prefer the non-API version below instead.
"""
# Compare default branch of original repo with default branch of fork.
comparison, comparison_json = obj_from_json_from_url('https://api.github.com/repos/'+user+'/'+repo+'/compare/'+default_branch_of_parent+'...'+fork['owner']['login']+':'+fork['default_branch'])
if comparison['ahead_by']>0:
return comparison_json
else:
return False
def is_fork_ahead_HTML(fork):
""" Use the GitHub website to check whether `fork` is ahead.
"""
htm = requests.get(fork['html_url']).content
match = re.search('<div class="d-flex flex-auto">[^<]*?([0-9]+ commits? ahead(, [0-9]+ commits? behind)?)', htm)
# TODO if website design changes, fallback onto checking whether 'ahead'/'behind'/'even with' appear only once on the entire page - in that case they are not part of the username etc.
if match:
return match.group(1) # for example '1 commit ahead, 114 commits behind'
else:
return False
def clone_ahead_forks(user,repo):
obj, _ = obj_from_json_from_url('https://api.github.com/repos/'+user+'/'+repo)
default_branch_of_parent = obj["default_branch"]
page = 0
forks = None
while forks != [{}]:
page += 1
forks, _ = obj_from_json_from_url('https://api.github.com/repos/'+user+'/'+repo+'/forks?per_page=100&page='+str(page))
for fork in forks:
aheadness = is_fork_ahead_HTML(fork)
if aheadness:
#dir = fork['owner']['login']+' ('+str(comparison['ahead_by'])+' commits ahead, '+str(comparison['behind_by'])+'commits behind)'
dir = fork['owner']['login']+' ('+aheadness+')'
print dir
os.mkdir(dir)
os.chdir(dir)
os.system('git clone '+fork['clone_url'])
print
# recurse into forks of forks
if fork['forks_count']>0:
clone_ahead_forks(fork['owner']['login'], fork['name'])
os.chdir('..')
user = 'cifkao'
repo = 'tonnetz-viz'
clone_ahead_forks(user,repo)