How to extract the total number of commit pages for a GitHub repository

Time: 2019-04-10 09:51:29

Tags: bash shell github github-api git-bash

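I am setting up a script that exports all commits and pull requests for a larger list of GitHub repositories (about 4,000).

Now that the basic idea of the script works, I need a way to loop through all the commit pages of a repository.

I found that I can export 100 commits per page. Some repositories have many more commits than that (for example 8,000), so I would need to loop through 80 pages.

I cannot find a way to extract the number of pages from the GitHub API.

So far, I have set up the script so that it loops through all commits and exports them to a txt/csv file.

What I need is to know the total number of pages before I start looping through a repo's commits.

This gives me the number of pages, but in a form I cannot use: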

curl -u "user:password" -I "https://api.github.com/repos/0chain/rocksdb/commits?per_page=100"

Result:

Link: https://api.github.com/repositories/152923130/commits?per_page=100&page=2; rel="next", https://api.github.com/repositories/152923130/commits?per_page=100&page=75; rel="last"

I need to use the value 75 (or whatever the value is for other repositories) as a variable in a loop.

Something like this:

# read the whitespace-separated repo names into an array
repolistarray=($(cat repolist.txt))
repolength=${#repolistarray[@]}

for (( i = 0; i < repolength; i++ )); do
    # here I need to extract the page number
    # (as written, this captures the whole header block, not a number)
    pagenumber=$(curl -u "user:password" -s -I "https://api.github.com/repos/${repolistarray[i]}/commits?per_page=100")

    for (( n = 1; n <= pagenumber; n++ )); do
        # the URL is quoted so that & is not treated as a shell operator
        curl -u "user:password" -s "https://api.github.com/repos/${repolistarray[i]}/commits?per_page=100&page=$n" >committest.txt
    done
done

How can I get the "75" (or whatever the value turns out to be) out of

Link: https://api.github.com/repositories/152923130/commits?per_page=100&page=2; rel="next", https://api.github.com/repositories/152923130/commits?per_page=100&page=75; rel="last"

to use as "n"?

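2 answers:

Answer 0 (score: 1)

This follows what @Poshi suggested in a comment: keep requesting the next page in an endless loop until you hit an empty page, then break out of the inner loop and move on to the next repo.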

# this is the contents of a page past the last real page:
emptypage='[

]'

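# here's a simpler way to iterate over each repo than using a bash array
# (xargs -n1 prints one whitespace-separated entry per line, so the last
# entry is not skipped when it is followed by a newline instead of a space)
xargs -n1 < repolist.txt | while read -r repo; do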

  # loop indefinitely
  page=0
  while true; do
    page=$((page + 1))

    # minor improvement: use a variable, not a file.
    # also, you don't need to echo variables, just use them
    result=$(curl -u "user:password" -s \
      "https://api.github.com/repos/$repo/commits?per_page=100&page=$page")

    # if the result is empty, break out of the inner loop
    [ "$result" = "$emptypage" ] && break

    echo "$result" > committest.txt
    # note that > overwrites (whereas >> appends),
    # so committest.txt will be overwritten with each new page.
    #
    # in the final version, you probably want to process the results here,
    # and then
    #
    #       echo "$processed_results"
    #     done > repo1.txt
    #   done
    #
    # to output once per repo, or
    #
    #       echo "$processed_results"
    #     done
    #   done > all_results.txt
    #
    # to output all results to a single file

  done
done
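
The comparison against a literal empty-page string only works if the API returns exactly that whitespace. As a minimal sketch, assuming the empty page is always an empty JSON array, the check could strip all whitespace before comparing. The helper name `is_empty_page` is hypothetical and not part of the answer above.

```shell
#!/usr/bin/env bash

# Hypothetical helper: succeeds (exit status 0) when the given response body
# is an empty JSON array, regardless of spaces or newlines inside it.
is_empty_page() {
  local compact
  # delete every whitespace character so '[\n\n]' and '[ ]' both become '[]'
  compact=$(printf '%s' "$1" | tr -d '[:space:]')
  [ "$compact" = "[]" ]
}

# Possible use inside the inner loop, replacing the string comparison:
#   is_empty_page "$result" && break
```

This keeps the loop structure unchanged and only makes the stop condition less sensitive to formatting.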

Answer 1 (score: 0)

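Well, the approach you are asking for is not the most common one; this is usually done by fetching pages until no more data is available. But to answer your specific question, we have to parse the line that contains that information. A quick and dirty way could be: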

response="Link: https://api.github.com/repositories/152923130/commits?per_page=100&page=2; rel=\"next\", https://api.github.com/repositories/152923130/commits?per_page=100&page=75; rel=\"last\""

<<< "$response" cut -f2- -d: | # First, get the contents of "Link": everything after the first colon
tr "," $'\n' |      # Separate the different parts in different lines
grep 'rel="last"' | # Select the line with last page information
cut -f1 -d';' |     # Keep only the URL
tr "?&" $'\n' |     # Split URL and its parameters, one per line
grep -e "^page" |   # Select the "page" parameter
cut -f2 -d=         # Finally, extract the number we are interested in

There are other ways to do this, with fewer commands and perhaps simpler, but this one lets me explain it step by step. One of those other ways might be:

<<< "$response" sed 's/.*&page=\(.*\); rel="last".*/\1/'

This one makes some assumptions, for example that page is always the last parameter.
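
In an actual GitHub response each URL in the Link header is wrapped in angle brackets, and the header name may arrive in lowercase (for example over HTTP/2), which the commands above do not account for. The following is only a sketch under those assumptions: `last_page_from_headers` is a hypothetical helper name, and falling back to 1 when no rel="last" link is present is an assumption about single-page results.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print the page number of the rel="last" link found in
# a block of HTTP response headers, or 1 when there is no such link.
last_page_from_headers() {
  local headers=$1 last
  last=$(printf '%s\n' "$headers" |
    grep -i '^link:' |      # keep only the Link header, any letter case
    tr ',' '\n' |           # one link per line
    grep 'rel="last"' |     # the link that points at the last page
    sed -n 's/.*[?&]page=\([0-9][0-9]*\).*/\1/p') || true  # digits after page=
  # assumption: no rel="last" link means everything fits on one page
  printf '%s\n' "${last:-1}"
}

# Possible use in the outer loop (needs network access and credentials):
#   headers=$(curl -s -I -u "user:password" \
#     "https://api.github.com/repos/$repo/commits?per_page=100")
#   pagenumber=$(last_page_from_headers "$headers")
```

Capturing only digits means a trailing `>` or carriage return does not end up in the result, and matching `[?&]page=` avoids picking up `per_page` by mistake.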