Question

I'm running this command, taken from "Why is my git repository so big?", on the large git repository from https://github.com/python/cpython:

    git rev-list --all --objects | sed -n $(git rev-list --objects --all | cut -f1 -d' ' | git cat-file --batch-check | grep blob | sort -n -k 3 | tail -n800 | while read hash type size; do size_in_kibibytes=$(echo $size | awk '{ foo = $1 / 1024 ; print foo "KiB" }'); echo -n "-e s/$hash/$size_in_kibibytes/p "; done) | sort -n -k1;

If I replace tail -n800 with tail -n40, it works fine:

1160.94KiB Lib/ensurepip/_bundled/pip-8.0.2-py2.py3-none-any.whl
1169.59KiB Lib/ensurepip/_bundled/pip-8.1.1-py2.py3-none-any.whl
1170.86KiB Lib/ensurepip/_bundled/pip-8.1.2-py2.py3-none-any.whl
1225.24KiB Lib/ensurepip/_bundled/pip-9.0.0-py2.py3-none-any.whl
...

I found the question "Bash : sed -n arguments", which says I could use awk instead of sed.
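A minimal sketch of the awk route, assuming the blob sizes and the object list are first written to two files (the file names and the function name `join_sizes_to_paths` are hypothetical). awk loads the hash-to-size map from the first file into memory and then streams the second file, so no per-object arguments ever reach a command line:

```shell
#!/usr/bin/env bash
# Hypothetical helper: join blob sizes to paths with awk instead of
# building one huge "sed -n -e ... -e ..." command line.
#   $1: file with "<hash> <size-in-bytes>" lines
#   $2: file with "<hash> [path]" lines, as printed by `git rev-list --objects --all`
# Note: if $1 is empty, NR == FNR also holds for $2, so nothing is printed.
join_sizes_to_paths() {
  awk '
    NR == FNR { size[$1] = $2; next }     # first file: remember size per hash
    ($1 in size) {                        # second file: is this a hash we kept?
      path = substr($0, length($1) + 2)   # everything after "<hash> "
      printf "%.2fKiB %s\n", size[$1] / 1024, path
    }
  ' "$1" "$2"
}

# Usage, inside the repository (objects.txt and sizes.txt are placeholders):
#   git rev-list --objects --all > objects.txt
#   cut -f1 -d' ' objects.txt | git cat-file --batch-check |
#     awk '$2 == "blob" { print $1, $3 }' | sort -n -k2 | tail -n 800 > sizes.txt
#   join_sizes_to_paths sizes.txt objects.txt | sort -n -k1
```

Because the 800 hashes live in an awk array rather than in argv, raising the `tail` count only costs memory, not command-line length.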

How can I fix the "sed: Argument list too long" error when tail is given -n800 instead of -n40?

Answer 1

Alternatively, check whether git sizer works on your repository: it will help isolate what is going on in it.

If not, "How to find/identify large commits in git history?" has other commands that loop over each object and avoid the sed -nxx part.
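One sed-free variant, sketched here under the assumption that your git is recent enough to support the `%(rest)` atom in `git cat-file --batch-check` formats: `%(rest)` echoes back whatever followed the object name on the input line (the path printed by `git rev-list --objects`), so size and path arrive on the same line and no lookup table is needed. The function names `largest_blobs` and `format_blob_lines` are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helpers; a sketch, not a drop-in replacement.

# Reads "blob <hash> <size> <path>" lines and prints "<size>KiB <path>".
format_blob_lines() {
  awk '{
    # the path is everything after "<type> <hash> <size> "
    path = substr($0, length($1) + length($2) + length($3) + 4)
    printf "%.2fKiB %s\n", $3 / 1024, path
  }'
}

# Prints the N largest blobs (default 40) of the current repository,
# smallest first. No per-object sed -e arguments are built, so the
# command line stays short whatever N is.
largest_blobs() {
  local n="${1:-40}"
  git rev-list --objects --all |
    git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
    awk '$1 == "blob"' |
    sort -n -k3 |
    tail -n "$n" |
    format_blob_lines
}

# Usage, inside the repository:
#   largest_blobs 800
```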

Another option is to redirect the generated commands to a file and then run sed on that file, as in here.
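A sketch of that idea, assuming a temporary script file read with `sed -n -f` (the function names `make_sed_script` and `sizes_with_paths` are hypothetical). Only the script's file name is passed on the command line, which avoids the "Argument list too long" error; sed still runs every substitution against every line, so this does not address the slowness mentioned in the comment quoted in Answer 2:

```shell
#!/usr/bin/env bash
# Hypothetical helpers; a sketch, not a drop-in replacement.

# Reads "<hash> <type> <size>" lines (git cat-file --batch-check output)
# and prints one sed command per blob: s/<hash>/<size>KiB/p
# The size is formatted like the original awk call (default "%.6g").
make_sed_script() {
  awk '$2 == "blob" { kib = ($3 / 1024) ""; printf "s/%s/%sKiB/p\n", $1, kib }'
}

# Prints the N largest blobs (default 40) with their paths, smallest first.
sizes_with_paths() {
  local n="${1:-40}" script
  script=$(mktemp) || return 1
  git rev-list --objects --all | cut -f1 -d' ' |
    git cat-file --batch-check |
    awk '$2 == "blob"' | sort -n -k3 | tail -n "$n" |
    make_sed_script > "$script"
  # sed reads its commands from the file, not from argv
  git rev-list --all --objects | sed -n -f "$script" | sort -n -k1
  rm -f "$script"
}

# Usage, inside the repository:
#   sizes_with_paths 800
```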

Answer 2

It looks like you took this from the answer "Some scripts I use:..." in the linked question. There is a relevant comment on that answer:

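This function is great, but it is unbelievably slow. If I remove the 40-line limit, it can't even finish on my computer. FYI, I just added an answer with a more efficient version of this function. Check it out if you want to use this logic on a big repository, or if you want to see the total size per file or per folder. – piojo Jul 28 '17 at 7:59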

Fortunately, piojo has written another answer that solves this problem. Just use his code.

sed: Argument list too long when running sed -n
