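I have a Git repository containing thousands of files, and I would like to get the date and time of the last commit for each individual file. Can this be done with Python (for example, with something like os.path.getmtime(path))?
Answer 0 (score: 3)
An interesting question. Below is a quick-and-dirty implementation.
I used multiprocessing.Pool (specifically imap_unordered()) to launch the git subprocesses, because it's convenient.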
#!/usr/bin/env python
# vim:fileencoding=utf-8:ft=python
#
# Author: R.F. Smith <rsmith@xs4all.nl>
# Last modified: 2015-05-24 12:28:45 +0200
#
# To the extent possible under law, Roland Smith has waived all
# copyright and related or neighboring rights to gitdates.py. This
# work is published from the Netherlands. See
# http://creativecommons.org/publicdomain/zero/1.0/

"""For each file in a directory managed by git, get the short hash and
date of the most recent commit of that file."""

from __future__ import print_function
from multiprocessing import Pool
import os
import subprocess
import sys
import time

# Suppress annoying command prompts on ms-windows.
startupinfo = None
if os.name == 'nt':
    startupinfo = subprocess.STARTUPINFO()
    startupinfo.dwFlags |= subprocess.STARTF_USESHOWWINDOW

def main():
    """
    Entry point for gitdates.
    """
    checkfor(['git', '--version'])
    if '.git' not in os.listdir('.'):
        print('This directory is not managed by git.')
        sys.exit(0)
    # Get the set of excluded (ignored, untracked) files. Decode the
    # output so that it also compares correctly with str in Python 3.
    exargs = ['git', 'ls-files', '-i', '-o', '--exclude-standard']
    exc = subprocess.check_output(exargs, startupinfo=startupinfo)
    exc = set(exc.decode('utf-8').splitlines())
    # Get a list of all files, skipping the excluded ones.
    allfiles = []
    for root, dirs, files in os.walk('.'):
        for d in ['.git', '__pycache__']:
            try:
                dirs.remove(d)
            except ValueError:
                pass
        paths = [os.path.join(root, f) for f in files]
        # git prints paths relative to the current directory, with '/'.
        allfiles += [pt for pt in paths
                     if pt[2:].replace(os.sep, '/') not in exc]
    # Gather the files' data using a Pool.
    p = Pool()
    filedata = [res for res in p.imap_unordered(filecheck, allfiles)
                if res is not None]
    p.close()
    # Sort the data (latest modified first) and print it.
    # The dates come from time.gmtime() in filecheck(), so they are UTC.
    filedata.sort(key=lambda a: a[2], reverse=True)
    dfmt = '%Y-%m-%d %H:%M:%S %Z'
    for name, tag, date in filedata:
        print('{}|{}|{}'.format(name, tag, time.strftime(dfmt, date)))

def checkfor(args, rv=0):
    """
    Make sure that a program necessary for using this script is available.
    Calls sys.exit when this is not the case.

    Arguments:
        args: String or list of strings of commands. A single string may
            not contain spaces.
        rv: Expected return value from invoking the command.
    """
    if isinstance(args, str):
        if ' ' in args:
            raise ValueError('no spaces in single command allowed')
        args = [args]
    try:
        with open(os.devnull, 'w') as bb:
            rc = subprocess.call(args, stdout=bb, stderr=bb,
                                 startupinfo=startupinfo)
        if rc != rv:
            # Give the error a message; a bare OSError has strerror None.
            raise OSError(0, 'unexpected return value {}'.format(rc))
    except OSError as oops:
        outs = "Required program '{}' not found: {}."
        print(outs.format(args[0], oops.strerror))
        sys.exit(1)

def filecheck(fname):
    """
    Start a git process to get file info.

    Arguments:
        fname: Name of the file to check.

    Returns:
        A 3-tuple containing the file name, the abbreviated hash of the
        latest commit and that commit's author date as a UTC
        time.struct_time, or None if git has no commit for the file.
    """
    # '--' separates the options from the path.
    args = ['git', '--no-pager', 'log', '-1', '--format=%h|%at',
            '--', fname]
    try:
        b = subprocess.check_output(args, startupinfo=startupinfo)
        data = b.decode()[:-1]  # Strip the trailing newline.
        h, t = data.split('|')
        # fname starts with './' from os.walk('.'); drop that prefix.
        out = (fname[2:], h, time.gmtime(float(t)))
    except (subprocess.CalledProcessError, ValueError):
        return None
    return out


if __name__ == '__main__':
    main()
Sample output:
serve-git|8d92934|2012-08-31 21:21:38 +0200
setres|8d92934|2012-08-31 21:21:38 +0200
mydec|e711e27|2008-04-09 21:26:05 +0200
sync-iaudio|8d92934|2012-08-31 21:21:38 +0200
tarenc|8d92934|2012-08-31 21:21:38 +0200
keypress.sh|a5c0fb5|2009-09-29 00:00:51 +0200
tolower|8d92934|2012-08-31 21:21:38 +0200
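To make the link between filecheck() and a row of output explicit, here is a minimal sketch of how one line of `git log -1 --format=%h|%at` output can be turned into a name|hash|date row. The helper name parse_log_line, the file name example.txt, and the hash and timestamp are all made up for illustration; they are not part of the script above.

```python
import time


def parse_log_line(fname, line):
    """Parse one line of `git log -1 --format=%h|%at` output.

    Returns (fname, short_hash, UTC struct_time), or None if the line
    is empty (git prints nothing for a file without commits).
    """
    line = line.strip()
    if not line:
        return None
    short_hash, author_ts = line.split('|')
    return fname, short_hash, time.gmtime(float(author_ts))


# Hypothetical hash and timestamp, for illustration only.
row = parse_log_line('example.txt', 'abc1234|1000000000\n')
name, tag, date = row
print('{}|{}|{}'.format(name, tag, time.strftime('%Y-%m-%d %H:%M:%S', date)))
# prints: example.txt|abc1234|2001-09-09 01:46:40
```

Because the timestamp goes through time.gmtime(), the printed time is in UTC rather than in the committer's local time zone.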
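Edit: updated to use os.devnull (which also works on ms-windows) instead of /dev/null.
Edit2: use startupinfo to suppress the command prompt popping up on ms-windows.
Edit3: use __future__ to make it compatible with both Python 2 and 3. Tested with 2.7.9 and 3.4.3. Now also available on github.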
Answer 1 (score: 3)
With GitPython, this does the job:
import git

repo = git.Repo("./repo")
tree = repo.tree()
for blob in tree:
    # next() works on Python 2 and 3; the .next() method is Python 2 only.
    commit = next(repo.iter_commits(paths=blob.path, max_count=1))
    print(blob.path, commit.committed_date)
Note that commit.committed_date is in "seconds since the epoch" format.
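Since committed_date is a plain number, it usually needs converting before it is shown to a person. A minimal sketch using only the standard library follows; the helper name epoch_to_iso and the example value are my own, not something GitPython provides. (As far as I know, recent GitPython versions also expose a committed_datetime attribute, but check the documentation for your version.)

```python
from datetime import datetime, timezone


def epoch_to_iso(seconds):
    """Convert seconds since the epoch to an ISO 8601 string in UTC."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc).isoformat()


# 1000000000 is an arbitrary example value, not taken from a real commit.
print(epoch_to_iso(1000000000))
# prints: 2001-09-09T01:46:40+00:00
```

In the loop above you would call it as epoch_to_iso(commit.committed_date).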
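Answer 2 (score: 2)
You can use the GitPython library.
Answer 3 (score: 0)
This worked for me:
http://gitpython.readthedocs.io/en/stable/tutorial.html#the-tree-object
According to the documentation, trees only allow direct access to their immediate child entries, so use the traverse method to obtain an iterator that retrieves entries recursively.
It creates a generator object that does the work: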
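# repo is a git.Repo and tree is repo.tree(), as in the GitPython answer above.
print(tree.traverse())
<generator object traverse at 0x0000000004129DC8>
for blob in tree.traverse():
    # next() works on Python 2 and 3; max_count=1 stops after the newest commit.
    commit = next(repo.iter_commits(paths=blob.path, max_count=1))
    print(blob.path, commit.committed_date)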