Optimizing a bash script to process files in bulk

Time: 2017-09-27 14:30:03

Tags: git bash

When we clone a git repository from a remote to the local machine, every file gets the timestamp of the local file system (the time of the clone). I need a script that resets these timestamps to the commit dates recorded in the repository. I tried the script below, which does the job, but it takes a long time to process 1000 files. Is there a way to optimize it?

#!/bin/bash

# Split only on newlines (and backspace) so file names with spaces survive
IFS=$'\n\b'
list_of_files=($(git ls-files | sort))

for file in "${list_of_files[@]}"; do
    # One `git log` process per file -- this fork is the bottleneck
    TIME=$(git log --pretty=format:%ci -n 1 -- "$file")
    touch -m -d "$TIME" "$file"
done

1 answer:

Answer 0 (score: 0)

This can be improved with a bash associative array, making a single pass over the git log (newest first) instead of running `git log` once per file:

#!/bin/bash

# Requires bash >= 4 for associative arrays
declare -A files_last_mod_time=() || exit 1

# Seed the map: every tracked file starts at the sentinel value 0
while IFS= read -r f; do files_last_mod_time[$f]=0; done < <(git ls-files | sort)

# Walk the log newest-first: the first commit that touches a file
# carries that file's last modification time
while IFS=, read -r chash ctime; do
    # Stop early once every file has been stamped
    all_def=1
    for k in "${!files_last_mod_time[@]}"; do
        [[ ${files_last_mod_time[$k]} = 0 ]] && all_def=0 && break
    done
    ((all_def == 1)) && break
    echo "# processing commit hash: $chash, time: $ctime"
    while IFS= read -r f; do
        [[ ${files_last_mod_time[$f]} = 0 ]] && {
            cmd=(touch -m -d "$ctime" "$f")
            echo "${cmd[@]}"
            "${cmd[@]}"
            files_last_mod_time[$f]=$ctime
        }
    done < <(git diff --name-only "$chash^!")

done < <(git log --date-order --pretty=format:%H,%ci)
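
As a further refinement (my own sketch, not part of the original answer), the per-commit `git diff` fork can be avoided entirely by letting a single `git log --name-only` invocation emit each commit's time followed by the files it touched. The leading `:` in the format string is just a sentinel chosen here to tell timestamp lines apart from file names:

#!/bin/bash

# Single-pass sketch: one `git log` process for the whole history.
# Assumes no tracked file name itself begins with ":".
declare -A stamped=()
ctime=
while IFS= read -r line; do
    if [[ $line == :* ]]; then
        ctime=${line#:}                    # a commit's timestamp line
    elif [[ -n $line && -e $line && -z ${stamped[$line]} ]]; then
        touch -m -d "$ctime" "$line"       # newest commit touching the file wins
        stamped[$line]=1
    fi
done < <(git log --date-order --pretty=format:':%ci' --name-only)

The existence test `-e $line` skips files that appear in old commits but are no longer present in the working tree, which the associative-array seeding handled in the version above.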

The first (original) answer follows below.

It would be easier to do this in a language with better hash support than bash associative arrays; below is an example in Perl.

#!/usr/bin/perl
use strict;
use warnings;

# Seed the hash: every tracked file starts at the sentinel '0'.
# Keys keep their trailing newline, which matches the `git diff`
# output read below, so the lookups line up.
my %files_last_mod_time = map { $_ => '0' } `git ls-files | sort`;

# Walk the log newest-first; the first commit touching a file wins
for my $commit_time (`git log --date-order --pretty=format:%H,%ci`) {
    # Stop once no file is left at the sentinel value
    last if !grep { $_ eq '0' } values %files_last_mod_time;
    chomp $commit_time;
    my ($chash, $ctime) = split ",", $commit_time;
    print "# processing commit hash: $chash, time: $ctime\n";
    for my $file (`git diff --name-only $chash^!`) {
        if (defined $files_last_mod_time{$file} && $files_last_mod_time{$file} eq '0') {
            # Shell-quote the file name (escape embedded single quotes)
            (my $file_in_cmd = $file) =~ s/'/'\\''/g;
            chomp $file_in_cmd;
            my $cmd = "touch -m -d '$ctime' '$file_in_cmd'\n";
            print $cmd;
            system $cmd;
            $files_last_mod_time{$file} = $ctime;
        }
    }
}
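
Either variant is meant to be run from the top level of a freshly cloned working tree. For example (the file name set-mtimes.pl is just a placeholder for wherever you saved the Perl script):

cd /path/to/repo       # top level of the working tree
chmod +x set-mtimes.pl
./set-mtimes.pl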