Question

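I am looking for a solution that merges two or more input files into one output file. It should work exactly like 'diff -U 999999 file1.txt file2.txt > output.txt' does, but without the difference indicators.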

Answer 1

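Here is a script I used a while ago to merge a bunch of log files. I first started out manually with kdiff3, which worked fine for small files, but it became really painful and eventually very slow as the accumulated log grew larger and larger...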

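Our logs regularly contain the output of printf("time(NULL) = %d\n", time(NULL));, so you will have to adapt the script to look for some other monotonically increasing synchronization marker in your own logs.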

#!/usr/bin/perl 
use strict;
use warnings;

# This program takes two overlapping log files and combines
# them into one, e.g.
#
#          INPUT:                    OUTPUT:
#
#   file1        file2              combined
#    AAA                               AAA
#    AAA                               AAA
#    AAA                               AAA
#    BBB          BBB                  BBB
#    BBB          BBB                  BBB
#    BBB          BBB                  BBB
#                 CCC                  CCC
#                 CCC                  CCC
#                 CCC                  CCC
#                 CCC                  CCC
#

# This program uses the "time(NULL) = <...time...>" lines in the
# logs to find where the logs start overlapping.

# Example line matched with this function:
# time(NULL) = 1388772638
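sub get_first_time_NULL {
    my $filename = shift;
    my $ret = undef;
    # Three-argument open with a lexical handle; fail loudly on error.
    open(my $fh, '<', $filename) or die "Cannot open $filename: $!\n";
    while (my $line = <$fh>) {
        if ($line =~ /^time\(NULL\) = (\d+)/) {
            $ret = $1;
            last;
        }
    }
    close($fh);
    return $ret;
}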

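die "Usage: $0 file1 file2\n" unless @ARGV == 2;

my $F1_first_time = get_first_time_NULL($ARGV[0]);
my $F2_first_time = get_first_time_NULL($ARGV[1]);

# Without a marker in each file there is nothing to synchronize on.
die "No \"time(NULL) = ...\" line found in one of the input files\n"
    unless defined $F1_first_time && defined $F2_first_time;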

my $oldest_file;
my $newest_file;
my $newest_file_first_time;

if ($F1_first_time <= $F2_first_time) {
    $oldest_file = $ARGV[0];
    $newest_file = $ARGV[1];
    $newest_file_first_time = $F2_first_time;
} else {
    $oldest_file = $ARGV[1];
    $newest_file = $ARGV[0];
    $newest_file_first_time = $F1_first_time;
}

# Print the "AAA" part
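open(my $old_fh, '<', $oldest_file) or die "Cannot open $oldest_file: $!\n";
while (my $line = <$old_fh>) {
    print $line;
    # \b keeps the marker from matching a longer number with the same prefix.
    last if ($line =~ /^time\(NULL\) = $newest_file_first_time\b/);
}
close($old_fh);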

# Print the "BBB" and "CCC" parts
my $do_print = 0;
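open(my $new_fh, '<', $newest_file) or die "Cannot open $newest_file: $!\n";
while (my $line = <$new_fh>) {
    print $line if $do_print;
    # Start printing after the first marker line, which was already printed above.
    $do_print = 1 if ($line =~ /^time\(NULL\) = $newest_file_first_time\b/);
}
close($new_fh);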
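
The matching above only works if the marker values never decrease within a log. Here is a minimal sketch of a sanity check for that, in shell with awk. The function name markers_monotonic is made up for illustration, and the pattern assumes the same "time(NULL) = <digits>" lines as in the script, so adapt it to whatever marker your logs use:

```sh
#!/bin/sh
# markers_monotonic FILE
# Exit status 0 if the "time(NULL) = N" values in FILE never decrease,
# 1 otherwise. The marker format is an assumption taken from the
# example logs; change the pattern for a different marker.
markers_monotonic() {
    awk '
        /^time\(NULL\) = [0-9]+/ {
            value = $3 + 0
            if (seen && value < prev) { bad = 1; exit }
            prev = value
            seen = 1
        }
        END { exit bad }
    ' "$1"
}
```

It could be used as: markers_monotonic app.log || echo "app.log: markers go backwards" >&2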

The Perl script above only handles two files, so I wrote the following shell script to process all the log files in one operation:

#!/bin/sh

# This script combines several overlapping logfiles into one
# continuous one. See merge_log_files.pl for more details on
# how the logs are merged; this script is only glue to process
# multiple files in one operation.

set -e

if [ "$#" -lt 2 ]
then
    echo "Usage: $0 OUTPUT LOGFILE..." >&2
    exit 1
fi

MERGE_RESULT="$1"
shift

echo "Processing $1..."
cp "$1" MeRgE.TeMp.1
shift

while [ "$#" -gt 0 ]
do
    if [ ! -s "$1" ]
    then
        echo "Skipping empty file $1..."
        shift
        continue
    fi
    echo "Processing $1..."
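    # The Perl script is expected next to this one, with .pl instead of .sh.
    # Run as two commands so that "set -e" aborts if the merge fails.
    perl "$(echo "$0" | sed 's/\.sh$/.pl/')" MeRgE.TeMp.1 "$1" > MeRgE.TeMp.2
    mv MeRgE.TeMp.2 MeRgE.TeMp.1
    shift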
done

mv MeRgE.TeMp.1 "$MERGE_RESULT"
echo "Done"
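
For comparison, the behaviour asked for in the question (the output of 'diff -U 999999' without the indicators) can be approximated by post-processing diff's own output. This is only a sketch, not what the scripts above do. It assumes a diff that supports -U and inputs shorter than 999999 lines, so that everything ends up in a single hunk. The function name merge_like_diff is made up for illustration:

```sh
#!/bin/sh
# merge_like_diff FILE1 FILE2
# Print what "diff -U 999999 FILE1 FILE2" would print, minus the
# three header lines and the leading ' ', '+' or '-' on each line.
# The function name is hypothetical.
merge_like_diff() {
    # diff prints nothing for identical files, so handle that case first.
    if cmp -s "$1" "$2"
    then
        cat "$1"
        return 0
    fi
    # diff exits with status 1 when the files differ; only >1 is an error.
    # The sed script drops the headers and any "\ No newline at end of
    # file" notes, then strips the one-character indicator column.
    { diff -U 999999 "$1" "$2" || [ "$?" -eq 1 ]; } |
        sed -e '1,3d' -e '/^\\ /d' -e 's/^.//'
}
```

It could be used as: merge_like_diff file1.txt file2.txt > output.txt. Unlike the marker-based scripts, this relies on diff finding the overlapping region by itself, which is probably where the slowness on very large logs would come from.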

How can I combine files the way 'diff --unified' does?
