Comparing two text files containing hexadecimal numbers

Time: 2014-03-10 22:57:55

Tags: sed awk text-processing

I have two files containing hex values, A.txt and B.txt. A.txt looks like this:

blah blah blah ;AA=0012FF34, BB=0012FC0
blah blah blah ;AA=00120F54
blah blah blah ;CC=00978E4A
blah blah blah ;AA=007649A4, BB=0032FFF, CC=00F5FC6

and B.txt looks like this:

b-base    b-size
00020000 00001000 blah blah blah
00030000 00001000 blah blah blah
00040000 00001000 blah blah blah
000E0000 00001000 blah blah blah
000F0000 00005000 blah blah blah

How can I print the lines of A.txt in which the value of AA, BB, CC or DD falls within one of the following ranges:

00020000< <00020000+00001000
00030000< <00030000+00001000
00040000< <00040000+00001000
000E0000< <000E0000+00001000
000F0000< <000F0000+00005000

Note: the first part of each line in A.txt (before the ";") can be of any length.
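To make the condition concrete, here is a minimal Python sketch of the range test, reading the notation above as the strict comparison base < value < base + size. The names `RANGES` and `in_any_range` are made up for illustration, and the value 000F1234 is a hypothetical input chosen to fall inside the last range.

```python
# Ranges copied from the B.txt sample above, as (base, size) pairs.
RANGES = [
    (0x00020000, 0x00001000),
    (0x00030000, 0x00001000),
    (0x00040000, 0x00001000),
    (0x000E0000, 0x00001000),
    (0x000F0000, 0x00005000),
]


def in_any_range(value, ranges=RANGES):
    """Return True if base < value < base + size for at least one range."""
    return any(base < value < base + size for base, size in ranges)


if __name__ == "__main__":
    # Hypothetical value chosen to land inside the last range.
    print(in_any_range(int("000F1234", 16)))  # True
    # First AA value from the A.txt sample; it lies above every range.
    print(in_any_range(int("0012FF34", 16)))  # False
```

If the endpoints are meant to count as matches, the comparison would need `<=` on the relevant side instead.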

4 answers:

Answer 0 (score: 4)

I'm amazed nobody has done this in awk! What's going on?

gawk '
   BEGIN {j=0}
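   FNR==NR {                  # First file (B.txt): collect the ranges
      if (FNR==1) next        # Skip the "b-base b-size" header line
      lo[j] = strtonum("0x"$1)
      hi[j] = strtonum("0x"$2)+lo[j]
      j++
      next
   }
   {
      line=$0                 # Save line in case we need to print it
      sub(/.*;/,"",$0)        # Remove everything before semicolon
      n=split($0,a,",")       # Split rest on commas into array a[]
      found=0                 # Ensures each line is printed at most once
      for(x=1; x<=n && !found; x++){   # Iterate through all AA=, BB=, CC=
        sub(/.*=/,"",a[x])    # Remove everything up to and including = sign
        d=strtonum("0x"a[x])  # Convert hex string to a number
        for(y in lo){         # Bounds are inclusive; use > and < for a strict test
           if((d>=lo[y])&&(d<=hi[y])){print line;found=1;break}
        }
      }
   }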
   ' B.txt A.txt

Answer 1 (score: 1)

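I wrote some Perl that should do the job, since this seemed like a bit much to do in AWK or sed. It builds an array of the ranges in B.txt and prints every line of A.txt in which any value in the comma-separated list falls within any range (value >= start and value < end). Note that this was a learning exercise for me, so I welcome any constructive feedback.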

#!/usr/bin/env perl

use strict;
use warnings;

open my $fh,"<","B.txt" or die "couldn't open file: $!";
<$fh>; # skip first line
my @range;
while (<$fh>) {
    my @F = split;
    push @range, [hex($F[0]), hex($F[0]) + hex($F[1])];
}
close $fh;

open $fh,"<","A.txt" or die "couldn't open file: $!";
while (<$fh>) {
    my $match = 0;
  OUTER:
    for (split ',', (split ';')[1]) {
        chomp (my $val = (split '=')[1]);
        $val = hex $val;
        for my $ref (@range) {
            if ($val >= $$ref[0] && $val < $$ref[1]) {
                $match = 1;
                last OUTER;
            }
        }
    }
    print if $match;
}

Answer 2 (score: 1)

I also felt this was beyond the scope of awk and sed, so I did the same as Thomas, but in Python. Feel free to use it or improve it however you see fit.

def main(a, b):
    with open(b) as bf:
        data = bf.readlines()[1:]
        limits = get_limits(data)
    with open(a) as af:
        for line in af:
            maybe_print(line, limits)

def get_limits(data):
    limits = []
    for line in data:
        base, size = line.split(' ')[0:2]
        limits.append((int(base, 16), int(size, 16)))
    return limits

def maybe_print(line, limits):
    data = line.split(';')[1]
    data = data.split(', ')
    for datum in data:
        value = int(datum.split('=')[1], 16)
        for base, size in limits:
            if value > base and value < base + size:
                print(line, end='')  # Python 3 form of "print line,"
                return

if __name__ == '__main__':
    import sys
    main(sys.argv[1], sys.argv[2])

I run it like this: python <scriptname> A.txt B.txt
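One caveat: the script assumes every line of A.txt contains a ";" and that values are separated by exactly ", ". As a possible refinement, here is a rough sketch of a more forgiving parser. The name `extract_values` is hypothetical, and taking the text after the last ";" is an assumption about the input, made in case the free-form first part contains a ";" itself.

```python
import re

# Matches NAME=HEX pairs such as "AA=0012FF34" and captures the hex digits.
PAIR = re.compile(r"\b\w+=([0-9A-Fa-f]+)")


def extract_values(line):
    """Return the hex values after the last ';' as integers.

    Lines without a ';' yield an empty list instead of raising.
    """
    _head, sep, tail = line.rpartition(";")
    if not sep:
        return []
    return [int(digits, 16) for digits in PAIR.findall(tail)]


if __name__ == "__main__":
    sample = "blah blah blah ;AA=0012FF34, BB=0012FC0\n"
    print([hex(v) for v in extract_values(sample)])
```

Each returned value could then be checked against the limits in the same way maybe_print does.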

Answer 3 (score: 1)

My Ruby implementation, for the record...

ranges = File.readlines("B.txt").grep(/^([\dA-F]+)\s+([\dA-F]+)/i){ ($1.hex)..($1.hex+$2.hex) }
File.open("A.txt") do |f|
    f.each_line do |line|
        numbers = line.scan(/\b\w+=([\dA-F]+)/).collect{|x| x.first.hex }
        puts line if numbers.detect{|x| ranges.detect{|y| y === x } }
    end
end