Question

我想将项目列表（键/值对）转换为表格格式。解决方案可以是bash脚本，awk，sed或其他一些方法。

假设我有一个很长的列表，例如：

date and time: 2013-02-21 18:18 PM
file size: 1283483 bytes
key1: value
key2: value

date and time: 2013-02-21 18:19 PM
file size: 1283493 bytes
key2: value

...

我想转换为带有制表符或其他分隔符的表格格式，如下所示：

date and time   file size   key1    key2
2013-02-21 18:18 PM 1283483 bytes   value   value
2013-02-21 18:19 PM 1283493 bytes       value
...

或者像这样：

date and time|file size|key1|key2
2013-02-21 18:18 PM|1283483 bytes|value|value
2013-02-21 18:19 PM|1283493 bytes||value
...

我已经查看了这个An efficient way to transpose a file in Bash这样的解决方案，但似乎我在这里遇到了不同的情况。 awk解决方案部分适用于我，它会将所有行输出到一长串列中，但我需要将列限制为唯一列表。

awk -F': ' '
{ 
    for (i=1; i<=NF; i++)  {
        a[NR,i] = $i
    }
}
NF>p { p = NF }
END {    
    for(j=1; j<=p; j++) {
        str=a[1,j]
        for(i=2; i<=NR; i++){
            str=str" "a[i,j];
        }
        print str
    }
}' filename

更新

感谢所有提供解决方案的人。其中一些看起来非常有前景，但我认为我的工具版本可能已经过时，我得到一些语法错误。我现在看到的是，我并没有以非常明确的要求开始。在我阐述完整要求之前，我很荣幸能够成为第一个提供解决方案的人。我写了这个问题已经度过了漫长的一天，因此不太清楚。

我的目标是提出一个非常通用的解决方案，用于将多个项目列表解析为列格式。我认为该解决方案不需要支持超过255列。列名不会提前知道，这样解决方案将适用于任何人，而不仅仅是我。两个已知的东西是kev /值对（“：”）和列表之间的分隔符（空行）之间的分隔符。为这些变量设置变量会很好，这样它们就可以配置为其他人重用它。

通过查看提出的解决方案，我意识到一个好的方法是对输入文件进行两次传递。第一步是收集所有列名，可选择对它们进行排序，然后打印标题。其次是抓取列的值并打印它们。

Answer 1

这是使用GNU awk的一种方式。像：

一样运行

awk -f script.awk file

script.awk的内容：

BEGIN {
    # change this to OFS="\t" for tab delimited ouput
    OFS="|"

    # treat each record as a set of lines
    RS=""
    FS="\n"
}

{
    # keep a count of the records
    ++i

    # loop through each line in the record
    for (j=1;j<=NF;j++) {

        # split each line in two
        split($j,a,": ")

        # just holders for the first two lines in the record
        if (j==1) { date = a[1] }
        if (j==2) { size = a[1] }

        # keep a tally of the unique key names
        if (j>=3) { !x[a[1]] }

        # the data in a multidimensional array:
        # record number . key = value
        b[i][a[1]]=a[2]
    }
}

END {

    # sort the unique keys
    m = asorti(x,y)

    # add the two strings to a numerically indexed array
    c[1] = date
    c[2] = size

    # set a variable to continue from
    f=2

    # loop through the sorted array of unique keys
    for (j=1;j<=m;j++) {

        # build the header line from the file by adding the sorted keys
        r = (r ? r : date OFS size) OFS y[j]

        # continue to add the sorted keys to the numerically indexed array
        c[++f] = y[j]
    }

    # print the header and empty
    print r
    r = ""

    # loop through the records ('i' is the number of records)
    for (j=1;j<=i;j++) {

        # loop through the subrecords ('f' is the number of unique keys)
        for (k=1;k<=f;k++) {

            # build the output line
            r = (r ? r OFS : "") b[j][c[k]]
        }

        # and print and empty it ready for the next record
        print r
        r = ""
    }
}

以下是名为file的测试文件的内容：

date and time: 2013-02-21 18:18 PM
file size: 1283483 bytes
key1: value1
key2: value2

date and time: 2013-02-21 18:19 PM
file size: 1283493 bytes
key2: value2
key1: value1
key3: value3

date and time: 2013-02-21 18:20 PM
file size: 1283494 bytes
key3: value3
key4: value4

date and time: 2013-02-21 18:21 PM
file size: 1283495 bytes
key5: value5
key6: value6

结果：

2013-02-21 18:18 PM|1283483 bytes|value1|value2||||
2013-02-21 18:19 PM|1283493 bytes|value1|value2|value3|||
2013-02-21 18:20 PM|1283494 bytes|||value3|value4||
2013-02-21 18:21 PM|1283495 bytes|||||value5|value6

Answer 2

这是一个纯粹的awk解决方案：

# split lines on ": " and use "|" for output field separator
BEGIN { FS = ": "; i = 0; h = 0; ofs = "|" }

# empty line - increment item count and skip it
/^\s*$/ { i++ ; next } 

# normal line - add the item to the object and the header to the header list
# and keep track of first seen order of headers
{
   current[i, $1] = $2
   if (!($1 in headers)) {headers_ordered[h++] = $1}
   headers[$1]
}

END {
   h--

   # print headers
   for (k = 0; k <= h; k++)
   {
      printf "%s", headers_ordered[k]
      if (k != h) {printf "%s", ofs}
   } 
   print "" 

   # print the items for each object
   for (j = 0; j <= i; j++)
   {
      for (k = 0; k <= h; k++)
      {
         printf "%s", current[j, headers_ordered[k]]
         if (k != h) {printf "%s", ofs}
      }
      print ""
   }
}

示例输入（请注意，在最后一项之后应该有换行符）：

foo: bar
foo2: bar2
foo1: bar

foo: bar3
foo3: bar3
foo2: bar3

示例输出：

foo|foo2|foo1|foo3
bar|bar2|bar|
bar3|bar3||bar3

注意：如果您的数据中嵌入了“：”，则可能需要更改此内容。

Answer 3

这不对列结构做任何假设，因此它不会尝试对它们进行排序，但是，所有字段都以相同的顺序打印所有记录：

use strict;
use warnings;

my (@db, %f, %fields);
my $counter = 1;
while (<>) {
  my ($field, $value) = (/([^:]*):\s*(.*)\s*$/);
  if (not defined $field) {
    push @db, { %f };
    %f = (); 
  } else {
    $f{$field} = $value;
    $fields{$field} = $counter++ if not defined $fields{$field};
  }
}
push @db, \%f;

#my @fields = sort keys %fields; # alphabetical order
my @fields = sort {$fields{$a} cmp $fields{$b} } keys %fields; #first seen order

# print header
print join("|", @fields), "\n";

# print rows
for my $row (@db) {
  print join("|", map { $row->{$_} ? $row->{$_} : "" } @fields), "\n";
}

Answer 4

使用perl

use strict; use warnings;

# read the file paragraph by paragraph
$/ = "\n\n";

print "date and time|file size|key1|key2\n";

# parsing the whole file with the magic diamond operator
while (<>) {
    if (/^date and time:\s+(.*)/m) {
        print "$1|";
    }

    if (/^file size:(.*)/m) {
        print "$1|";
    }

    if (/^key1:(.*)/m) {
        print "$1|";
    }
    else {
        print "|";
    }

    if (/^key2:(.*)/m) {
        print "$1\n";
    }
    else {
        print "\n";
    }
}

用法

perl script.pl file

输出

date and time|file size|key1|key2
2013-02-21 18:18 PM| 1283483 bytes| value| value
2013-02-21 18:19 PM| 1283493 bytes|| value

Answer 5

示例：

> ls -aFd * | xargs -L 5 echo | column -t
bras.tcl@      Bras.tpkg/           CctCc.tcl@       Cct.cfg      consider.tcl@
cvsknown.tcl@  docs/                evalCmds.tcl@    export/      exported.tcl@
IBras.tcl@     lastMinuteRule.tcl@  main.tcl@        Makefile     Makefile.am
Makefile.in    makeRule.tcl@        predicates.tcl@  project.cct  sourceDeps.tcl@
tclIndex

如何将列表转换为bash中的表

5 个答案:

使用perl

用法

输出