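I want to convert a list of items (key/value pairs) into a table format. The solution can be a bash script, awk, sed, or some other method.
Suppose I have a long list, for example: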
date and time: 2013-02-21 18:18 PM
file size: 1283483 bytes
key1: value
key2: value

date and time: 2013-02-21 18:19 PM
file size: 1283493 bytes
key2: value
...
I want to convert it to a table format with tabs or some other delimiter, like this:
date and time        file size      key1   key2
2013-02-21 18:18 PM  1283483 bytes  value  value
2013-02-21 18:19 PM  1283493 bytes         value
...
Or like this:
date and time|file size|key1|key2
2013-02-21 18:18 PM|1283483 bytes|value|value
2013-02-21 18:19 PM|1283493 bytes||value
...
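I have looked at solutions such as An efficient way to transpose a file in Bash, but my case seems to be different. The awk solution there partly works for me: it outputs every line as one long run of columns, but I need the columns limited to the unique set of keys.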
awk -F': ' '
{
    for (i=1; i<=NF; i++) {
        a[NR,i] = $i
    }
}
NF>p { p = NF }
END {
    for(j=1; j<=p; j++) {
        str=a[1,j]
        for(i=2; i<=NR; i++){
            str=str" "a[i,j];
        }
        print str
    }
}' filename
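Update
Thanks to everyone who provided solutions. Some of them look very promising, but I think my versions of the tools may be outdated, and I am getting some syntax errors. What I see now is that I did not start off with clear requirements. Credit goes to whoever posted the first solution, before I had spelled out the full requirements. I had had a long day when I wrote the question, so it was not very clear.
My goal is to come up with a very generic solution for parsing multiple lists of items into column format. I don't think the solution needs to support more than 255 columns. The column names will not be known ahead of time, so that the solution works for anyone, not just me. The two known things are the separator between key/value pairs (": ") and the separator between lists (an empty line). It would be nice to have variables for these, so that others can configure them and reuse the solution.
Looking at the proposed solutions, I realize that a good approach is to make two passes over the input file. The first pass gathers all the column names, optionally sorts them, and then prints the header. The second pass grabs the values for each column and prints them.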
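The two-pass idea described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not one of the posted answers: the function names `records` and `to_table` and the parameters `kv_sep`, `out_sep` and `sort_columns` are hypothetical, and lines that do not contain the key/value separator (such as a literal "...") are simply skipped.

```python
def records(lines, kv_sep=": "):
    """Yield one dict per list; lists are separated by blank lines."""
    current = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            # A blank line closes the current list, if it has any items
            if current:
                yield current
                current = {}
            continue
        # Split on the FIRST separator only, so values may contain it
        key, found, value = line.partition(kv_sep)
        if found:  # assumption: lines without the separator are ignored
            current[key] = value
    if current:
        yield current


def to_table(lines, kv_sep=": ", out_sep="|", sort_columns=False):
    """Return the header row followed by one delimited row per list."""
    lines = list(lines)  # keep the input so it can be scanned twice
    # Pass 1: collect the unique column names in first-seen order
    columns = {}
    for rec in records(lines, kv_sep):
        for key in rec:
            columns.setdefault(key, None)
    names = sorted(columns) if sort_columns else list(columns)
    # Pass 2: emit the values, leaving missing keys empty
    rows = [out_sep.join(names)]
    for rec in records(lines, kv_sep):
        rows.append(out_sep.join(rec.get(name, "") for name in names))
    return rows


sample = """date and time: 2013-02-21 18:18 PM
file size: 1283483 bytes
key1: value
key2: value

date and time: 2013-02-21 18:19 PM
file size: 1283493 bytes
key2: value
"""
print("\n".join(to_table(sample.splitlines())))
```

For a real file, something like `to_table(open("filename"))` should work the same way; passing `out_sep="\t"` gives tab-delimited output.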
Answer 0 (score: 2)
Here's one way using GNU awk. Run like:
awk -f script.awk file
Contents of script.awk:
BEGIN {
    # change this to OFS="\t" for tab-delimited output
    OFS="|"
    # treat each record as a set of lines
    RS=""
    FS="\n"
}
{
    # keep a count of the records
    ++i
    # loop through each line in the record
    for (j=1;j<=NF;j++) {
        # split each line in two
        split($j,a,": ")
        # just holders for the names on the first two lines in the record
        if (j==1) { date = a[1] }
        if (j==2) { size = a[1] }
        # keep a tally of the unique key names
        if (j>=3) { !x[a[1]] }
        # the data in a multidimensional array (requires GNU awk 4+):
        # record number . key = value
        b[i][a[1]]=a[2]
    }
}
END {
    # sort the unique keys
    m = asorti(x,y)
    # add the two strings to a numerically indexed array
    c[1] = date
    c[2] = size
    # set a variable to continue from
    f=2
    # loop through the sorted array of unique keys
    for (j=1;j<=m;j++) {
        # build the header line from the file by adding the sorted keys
        r = (r ? r : date OFS size) OFS y[j]
        # continue to add the sorted keys to the numerically indexed array
        c[++f] = y[j]
    }
    # print the header, then empty the buffer
    print r
    r = ""
    # loop through the records ('i' is the number of records)
    for (j=1;j<=i;j++) {
        # loop through the columns ('f' is the total number of columns)
        for (k=1;k<=f;k++) {
            # build the output line
            r = (r ? r OFS : "") b[j][c[k]]
        }
        # and print and empty it ready for the next record
        print r
        r = ""
    }
}
Here are the contents of a test file named file:
date and time: 2013-02-21 18:18 PM
file size: 1283483 bytes
key1: value1
key2: value2

date and time: 2013-02-21 18:19 PM
file size: 1283493 bytes
key2: value2
key1: value1
key3: value3

date and time: 2013-02-21 18:20 PM
file size: 1283494 bytes
key3: value3
key4: value4

date and time: 2013-02-21 18:21 PM
file size: 1283495 bytes
key5: value5
key6: value6
Results:
date and time|file size|key1|key2|key3|key4|key5|key6
2013-02-21 18:18 PM|1283483 bytes|value1|value2||||
2013-02-21 18:19 PM|1283493 bytes|value1|value2|value3|||
2013-02-21 18:20 PM|1283494 bytes|||value3|value4||
2013-02-21 18:21 PM|1283495 bytes|||||value5|value6
Answer 1 (score: 1)
Here is a pure awk solution:
# split lines on ": " and use "|" for output field separator
BEGIN { FS = ": "; i = 0; h = 0; ofs = "|" }
# empty line - increment item count and skip it
# ([ \t] instead of \s, which only GNU awk understands)
/^[ \t]*$/ { i++ ; next }
# normal line - add the item to the object and the header to the header list
# and keep track of first seen order of headers
{
    current[i, $1] = $2
    if (!($1 in headers)) {headers_ordered[h++] = $1}
    headers[$1]
}
END {
    h--
    # print headers
    for (k = 0; k <= h; k++)
    {
        printf "%s", headers_ordered[k]
        if (k != h) {printf "%s", ofs}
    }
    print ""
    # print the items for each object
    for (j = 0; j <= i; j++)
    {
        for (k = 0; k <= h; k++)
        {
            printf "%s", current[j, headers_ordered[k]]
            if (k != h) {printf "%s", ofs}
        }
        print ""
    }
}
Sample input (note that there should be a newline after the last item):
foo: bar
foo2: bar2
foo1: bar

foo: bar3
foo3: bar3
foo2: bar3
Sample output:
foo|foo2|foo1|foo3
bar|bar2|bar|
bar3|bar3||bar3
Note: if your data has ": " embedded in it, you may need to change this.
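To illustrate the caveat: with FS = ": ", awk splits a line at every occurrence of the separator, so $2 only holds the text up to the second ": ". One way around this is to split on the first occurrence only. A minimal Python sketch of that idea follows; the helper name `split_pair` and the sample line are made up for illustration.

```python
def split_pair(line, kv_sep=": "):
    """Split 'key: value' on the first separator only, keeping the rest in the value."""
    key, found, value = line.partition(kv_sep)
    if not found:
        raise ValueError("no %r separator in line: %r" % (kv_sep, line))
    return key, value


# The value keeps its embedded ": " instead of being cut off
print(split_pair("note: see section: intro"))  # ('note', 'see section: intro')
```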
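Answer 2 (score: 1)
This makes no assumptions about the column structure, so it does not try to sort the columns; however, the fields are printed in the same order for every record: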
use strict;
use warnings;
my (@db, %f, %fields);
my $counter = 1;
while (<>) {
    my ($field, $value) = (/([^:]*):\s*(.*)\s*$/);
    if (not defined $field) {
        # a line without "key: value" (e.g. a blank line) ends the record
        push @db, { %f };
        %f = ();
    } else {
        $f{$field} = $value;
        # remember when each field name was first seen
        $fields{$field} = $counter++ if not defined $fields{$field};
    }
}
push @db, \%f;
#my @fields = sort keys %fields; # alphabetical order
# compare the ranks numerically (<=>), not as strings (cmp)
my @fields = sort { $fields{$a} <=> $fields{$b} } keys %fields; # first seen order
# print header
print join("|", @fields), "\n";
# print rows
for my $row (@db) {
    # test with "defined" so that a value of "0" is not dropped
    print join("|", map { defined $row->{$_} ? $row->{$_} : "" } @fields), "\n";
}
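A small aside on the "first seen order" sort: the ranks are integer counters, so they need to be compared as numbers. The Python sketch below, with hypothetical names and made-up ranks, shows how comparing the same counters as strings scrambles the order once there are ten or more columns.

```python
# Hypothetical first-seen ranks for eleven columns: key1 -> 1, ..., key11 -> 11
ranks = {f"key{i}": i for i in range(1, 12)}


def order_by_rank(ranks, numeric=True):
    """Return the column names ordered by rank, compared as numbers or as strings."""
    convert = int if numeric else str
    return sorted(ranks, key=lambda name: convert(ranks[name]))


print(order_by_rank(ranks, numeric=False)[:4])  # ['key1', 'key10', 'key11', 'key2']
print(order_by_rank(ranks, numeric=True)[:4])   # ['key1', 'key2', 'key3', 'key4']
```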
Answer 3 (score: 0)
use strict; use warnings;
# read the file paragraph by paragraph
$/ = "\n\n";
print "date and time|file size|key1|key2\n";
# parsing the whole file with the magic diamond operator
while (<>) {
    if (/^date and time:\s+(.*)/m) {
        print "$1|";
    }
    if (/^file size:(.*)/m) {
        print "$1|";
    }
    if (/^key1:(.*)/m) {
        print "$1|";
    }
    else {
        print "|";
    }
    if (/^key2:(.*)/m) {
        print "$1\n";
    }
    else {
        print "\n";
    }
}
perl script.pl file
date and time|file size|key1|key2
2013-02-21 18:18 PM| 1283483 bytes| value| value
2013-02-21 18:19 PM| 1283493 bytes|| value
Answer 4 (score: 0)
Example:
> ls -aFd * | xargs -L 5 echo | column -t
bras.tcl@      Bras.tpkg/           CctCc.tcl@       Cct.cfg      consider.tcl@
cvsknown.tcl@  docs/                evalCmds.tcl@    export/      exported.tcl@
IBras.tcl@     lastMinuteRule.tcl@  main.tcl@        Makefile     Makefile.am
Makefile.in    makeRule.tcl@        predicates.tcl@  project.cct  sourceDeps.tcl@
tclIndex