Join lines, modulo the number of records

Question

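Suppose my stream is x * N lines long, where x is the number of records and N is the number of columns per record, and the data is output column-wise. For example, with x = 2 and N = 3: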

1
2
Alice
Bob
London
New York

How can I join every line, modulo the number of records, back into columns:

1   Alice   London
2   Bob     New York

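If I use paste with N - arguments (paste - - - for N = 3), I get the transposed output. I could use split with the -l option set to x, then recombine the pieces with paste, but I would like to do this within the stream, without spitting temporary files out all over the place.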
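For illustration, this is the transposed result that the paste attempt produces on the first example. The function name show_transposed is only a label made up for this sketch.

```shell
# Hypothetical helper for this sketch: feeds the first example to paste
# with N = 3 "-" arguments.
show_transposed() {
    printf '%s\n' 1 2 Alice Bob London 'New York' | paste - - -
}

show_transposed
```

paste fills each output row with three consecutive input lines, so the first row holds 1, 2 and Alice rather than 1, Alice and London.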

Is there an "easy" solution (i.e., rather than invoking something like awk)? I am thinking there may be some magic join solution, but I can't see it...

EDIT: Another example, with x = 5 and N = 3:

1
2
3
4
5
a
b
c
d
e
alpha
beta
gamma
delta
epsilon

Expected output:

1   a   alpha
2   b   beta
3   c   gamma
4   d   delta
5   e   epsilon

Answer 1

You are looking for pr to "columnate" the stream:

pr -T -s$'\t' -3 <<'END_STREAM'
1
2
Alice
Bob
London
New York
END_STREAM

1       Alice   London
2       Bob     New York

pr is part of coreutils.
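One caveat, as far as I understand GNU pr: columns are filled page by page and the default page length is 66 lines, so a stream with more than 66 records needs the page length set explicitly with -l. A minimal sketch, assuming GNU coreutils and that the record count x is known in advance; the function name columnate is made up for this example.

```shell
# columnate X N: hypothetical helper that reads the stream on stdin and
# prints X rows of N tab-separated columns.
columnate() {
    # -t        omit page headers and trailers
    # -l "$1"   page length = number of records, so each column holds X lines
    # -"$2"     number of output columns
    # -s<TAB>   separate columns with a tab
    pr -t -l "$1" -"$2" -s"$(printf '\t')"
}

# 210 input lines = 70 records of 3 columns
seq 1 210 | columnate 70 3 | head -n 3
```

With the page length set to 70, the first row should pair line 1 with lines 71 and 141. Other pr implementations may treat -l and -s differently, so check man pr on your system.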

Answer 2

Most systems should include a tool called pr, intended for printing files. It is part of POSIX.1, so it is almost certainly available on any system you will use.

$ pr -3 -t < inp1
1                       a                       alpha
2                       b                       beta
3                       c                       gamma
4                       d                       delta
5                       e                       epsilon

Or, if you prefer,

$ pr -3 -t -s, < inp1
1,a,alpha
2,b,beta
3,c,gamma
4,d,delta
5,e,epsilon

or, with a page width of 20 (note that fields longer than the column width are truncated):

$ pr -3 -t -w 20 < inp1
1      a      alpha
2      b      beta
3      c      gamma
4      d      delta
5      e      epsilo

See the POSIX.1 specification of pr for standard usage information, or man pr for the specific options on your operating system.

Answer 3

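To process the input reliably, you need to know either the number of columns or the number of lines in the output file. If you only know the number of columns, you need to read the input file twice.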

Hackish coreutils solution

# If you know the number of output columns in advance but not the
# number of output lines, you can calculate it using wc -l
olines=$(( $(wc -l < file) / 3 )) # 3 is the number of output columns

# Split the file into chunks of ${olines} lines, one chunk per column
split -l "${olines}" file FOO # FOO is a prefix. Choose a better one
paste FOO*
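To keep the split pieces from littering the working directory, the same idea can be confined to a scratch directory that is removed afterwards. This is only a sketch under assumptions: mktemp -d is available, the input has no trailing partial record, and the function name join_by_split is hypothetical.

```shell
# join_by_split N: hypothetical helper; reads the stream on stdin and
# prints it as N tab-separated columns.
join_by_split() (
    ncols=$1
    tmpdir=$(mktemp -d) || exit 1
    # Remove the scratch directory when the subshell exits
    trap 'rm -rf "$tmpdir"' EXIT

    cat > "$tmpdir/input"
    total=$(wc -l < "$tmpdir/input")
    olines=$(( total / ncols ))
    [ "$olines" -gt 0 ] || exit 1   # nothing to split

    # One chunk per output column, each holding olines lines
    split -l "$olines" "$tmpdir/input" "$tmpdir/part."
    paste "$tmpdir"/part.*
)

printf '%s\n' 1 2 Alice Bob London 'New York' | join_by_split 3
```

The function body runs in a subshell, so the trap and the variables do not leak into the calling shell. It still writes temporary files, which the question wanted to avoid, but they are cleaned up automatically.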

AWK solution

If you know the number of output columns in advance, you can use this awk script:

convert.awk:

BEGIN {
    # Split the file into one big record where fields are separated
    # by newlines
    RS=""
    FS="\n"
}
FNR==NR {
    # We are reading the file twice (see invocation below)
    # When reading it the first time we store the number
    # of fields (lines) in the variable n because we need it
    # when processing the file. Skip the output block on this
    # first pass so the result is not printed twice.
    n=NF
    next
}
{
    # n / c is the number of output lines
    # For every output line ...
    for(i=0;i<n/c;i++) {
        # ... print the columns belonging to it
        for(ii=1+i;ii<=NF;ii+=n/c) {
            printf "%s ", $ii
        }
        print "" # Adds a newline
    }
}

And call it like this:

awk -v c=3 -f convert.awk file file # Twice the same file
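If buffering the whole stream in memory is acceptable, the second read can be avoided, because the total line count is known once END is reached. The following is a sketch of that variation, not the script above; the function name columns_from_count is hypothetical, and it assumes the line count is a multiple of the column count.

```shell
# columns_from_count N: hypothetical helper; reads the stream on stdin
# and prints it as N tab-separated columns.
columns_from_count() {
    awk -v c="$1" '
        { line[NR] = $0 }              # buffer every input line
        END {
            rows = int(NR / c)         # number of output lines
            for (i = 1; i <= rows; i++) {
                out = line[i]
                # the next field of row i is rows lines further on
                for (j = i + rows; j <= rows * c; j += rows)
                    out = out "\t" line[j]
                print out
            }
        }'
}

printf '%s\n' 1 2 Alice Bob London 'New York' | columns_from_count 3
```

Because it reads standard input once, this form also works on a pipe, where passing the same file name twice is not possible.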

If you know the number of output lines in advance, you can use the following awk script:

convert.awk:

BEGIN {
    # Split the file into one big record where fields are separated
    # by newlines
    RS=""
    FS="\n"
}
{
    # x is the number of output lines and has been passed to the 
    # script. For each line in output
    for(i=0;i<x;i++){
        # ... print the columns belonging to it
        for(ii=i+1;ii<=NF;ii+=x){
            printf "%s ",$ii
        }   
        print "" # Adds a newline
    }   
}

And call it like this:

awk -v x=2 -f convert.awk file

Answer 4

Please try the following. The code only takes the value of x (rows), since the value of N (columns) follows from it.

Solution 1: when you do not care about the order of the output (relative to your Input_file).

awk -v x_rows=2 '
j==x_rows{
  j=k=""
}
j++<=x_rows{
  ++k;
  array[k]=array[k]?array[k] OFS $0:$0;
}
END{
  for(i in array){
    print array[i]
  }
}
'   Input_file

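Solution 2: if you need the output in the same sorted order as your Input_file, the following may help. Note that it sorts on the first column, so it reproduces the input order only when that column is already in sort order.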

awk -v x_rows=5 '
j==x_rows{
  j=k=""
}
j++<=x_rows{
  ++k;
  array[k]=array[k]?array[k] OFS $0:$0;
}
END{
  for(i in array){
    print array[i] | "sort -k1"
  }
}
'   Input_file
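for (i in array) visits the keys in an unspecified order, and sort -k1 restores the input order only when the first column happens to sort that way. A sketch of an alternative that walks the row index directly, so no sort is needed; the function name join_rows is hypothetical and the approach is a variation, not the code above.

```shell
# join_rows X: hypothetical helper; reads the stream on stdin and prints
# X rows in input order, fields joined by OFS (a space by default).
join_rows() {
    awk -v x_rows="$1" '
        {
            k = (NR - 1) % x_rows + 1                  # output row of this line
            row[k] = (NR <= x_rows) ? $0 : row[k] OFS $0
        }
        END {
            for (i = 1; i <= x_rows && i <= NR; i++)   # fixed order, no sort
                print row[i]
        }'
}

printf '%s\n' 1 2 3 4 5 a b c d e alpha beta gamma delta epsilon | join_rows 5
```

Testing NR against x_rows, instead of testing whether the row is empty, also keeps a first field such as 0 from being dropped.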
