Question

I have a file, File 1, that contains a single column:

File 1
apple
pineapple
banana
cherry
kiwi
orange
mango
grape
watermelon

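I need to extract the contents of the lines between two line numbers, in the order given, separated by tabs. For example, for lines 3 to 8 the output should be: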

Output (Forward)    
banana cherry kiwi orange mango grape

And for lines 7 to 2, the output should be:

Output (reverse)    
mango orange kiwi cherry banana pineapple

I know how to use sed to extract the lines in forward order, but I am having trouble with the reverse order.

sed '3,8!d'

Answer 1

I would do it with awk:

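awk -v from="7" -v to="2" 'BEGIN{rev=from>to;s=rev?to:from;e=rev?from:to}
# buffer only the lines inside the range
NR>=s && NR<=e{r[NR]=$0}
# print at the last line of the range; testing NR>e would print nothing when the range ends on the last line of the file
NR==e{
    while(from!=to){
        printf "%s\t",r[from]
        rev?--from:++from
    }
print r[from]
exit}' file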

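With this awk script you only need to supply the two variables from and to. If you give the numbers in descending order, the lines in that range are printed in reverse. It is also easy to embed in a shell script, taking from and to from shell variables.
The script stops after processing max(from,to) lines. For example, if your file has 5 million lines and you pass from:2, to:7, the script only processes the file up to line 7.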
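
As a minimal sketch of the shell embedding mentioned above: the function name extract_range and its argument order are assumptions made for illustration, not part of the original answer. The bounds arrive as shell arguments and are passed to awk with -v; the loop is written as a for loop, but the logic is the same as in the script above.

```shell
#!/bin/sh
# extract_range FROM TO FILE  (hypothetical helper name, not from the answer)
# Prints lines FROM..TO of FILE on one line, tab-separated; FROM > TO reverses.
extract_range() {
    awk -v from="$1" -v to="$2" '
        BEGIN {
            from += 0; to += 0                 # force numeric comparison
            step = (from > to) ? -1 : 1
            lo = (from > to) ? to : from
            hi = (from > to) ? from : to
        }
        NR >= lo { r[NR] = $0 }                # buffer lines from the start of the range
        NR == hi {                             # last needed line: print and stop reading
            for (i = from; i != to; i += step) printf "%s\t", r[i]
            print r[to]
            exit
        }
    ' "$3"
}

# Example call, with the bounds held in shell variables:
# first=7; last=2; extract_range "$first" "$last" file
```

If the larger bound exceeds the number of lines in the file, this sketch prints nothing, because the printing rule is never reached.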

Some tests with your input:

kent$  cat f
apple
pineapple
banana
cherry
kiwi
orange
mango
grape
watermelon

kent$  awk -v from="2" -v to="7" 'BEGIN{rev=from>to;s=rev?to:from;e=rev?from:to}
NR>=s && NR<=e{r[NR]=$0}
NR==e{
        while(from!=to){
                printf "%s\t",r[from]
                rev?--from:++from
        }
print r[from]
exit}' f
pineapple       banana  cherry  kiwi    orange  mango

kent$  awk -v from="7" -v to="2" 'BEGIN{rev=from>to;s=rev?to:from;e=rev?from:to}          
NR>=s && NR<=e{r[NR]=$0}
NR==e{
        while(from!=to){
                printf "%s\t",r[from]
                rev?--from:++from
        }
print r[from]
exit}' f
mango   orange  kiwi    cherry  banana  pineapple

Answer 2

$ cat tst.awk
BEGIN {
    OFS="\t"
    if (beg < end) { min=beg; max=end; delta=+1 }
    else           { min=end; max=beg; delta=-1 }
}
NR >= min { a[NR] = $0 }
NR == max {
    for (i=beg; i!=end; i+=delta) {
        printf "%s%s", a[i], OFS
    }
    print a[end]
    exit
}

$ awk -v beg=3 -v end=8 -f tst.awk file
banana  cherry  kiwi    orange  mango   grape

$ awk -v beg=7 -v end=2 -f tst.awk file
mango   orange  kiwi    cherry  banana  pineapple

Answer 3

I would use

sed '2,7!d' file1 | tac

tac simply echoes its input in reverse, line by line.

For the tab-separation part, there are many ways to do it with sed. One of them is

sed '2,7!d' file1 | tac | sed '1h; 1!H; $!d; x; s/\n/\t/g'

This assembles the complete input in the hold buffer, then swaps it into the pattern space and replaces all newlines in it with tabs:

1h          # first line: save to hold buffer
1!H         # subsequent lines: append to hold buffer
$!d         # if more input is to read, stop here (don't print anything)
x           # otherwise: swap in assembled lines
s/\n/\t/g   # replace newlines with tabs.

You could also consider using tr for this step, but dealing with the trailing newline is not as simple as it first appears.
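
To illustrate the trailing-newline problem: tr '\n' '\t' also turns the final newline into a tab, so the output ends with a stray tab and no newline. One possible way around it, sketched here as an assumption rather than as part of the original answer, is paste -s, which joins all input lines with tabs and still ends the result with a single newline. The function name join_reversed is hypothetical, and tac is assumed to be available, as in the pipeline above.

```shell
#!/bin/sh
# join_reversed FROM TO FILE  (hypothetical helper; expects FROM <= TO)
# Selects lines FROM..TO, reverses them, and joins them with tabs.
join_reversed() {
    # sed -n "FROM,TOp" prints only the requested lines,
    # tac reverses their order,
    # paste -s joins them into one tab-separated line ending in a newline.
    sed -n "$1,$2p" "$3" | tac | paste -s -
}

# Example: join_reversed 2 7 file1
```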

Alternatively, you can do everything in a single sed invocation:

sed '2,7 { G; x; }; $!d; x; s/\n$//; s/\n/\t/g' file1

This is a bit trickier:

2,7 {                  # In lines 2 to 7:
  G                    # Append the hold buffer to the pattern space
                       # this is originally a blank line and later the reverse
                       # of the lines already read
  x                    # then swap it back into the hold buffer
}
$!d                    # If the input has not ended, stop here (print nothing)
x                      # When the whole input is consumed, swap the assembled
                       # reverse lines back in
s/\n$//                # remove the trailing newline
s/\n/\t/g              # then replace the newlines with tabs

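Which approach to use is largely a matter of preference. The latter is still reasonably sane for sed, but the decoder-ring quality of more complex sed scripts is already starting to show. Frankly, and it pains me to say this because I have a soft spot for sed, it is not a bad idea to consider abandoning sed in this case in favor of a longer but more readable option such as awk: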

awk 'NR == 2, NR == 7 { result = $0 sep result; sep = "\t" } END { print result }' file1
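
The one-liner above keeps reading to the end of the file even after the range has been collected. A possible variant that stops early is sketched below; the function name reverse_range and the variable names first and last are assumptions for illustration. It relies on the fact that exit in a main rule still runs the END block.

```shell
#!/bin/sh
# reverse_range FIRST LAST FILE  (hypothetical helper; expects FIRST <= LAST)
# Prints lines FIRST..LAST in reverse order on one tab-separated line.
reverse_range() {
    awk -v first="$1" -v last="$2" '
        NR >= first { result = $0 sep result; sep = "\t" }  # prepend each line in range
        NR == last  { exit }                                # stop reading; END still runs
        END         { print result }
    ' "$3"
}

# Example: reverse_range 2 7 file1
```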

Answer 4

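Processing lines out of order is a task that sed is not well suited for. Because it is essentially a stream processor, it is designed to process lines in forward order.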

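I strongly recommend using awk. Although awk does not offer a way to process an input file in reverse order either, it does provide the programming-language features needed to buffer the lines of interest and print them in reverse order once the stop line has been reached:

script.awk: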

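BEGIN {
    # A descending range (start > stop) requests reverse order.
    reverse = 0
    if (start > stop) {
        reverse = 1
        tmp = start
        start = stop
        stop = tmp
    }
}

# Buffer only the lines inside the requested range.
NR >= start && NR <= stop {
    buf[NR] = $0
}

# At the last line of the range, print the buffered lines and stop reading.
# Both directions use a tab separator, with no trailing separator.
NR == stop {
    if (!reverse) {
        for (i = start; i <= stop; i++) {
            printf "%s%s", buf[i], (i < stop ? "\t" : "\n")
        }
    } else {
        for (i = stop; i >= start; i--) {
            printf "%s%s", buf[i], (i > start ? "\t" : "\n")
        }
    }
    exit(0)
}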

Call it like this:

awk -vstart=4 -vstop=9 -f script.awk input.file

or

awk -vstart=3 -vstop=8 -f script.awk input.file

Instead of awk, you can use any other programming language you like.
