Question

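I have to write a bash script on Ubuntu Linux that takes three arguments on the command line: first, the name of the file whose lines I have to sort; second, a letter ('a' to sort in ascending alphabetical order, or 'z' to sort in descending alphabetical order); and third, a positive number 'n'. I only need to sort the lines whose line numbers are multiples of 'n'. For example, if I have a 100-line text file and n = 5, then I only need to sort lines 5, 10, 15, ..., 100; the rest must stay unchanged. Can this be done? I can find and sort the lines that are multiples of 'n' like this: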

awk "NR%$n==0" archivo.txt | sort -f

But now I don't know how to write these lines back into the file.

Thanks for your attention.

Answer 1

No doubt this can be done in pure awk, but the following uses native bash:

#!/usr/bin/env bash

input=${1:-archivo.txt} # set input to $1, or default to archivo.txt
n=${2:-5}               # set n to $2, or default to 5
i=1                     # initialize line counter to 1, so it matches awk's 1-based NR

while IFS= read -r line <&3; do  # always read from input on FD 3
  if (( i % n == 0 )); then      # if we're on a line being sorted...
    IFS= read -r line <&4        # ...overwrite $line from the awk | sort process on FD 4
  fi
  printf '%s\n' "$line"          # write out whatever we most recently read 
  (( ++i ))                      # increment line counter 
done 3<"$input" 4< <(awk -v "n=$n" 'NR%n==0' <"$input" | sort -f)

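Some notes:

Using a shebang that explicitly invokes bash (not sh) on the first line of the script ensures that bash extensions are available.
<(awk ...) is a process substitution: it evaluates to a filename which, when read, yields the output of the awk command. 4< connects the contents of that file to file descriptor 4.
(( )) creates an arithmetic context; it is an extension provided by ksh and bash (as opposed to $(( )), which is guaranteed by POSIX).
See BashFAQ #001 for details on why read is invoked the way it is (clearing IFS and passing the -r argument).
Using awk -v "var=$var" 'awk script using var' avoids the bugs and injection vulnerabilities that can arise when the script is formed by string concatenation.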
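
The script above takes only the file name and n, while the question also asks for an 'a'/'z' direction and for the result to end up back in the file. Here is a minimal sketch of one way to cover both, using the same two-descriptor loop. The function name sort_every_nth, the argument order (file, letter, n) and the temporary-file step are illustrative assumptions, not part of the original answer.

```shell
#!/usr/bin/env bash
# Sketch: sort only the lines whose number is a multiple of n.
# sort_every_nth is a hypothetical helper name chosen for this example.
sort_every_nth() {
  local input=$1 order=$2 n=$3 i=1 line
  local -a sort_opts=(-f)                  # -f: fold case, as in the question
  if [[ $order == z ]]; then
    sort_opts+=(-r)                        # 'z' requests descending order
  fi
  while IFS= read -r line <&3; do          # original lines arrive on FD 3
    if (( i % n == 0 )); then
      IFS= read -r line <&4                # replace with the next sorted line
    fi
    printf '%s\n' "$line"
    (( ++i ))
  done 3<"$input" 4< <(awk -v "n=$n" 'NR % n == 0' <"$input" | sort "${sort_opts[@]}")
}

# When called with three arguments, write the result back to the file:
# build it in a temporary file first, then copy it over the original.
if (( $# == 3 )); then
  tmp=$(mktemp) || exit 1
  sort_every_nth "$1" "$2" "$3" >"$tmp" && cat -- "$tmp" >"$1"
  rm -f -- "$tmp"
fi
```

Saved under a hypothetical name such as sort_nth.sh, it would be run as ./sort_nth.sh archivo.txt z 5. The output goes to a temporary file first because redirecting straight to the input file would truncate it before it has been read.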

Answer 2

If you don't mind holding the entire input file in memory, you can use gawk, which lets you sort the subset of lines before printing anything.

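#!/usr/bin/gawk -f
# Linux passes everything after the interpreter path as one argument, so "env gawk -f" fails there; adjust the path if your gawk lives elsewhere.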

BEGIN {
  if (!inc) inc=5              # set a default
}

NR%inc {
  # This is a normal line
  nosort[NR]=$0
  next
}

{
  # This is an increment of "inc"
  tosort[NR]=$0
}

END {
  # Sort the array of increments
  asort(tosort)

  # Step through our two arrays, picking what to print based on the modulo
  n=0
  for (i=1; i<=NR; i++)        # <= so that the last line is printed too
    if (i%inc==0)
      print tosort[++n]
    else
      print nosort[i]
}

You can run this with something like:

$ ./sortthing -v inc=5 inputfile

Note that this uses the Gawk function asort(), which does not exist in One True Awk. So if you are doing this on *BSD or OS X, you may need to install additional tools.
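
Where installing gawk is not an option, one possible workaround is to leave the sorting to sort(1) and use awk only to merge the sorted lines back in. This is a different technique from the asort() script above, sketched under a few assumptions: the function name sort_every_nth_portable is made up for this example, the file name contains no backslashes (awk -v would interpret them as escapes), and the awk in use supports getline in a BEGIN block.

```shell
#!/usr/bin/env bash
# Sketch: same result without asort(); sorting is delegated to sort(1).
# sort_every_nth_portable is a hypothetical name used for illustration.
sort_every_nth_portable() {
  local input=$1 n=$2
  # Stage 1: extract every nth line and sort those lines, ignoring case.
  awk -v "n=$n" 'NR % n == 0' "$input" | sort -f |
  # Stage 2: walk the original file; sorted replacements arrive on stdin.
  awk -v "n=$n" -v "src=$input" '
    BEGIN {
      while ((getline line < src) > 0) {
        # On every nth line, substitute the next sorted line from stdin.
        if (++i % n == 0 && (getline sorted) > 0)
          line = sorted
        print line
      }
    }'
}

# When called with two arguments (file, n), print the result to stdout.
if (( $# == 2 )); then
  sort_every_nth_portable "$1" "$2"
fi
```

Unlike the gawk version, this does not hold the whole file in an awk array, although sort still has to see all of the selected lines before it can emit the first one.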

(bash script) How to sort the lines at positions that are multiples of 'n' in a file?

2 answers: