Question

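I have two files: one contains my data, and the other is a list of line numbers that I want to extract from the data file. Can I use awk to read the line-number file and then extract the lines whose numbers match?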

Example data file:

This is the first line of my data
This is the second line of my data
This is the third line of my data
This is the fourth line of my data
This is the fifth line of my data

Line-number file:

1
4
5

Output:

This is the first line of my data
This is the fourth line of my data
This is the fifth line of my data

I only use command-line awk and sed for very simple things. This is way beyond me, and I have been googling for an hour without finding an answer.

Answer 1

awk 'NR == FNR {nums[$1]; next} FNR in nums' numberfile datafile

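Merely referencing an array element creates it. While awk reads the first file, NR (the total record number) equals FNR (the record number within the current file), so every line number is stored as a key of nums, and next skips straight to the next record. For the second file, the pattern FNR in nums is true whenever the current line number was stored, and the line is printed (print is the default action when a pattern is true).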
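To see the idiom in action, here is the same one-liner wrapped in a shell function with each part commented, run on the data from the question. The function name extract_lines and the temporary files are only for illustration; any POSIX awk should behave the same way.

```shell
#!/bin/sh
# extract_lines NUMBERFILE DATAFILE  (hypothetical helper name)
# Prints the lines of DATAFILE whose numbers appear in NUMBERFILE,
# in DATAFILE order, each line at most once.
extract_lines() {
    awk '
        # NR == FNR holds only while reading the first file (NUMBERFILE):
        # referencing nums[$1] creates the key, and next skips the rule below.
        NR == FNR { nums[$1]; next }
        # Second file: true when this line number was stored,
        # so the default action prints the line.
        FNR in nums
    ' "$1" "$2"
}

# Demo with the data from the question
dir=$(mktemp -d)
printf '%s\n' "This is the first line of my data" \
    "This is the second line of my data" \
    "This is the third line of my data" \
    "This is the fourth line of my data" \
    "This is the fifth line of my data" > "$dir/datafile"
printf '1\n4\n5\n' > "$dir/numberfile"
extract_lines "$dir/numberfile" "$dir/datafile"
rm -r "$dir"
```

One caveat of NR == FNR: if numberfile is empty, the condition is still true while awk reads datafile, so the data lines are stored as keys too and nothing is printed.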

Answer 2

One way with sed:

sed 's/$/p/' linesfile | sed -n -f - datafile

You can use the same trick with awk:

sed 's/^/NR==/' linesfile | awk -f - datafile
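Both pipelines work by turning the line-number file into a program. The sketch below wraps the two sed substitutions in hypothetical helper functions, prints the programs they generate, and then runs each one from a temporary file, which is a portable route if your sed or awk does not accept - as standard input for -f.

```shell
#!/bin/sh
# Hypothetical helpers: turn line numbers read on stdin into a program.
to_sed_program() { sed 's/$/p/'; }     # "4" -> "4p"    (sed: print line 4)
to_awk_program() { sed 's/^/NR==/'; }  # "4" -> "NR==4" (awk: pattern, default action prints)

printf '1\n4\n5\n' | to_sed_program    # prints 1p, 4p, 5p on separate lines
printf '1\n4\n5\n' | to_awk_program    # prints NR==1, NR==4, NR==5

# Portable variant: save the generated program to a file
# instead of passing it to -f through standard input.
data=$(mktemp); prog=$(mktemp)
printf 'one\ntwo\nthree\nfour\nfive\n' > "$data"
printf '1\n4\n5\n' | to_sed_program > "$prog"
sed -n -f "$prog" "$data"              # one, four, five
printf '1\n4\n5\n' | to_awk_program > "$prog"
awk -f "$prog" "$data"                 # one, four, five
rm -f "$data" "$prog"
```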

Edit: alternative for huge files

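For a huge number of lines, it is unwise to keep the whole numbers file in memory. In that case, one solution is to sort the numbers file and read it one line at a time. The following has been tested with GNU awk: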

extract.awk

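# linesfile must be sorted in ascending order with no duplicate numbers
BEGIN {
  # Read the first wanted line number
  getline n < linesfile
  if(length(ERRNO)) {
    print "Unable to open linesfile '" linesfile "': " ERRNO > "/dev/stderr"
    exit
  }
}

NR == n {
  print
  # Fetch the next wanted number; getline returns 0 at end of file and -1 on error
  if((getline n < linesfile) <= 0) {
    if(length(ERRNO))
      print "Unable to read linesfile '" linesfile "': " ERRNO > "/dev/stderr"
    exit
  }
}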

Run it like this:

awk -v linesfile="$linesfile" -f extract.awk infile
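extract.awk relies on linesfile already being sorted in ascending order, and because it compares NR with only one number at a time, a repeated number makes every later number unreachable. A minimal sketch that does the sort step too, assuming POSIX sort and awk; it drops the gawk-only ERRNO reporting, skips numbers that are not larger than the current record number, and extract_sorted is a hypothetical name.

```shell
#!/bin/sh
# extract_sorted NUMBERFILE DATAFILE  (hypothetical helper name)
# Sorts the wanted line numbers, then streams DATAFILE while holding only
# the next wanted number in memory; awk stops reading once it is done.
extract_sorted() {
    sorted=$(mktemp) || return 1
    sort -n "$1" > "$sorted"
    awk -v linesfile="$sorted" '
        # Read numbers until one is greater than the current record number.
        # This skips duplicates, blank lines and zeros; returns 0 at end of
        # file or on a read error.
        function advance(   line) {
            while ((getline line < linesfile) > 0)
                if (line + 0 > NR) { n = line + 0; return 1 }
            return 0
        }
        BEGIN { if (!advance()) exit }          # no usable numbers at all
        NR == n { print; if (!advance()) exit } # stop after the last wanted line
    ' "$2"
    rc=$?
    rm -f "$sorted"
    return "$rc"
}

# Demo: a 20-line data file and unsorted numbers with a duplicate
dir=$(mktemp -d)
awk 'BEGIN { for (i = 1; i <= 20; i++) print "line " i }' > "$dir/data"
printf '13\n2\n8\n2\n' > "$dir/numbers"
extract_sorted "$dir/numbers" "$dir/data"   # line 2, line 8, line 13
rm -r "$dir"
```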

Test:

echo "2
4
7
8
10
13" | awk -v linesfile=/dev/stdin -f extract.awk <(paste <(seq 50e3) <(seq 50e3 | tac))

Output:

2	49999
4	49997
7	49994
8	49993
10	49991
13	49988

Answer 3

Here is an example. inputfile is loaded first, and then the matching datafile records are printed.

awk \
  -v RS="[\r]*[\n]" \
  -v FILE="inputfile" \
  'BEGIN \
   {
     LINES = ","
     # > 0 stops at end of file and on a read error (-1)
     while ((getline Line < FILE) > 0)
     {
       LINES = LINES Line ","
     }
   }
   LINES ~ "," NR "," \
   {
     print
   }
  ' datafile
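The commas around each number are what make the regex test safe: without them, record 1 would also match inside 14. A small sketch with made-up sample numbers that builds the same kind of ",n1,n2,...," string and compares the delimited test with a bare-number test; membership_demo is a hypothetical name.

```shell
#!/bin/sh
# membership_demo (hypothetical name): builds a ",n1,n2,...," string from
# sample numbers and compares two ways of testing NR against it.
membership_demo() {
    awk 'BEGIN {
        LINES = ","
        count = split("4 14 25", nums, " ")   # illustrative numbers
        for (i = 1; i <= count; i++)
            LINES = LINES nums[i] ","
        print LINES                            # ,4,14,25,

        # Delimited pattern: matches only exact members of the list
        for (nr = 1; nr <= 5; nr++)
            if (LINES ~ "," nr ",") print "delimited match: " nr
        # Bare number as pattern: 1, 2 and 5 also match inside 14 and 25
        for (nr = 1; nr <= 5; nr++)
            if (LINES ~ nr) print "naive match: " nr
    }'
}

membership_demo
```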

Answer 4

I ran into the same problem. Here is the solution Thor posted:

cat datafile \
| awk 'BEGIN{getline n<"numbers"} n==NR{print; getline n<"numbers"}'

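If, like me, you don't have a numbers file because the numbers arrive on stdin, and you don't want to create a temporary numbers file, here is another solution: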

cat numbers \
| awk '{while((getline line<"datafile")>0) {n++; if(n==$0) {print line;next}}}'

Answer 5

while read -r line; do sed -n "${line}p" Datafile.txt; done < numbersfile.txt
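This loop starts a new sed for every number, and each sed reads all of Datafile.txt. A minimal variation, assuming a POSIX sed (which accepts { ... } blocks with ; separators), quits as soon as the wanted line has been printed. Because each number gets its own sed run, the output follows the order of the numbers file and repeated numbers print their line again; print_lines_in_order is a hypothetical name.

```shell
#!/bin/sh
# print_lines_in_order NUMBERSFILE DATAFILE  (hypothetical helper name)
# Runs one sed per number, so output follows NUMBERSFILE order and
# repeated numbers print their line again.
print_lines_in_order() {
    # "|| [ -n ... ]" also handles a last line without a trailing newline
    while IFS= read -r n || [ -n "$n" ]; do
        # Skip blank or non-numeric entries and 0 (not a valid sed address)
        case $n in ''|*[!0-9]*|0) continue ;; esac
        # Print line n, then quit instead of reading the rest of DATAFILE
        sed -n "${n}{p;q;}" "$2"
    done < "$1"
}

# Demo
dir=$(mktemp -d)
printf 'one\ntwo\nthree\nfour\nfive\n' > "$dir/data"
printf '5\n1\n5\n' > "$dir/numbers"
print_lines_in_order "$dir/numbers" "$dir/data"   # five, one, five
rm -r "$dir"
```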

Answer 6

This solution...

awk 'NR == FNR {nums[$1]; next} FNR in nums' numberfile datafile

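...prints each listed line only once, even if its number is repeated in the numbers file. What if the numbers file contains duplicate entries? Then sed is a better (but much slower) alternative: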

sed -nf <(sed 's/.*/&p/' numberfile) datafile
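An awk alternative that also repeats lines for duplicate numbers, and prints them in numbers-file order instead of data-file order, is to keep only the requested lines while reading the data file and print them at the end. A sketch, assuming the requested lines fit in memory (the data file itself is only streamed); extract_in_order is a hypothetical name.

```shell
#!/bin/sh
# extract_in_order NUMBERFILE DATAFILE  (hypothetical helper name)
# Prints the requested lines in NUMBERFILE order, once per occurrence.
extract_in_order() {
    awk '
        # First file: remember every requested number, in order.
        # "+ 0" normalizes entries such as "04" to the key "4".
        NR == FNR { order[++k] = $1 + 0; want[$1 + 0]; next }
        # Second file: keep only the lines that were asked for.
        FNR in want { line[FNR] = $0 }
        # Print in request order; numbers past the end of DATAFILE are skipped.
        END {
            for (i = 1; i <= k; i++)
                if (order[i] in line) print line[order[i]]
        }
    ' "$1" "$2"
}

# Demo
dir=$(mktemp -d)
printf 'a\nb\nc\nd\ne\n' > "$dir/data"
printf '5\n1\n5\n' > "$dir/numbers"
extract_in_order "$dir/numbers" "$dir/data"   # e, a, e
rm -r "$dir"
```

Unlike the sed version, the data file is read only once; the trade-off is that the selected lines are held in memory until the end.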
