Question

I have a file in which the strings test1 and test2 each occur multiple times. I am trying to find the line numbers of the matches and print those lines in the order in which they appear. Each string occurs at most once per line.

Here is an example:

cat input.txt
this is test1
this is not
this is test2
this is test1

My naive attempt to get the line numbers (in order) is

grep -n 'test1' input.txt  | cut -d : -f1 > output1.txt
grep -n 'test2' input.txt  | cut -d : -f1 >> output1.txt
sort -k1n -o output1.txt output1.txt

Its output is

cat output1.txt
1
3
4

I then print the lines using a while loop

while read -r line; do
 if [[ $line =~ test1 || $line =~ test2 ]] ; then
 echo "$line" >> output2.txt;
 fi
done <input.txt

The output looks like

cat output2.txt
this is test1
this is test2
this is test1

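My question is whether there is a better (possibly more efficient) way to do this, in particular to get the line numbers in the correct order. Thanks.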

Answer 1

1st solution: Could you please try the following. It writes only the line numbers to the output file output1.txt.

awk '/this is test[0-9]+/{print FNR}' Input_file > "output1.txt"

To get the line numbers and the contents in separate output files (output1.txt, output2.txt), try the following.

awk '/this is test[0-9]+/{print FNR > "output1.txt";print $0 > "output2.txt"}' Input_file
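The pattern test[0-9]+ used in the awk commands above matches "test" followed by any run of digits, so it is broader than just test1 and test2. A minimal sketch of the difference, assuming a hypothetical sample file that also contains test3 and test10; the stricter pattern is only an illustration and is not part of the original answer:

```shell
# Hypothetical sample input with extra "testN" lines.
input=$(mktemp)
printf '%s\n' 'this is test1' 'this is test3' 'this is test2' 'this is test10' > "$input"

# Broad pattern: any number after "test" counts as a match.
broad=$(awk '/this is test[0-9]+/{print FNR}' "$input")

# Stricter pattern: only test1 or test2, not followed by another digit.
strict=$(awk '/this is test[12]([^0-9]|$)/{print FNR}' "$input")

printf 'broad:\n%s\n' "$broad"
printf 'strict:\n%s\n' "$strict"
rm -f "$input"
```

If the file only ever contains test1 and test2, both patterns give the same result.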

2nd solution: Or, taking inspiration from @kamil cuk's comment and enhancing it so that it outputs only the line numbers.

grep -n 'test1\|test2' Input_file | cut -d':' -f1 > "output1.txt"
OR
grep -n 'this is test1\|this is test2' Input_file | cut -d':' -f1 > "output1.txt"

To get the matched content into an output file, try the following.

grep -n 'this is test1\|this is test2' Input_file | cut -d':' -f2- > "output2.txt"
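Because grep -n separates the line number from the content with a colon, a matched line that itself contains a colon needs -f2- (field 2 through the end) rather than a plain -f2, which would cut the line at its next colon. A small sketch of this, assuming a hypothetical sample file in which one matching line contains a colon; it uses grep -E for the alternation, which is a portable equivalent of the \| form:

```shell
# Hypothetical sample input; the second line contains a colon on purpose.
input=$(mktemp)
printf '%s\n' 'this is test1' 'note: this is test2' 'this is not' 'this is test1' > "$input"

# A single grep pass reports matches in file order as "N:content".
matches=$(grep -nE 'test1|test2' "$input")

# Field 1 is the line number.
line_numbers=$(printf '%s\n' "$matches" | cut -d: -f1)
# Fields 2 through the end are the original line, colons included.
contents=$(printf '%s\n' "$matches" | cut -d: -f2-)

printf '%s\n' "$line_numbers"
printf '%s\n' "$contents"
rm -f "$input"
```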

3rd solution: Using sed:

To get only the line numbers, use:

sed -n '/test[12]/{=;}'  Input_file > "output1.txt"

To get the line contents:

sed -n '/test[12]/p' Input_file > "output2.txt"
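The two sed commands can also be combined. sed's = prints the line number on a line of its own, so getting "number:content" on one line takes an extra joining step. A minimal sketch, assuming the sample input from the question and using paste to join each pair of lines (the paste step is an addition, not part of the original answer):

```shell
# Sample input as in the question.
input=$(mktemp)
printf '%s\n' 'this is test1' 'this is not' 'this is test2' 'this is test1' > "$input"

# For each match, "=" prints the line number and "p" prints the line itself;
# paste then joins every two consecutive lines with a colon.
numbered=$(sed -n '/test[12]/{=;p;}' "$input" | paste -d: - -)

printf '%s\n' "$numbered"
rm -f "$input"
```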

Answer 2

grep can do this on its own, so why bother with anything more?

$ grep -E 'test1|test2' input.txt     
this is test1                         
this is test2                         
this is test1

If you need both line numbers and contents:

$ grep -nE 'test1|test2' input.txt    
1:this is test1                       
3:this is test2                       
4:this is test1                       

$ grep -nE 'test[12]' input.txt       
1:this is test1                       
3:this is test2                       
4:this is test1

Or grep 'test[12]' input.txt and grep -n 'test[12]' input.txt.
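When the list of strings grows, writing them all into one regular expression gets awkward. A sketch of one alternative, assuming the strings are kept one per line in a hypothetical patterns file: grep -F treats them as fixed strings and -f reads them from that file, and the output is still in file order because the input is scanned once.

```shell
# Sample input as in the question, plus a hypothetical patterns file.
input=$(mktemp)
patterns=$(mktemp)
printf '%s\n' 'this is test1' 'this is not' 'this is test2' 'this is test1' > "$input"
printf '%s\n' 'test1' 'test2' > "$patterns"

# -n: prefix line numbers, -F: fixed strings, -f: read patterns from a file.
hits=$(grep -nFf "$patterns" "$input")

printf '%s\n' "$hits"
rm -f "$input" "$patterns"
```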

A sed way is:

sed -n '/test[12]/p' input.txt

For line numbers only:

sed -n '/test[12]/=' input.txt

The advantage of using awk is that you can write different results to different files in a single command:

awk '/test[12]/{
    print FNR >"output1.txt"         #line numbers to output1.txt
    print >"output2.txt"             #contents to output2.txt
    print FNR ":" $0 >"output3.txt"  #both to output3.txt
}' input.txt
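awk can also report which of the strings matched on each line, which grep -n and sed do not show directly. A minimal sketch, assuming the sample input from the question; it relies on the standard awk function match(), which sets RSTART and RLENGTH to the position and length of the matched text:

```shell
# Sample input as in the question.
input=$(mktemp)
printf '%s\n' 'this is test1' 'this is not' 'this is test2' 'this is test1' > "$input"

# match() returns non-zero on a hit; substr() extracts the matched string.
report=$(awk 'match($0, /test[12]/) {
    print FNR ":" substr($0, RSTART, RLENGTH)
}' "$input")

printf '%s\n' "$report"
rm -f "$input"
```

Each output line pairs a line number with the string found there, in file order.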

How to get line numbers when multiple strings match, in order of appearance
