Question

I am trying to search through files and extract two related pieces of information each time they appear in a file. My current code:

#!/bin/bash
echo "Utilized reads from ustacks output" > reads.txt
str1="utilized reads:"
str2="Parsing"
for file in /home/desaixmg/novogene/stacks/sample01/conda_ustacks.o*; do
    reads=$(grep $str1 $file | cut -d ':' -f 3)
    samples=$(grep $str2 $file | cut -d '/' -f 8)
    echo $samples $reads >> reads.txt
done

It runs once per file (the files contain different numbers of instances of these phrases) and gives me one output line per file:

PopA_15.fq 1081264
PopA_16.fq PopA_17.fq 1008416 554791
PopA_18.fq PopA_20.fq PopA_21.fq 604610 531227 595129
...

I would like it to pair up each instance instead (i.e. the first match of both greps, then the second match of both, and so on):

PopA_15.fq 1081264
PopA_16.fq 1008416
PopA_17.fq 554791
PopA_18.fq 604610
PopA_20.fq 531227
PopA_21.fq 595129
...

How can I do this? Thanks.

Answer 1

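Assuming your Input_file looks like the sample output shown, with an even number of columns on each line where the first half are PopA values and the second half are the numeric values, the following awk may help.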

awk '{for(i=1;i<=(NF/2);i++){print $i,$((NF/2)+i)}}'  Input_file

The output will be as follows.

PopA_15.fq 1081264
PopA_16.fq 1008416
PopA_17.fq 554791
PopA_18.fq 604610
PopA_20.fq 531227
PopA_21.fq 595129

If you want to pass the output of a command to awk instead, you can run it as your command | awk command... and there is no need to add Input_file to the awk command above.
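As a rough illustration of the piped form, here is a minimal sketch that wraps the same awk logic in a helper and feeds it two lines taken from the question's sample output. The function name pair_columns is hypothetical, and the sketch assumes every line has an even number of fields, names first and counts second.

```shell
#!/bin/bash
# pair_columns: hypothetical helper wrapping the awk one-liner above.
# Assumption: each input line has an even number of fields, the first
# half being sample names and the second half the matching read counts.
pair_columns() {
    awk '{ half = NF / 2; for (i = 1; i <= half; i++) print $i, $(half + i) }'
}

# Feed lines through a pipe instead of naming an input file.
printf '%s\n' \
    'PopA_15.fq 1081264' \
    'PopA_16.fq PopA_17.fq 1008416 554791' | pair_columns
```

For field i in the first half of the line, awk prints it next to field half + i, so each name ends up beside the count at the same position in the second half.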

Answer 2

This is what ended up working for me. Any tips for more efficient code are definitely welcome.

#!/bin/bash
echo "Utilized reads from ustacks output" > reads.txt
str1="utilized reads:"
str2="Parsing"
for file in /home/desaixmg/novogene/stacks/sample01/conda_ustacks.o*; do
    reads=$(grep $str1 $file | cut -d ':' -f 3)
    samples=$(grep $str2 $file | cut -d '/' -f 8)
    paste <(echo "$samples" | column -t) <(echo "$reads" | column -t) >> reads.txt
done

This provides the desired output described above.
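To show why the paste version behaves differently from the original echo, here is a small sketch with made-up stand-ins for the two grep results. The variable contents and the helper name pair_lines are assumptions for illustration only. It skips column -t, which does not seem necessary for single-column input, and uses a space as the paste delimiter.

```shell
#!/bin/bash
# Hypothetical stand-ins for the two grep | cut results from one log file.
samples=$(printf '%s\n' 'PopA_16.fq' 'PopA_17.fq')
reads=$(printf '%s\n' '1008416' '554791')

# Unquoted expansion: word splitting turns the newlines into spaces,
# so everything lands on one line (the behaviour seen in the question).
echo $samples $reads

# Quoted expansion keeps one value per line; paste then joins the two
# lists line by line, separated by a single space.
pair_lines() {
    paste -d ' ' <(printf '%s\n' "$1") <(printf '%s\n' "$2")
}
pair_lines "$samples" "$reads"
```

The first command prints all four values on a single line, while pair_lines prints one name and count pair per line. Process substitution (the <( ... ) syntax) is a bash feature, so this sketch is not expected to run under a plain POSIX sh.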

Alternating output in a loop from two greps
