Question

I have many files that contain:

>c_000000000288
abcdefg

>c_000000000270
abcdefg

>c_000000000062
abcdefg

*Note: continues for hundreds of lines

The file names look like this:

M07.compare.M010.info500.info2.1.txt
M07.compare.M010.info500.info2.2.txt
M07.compare.M010.info500.info2.3.txt
M07.compare.M010.info500.info2.word.txt

Note: I would like the code to work for different numeric ranges (for example 1-10 or 1-3) and to include "word" at the same time.

I would like the result to be a tab-delimited file that contains:

c_000000000288   1
c_000000000270   1
c_000000000062   1
c_000000000258   2
c_000000000191   3
c_000000000188   3
c_000000003713   3
c_000000000179   3
c_000000000162   word
c_000000000097   word

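I have searched several forums but could not find a solution. So far I have only managed to extract the "name" into a tab-delimited file, and I have not figured out how to add the file name information efficiently.

Thanks for your help!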

Answer 1

All you need is:

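# Bash brace expansion produces the names ...1.txt through ...10.txt plus ...word.txt.
awk -v OFS='\t' '
# First line of each file: take the second-to-last dot-separated part of the name.
FNR==1 { n=split(FILENAME,f,/\./); ext=f[n-1] }
# Header lines: strip the leading ">" and print the ID with that label.
sub(/^>/,""){ print $0, ext }
' M07.compare.M010.info500.info2.{{1..10},word}.txt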
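When the numeric range differs between data sets, a fixed brace range such as {1..10} also produces names for files that do not exist, and awk will complain about each name it cannot open. One way around that might be to wrap the same awk program in a small function that skips missing files. This is only a sketch: label_ids is a hypothetical helper name, and it assumes the label is always the second-to-last dot-separated part of the file name.

```shell
# label_ids FILE...
# Hypothetical helper: for every ">" header line in the given files, print
# "<id><TAB><label>", where <label> is the second-to-last dot-separated part
# of the file name (assumed layout: <prefix>.<label>.txt).
label_ids() {
    for f in "$@"; do
        # Skip names that do not correspond to a readable file.
        [ -r "$f" ] || continue
        awk -v OFS='\t' '
            FNR == 1 { n = split(FILENAME, parts, /\./); label = parts[n - 1] }
            sub(/^>/, "") { print $0, label }
        ' "$f"
    done
}
```

In bash it could be called as label_ids M07.compare.M010.info500.info2.{{1..10},word}.txt > result.tsv, where result.tsv is just an example output name. The rows come out in the order the file names are passed, so the brace expansion keeps 1 to 10 in numeric order with "word" last. Running awk once per file is slower than a single call, which is the price of skipping missing files this simply.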

Make a tab-delimited file containing information from the files and part of the file name
