I have about 100 comma-separated text files with eight columns each.
Two example file names:
sample1_sorted_count_clean.csv
sample2_sorted_count_clean.csv
Example file content:
Domain,Phylum,Class,Order,Family,Genus,Species,Count
Bacteria,Proteobacteria,Alphaproteobacteria,Sphingomonadales,Sphingomonadaceae,Zymomonas,Zymomonas mobilis,0.0
Bacteria,Bacteroidetes,Flavobacteria,Flavobacteriales,Flavobacteriaceae,Zunongwangia,Zunongwangia profunda,0.0
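For each file, I want to replace the column header "Count" with the sample ID, which is the first part of the file name (sample1, sample2).
In the end, the header should look like this: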
Domain,Phylum,Class,Order,Family,Genus,Species,sample1
With my code, the header looks like this:
Domain,Phylum,Class,Order,Family,Genus,Species,${f%_clean.csv}
for f in *_clean.csv; do echo ${f}; sed -e "1s/Domain,Phylum,Class,Order,Family,Genus,Species,RPMM/Domain,Phylum,Class,Order,Family,Genus,Species,${f%_clean.csv}/" ${f} > ${f%_clean.csv}_clean2.csv; done
I also tried:
for f in *_clean.csv; do gawk -F"," '{$NF=","FILENAME}1' ${f} > t && mv t ${f%_clean.csv}_clean2.csv; done
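In this case, "Count" is replaced with the whole file name, but every row of that column now contains the file name as well. The count values are gone. That is not what I want.
Do you have any ideas about what else I could try? Thank you very much in advance!
Anna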
Answer 0: (score: 2)
If you are OK with awk, please try the following.
awk 'BEGIN{FS=OFS=","} FNR==1{var=FILENAME;sub(/_.*/,"",var);$NF=var} 1' *.csv
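Note that the command above prints all files to standard output as one stream. If one output file per sample is wanted, the same awk program can be wrapped in a loop. The following is only a sketch: the helper names `relabel_count` and `relabel_all` are hypothetical, the `_clean2.csv` output suffix is taken from the question, and the sample ID is assumed to be the part of the file name before the first underscore.

```shell
# Hypothetical helper: print one CSV with its last header field
# replaced by the sample ID taken from the file name.
# Assumption: the sample ID is everything before the first underscore.
relabel_count() {
    awk 'BEGIN{FS=OFS=","}
         FNR==1{id=FILENAME; sub(/.*\//,"",id); sub(/_.*/,"",id); $NF=id}
         1' "$1"
}

# Hypothetical driver: write one *_clean2.csv next to each *_clean.csv
# in the current directory (output suffix as used in the question).
relabel_all() {
    for f in *_clean.csv; do
        [ -e "$f" ] || continue      # skip when the glob matches nothing
        relabel_count "$f" > "${f%_clean.csv}_clean2.csv"
    done
}
```

Running `relabel_all` in the directory that holds the files would then create, for example, sample1_sorted_count_clean2.csv with sample1 as the last header field, leaving the original files untouched.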
EDIT: Since the OP asked to remove everything in the file name after the 2nd underscore, try the following.
awk 'BEGIN{FS=OFS=","} FNR==1{split(FILENAME,array,"_");$NF=array[1]"_"array[2]} 1' *.csv
Explanation: Adding an explanation of the above code here.
awk ' ##Start the awk program.
BEGIN{ ##BEGIN block: runs once, before any input file is read.
FS=OFS="," ##Set the input and output field separators to a comma for all lines of all files.
} ##Close the BEGIN block.
FNR==1{ ##FNR==1 is true on the first line of each input file, so the block below runs once per file.
split(FILENAME,array,"_") ##Use the built-in split function to split FILENAME (the current file name) on "_" into an array named array.
$NF=array[1]"_"array[2] ##Set the last field to the 1st array element, an underscore, and the 2nd array element.
} ##Close the FNR==1 block.
1 ##The pattern 1 is always true, so every line (including the edited header) is printed.
' *.csv ##Pass all *.csv files to the awk program.
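For comparison, the sed approach from the question can probably be made to work too. The sketch below rests on a few assumptions: the last header field is literally `Count`, the sample ID is the part of the file name before the first underscore, and the ID contains no `/` or `&` characters. The function name `relabel_with_sed` is hypothetical. `${base%%_*}` removes the longest suffix that starts with an underscore, and the double quotes let the shell expand the variable before sed sees the expression.

```shell
# Hypothetical helper: same header fix using sed and shell parameter expansion.
# Assumptions: the header ends in ",Count" and the ID has no "/" or "&" in it.
relabel_with_sed() {
    f=$1
    base=${f##*/}        # drop any leading directory
    id=${base%%_*}       # sample1_sorted_count_clean.csv -> sample1
    # Double quotes let the shell expand $id; \$ stays a literal
    # end-of-line anchor for sed. Only line 1 is edited.
    sed "1s/,Count\$/,${id}/" "$f"
}
```

It could be called as `relabel_with_sed sample1_sorted_count_clean.csv > sample1_sorted_count_clean2.csv`. To keep the first two underscore-separated parts instead, as in the EDIT above, the ID would have to be built differently.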