Question

I have a script like this:

cat list_id.txt | while read line; do
  for ACC in $line; do
    echo -n "$ACC\t"
    curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=${ACC}&rettype=fasta&retmode=xml" |\
    grep TSeq_taxid |\
    cut -d '>' -f 2 |\
    cut -d '<' -f 1 |\
    tr -d "\n"
    echo
    sleep 0.25
  done
done

For each ID in list_id.txt, this script looks up the corresponding taxonomy ID (the TSeq_taxid field) in the database at https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=${ACC}&rettype=fasta&retmode=xml

So this script gives me:

CAA42669\t9913
V00181\t7154
AH002406\t538120

I want to write these results directly to a file called new_ids.txt. I tried echo >> new_ids.txt, but the file ends up empty.

Thanks for your help.

Answer 1

A minimal refactoring of your script might look like this:

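# Avoid useless use of cat
# Use read -r
# Don't use upper case for private variables
while read -r line; do
  for acc in $line; do
    # echo -n "$acc\t" prints a literal \t in bash; printf emits a real tab
    printf '%s\t' "$acc"
    # No backslash necessary after | character
    curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=${acc}&rettype=fasta&retmode=xml" |
    # Probably use a proper XML parser for this
    grep TSeq_taxid |
    cut -d '>' -f 2 |
    cut -d '<' -f 1 |
    tr -d "\n"
    echo
    sleep 0.25
  done
done <list_id.txt >new_ids.txt  # redirect the output of the whole loop once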
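To see the key change (one redirection on the whole loop instead of echo >> new_ids.txt inside it) without contacting NCBI, here is a sketch in which the curl call is replaced by a hypothetical stub function, fetch_xml, that prints a canned TSeq_taxid line. The stub, the sample list_id.txt, and the assumption that efetch puts the element on its own line are illustrative only; the taxids are the ones from the question's output. The sleep is dropped because no requests are made.

```shell
#!/bin/sh
# Sketch: redirect the output of the whole loop once, instead of
# appending with echo inside it. The network call is stubbed out.

# Hypothetical stand-in for
#   curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=$1&rettype=fasta&retmode=xml"
# It prints one TSeq_taxid line per accession, using the taxids
# shown in the question's output.
fetch_xml() {
  case $1 in
    CAA42669) taxid=9913 ;;
    V00181)   taxid=7154 ;;
    AH002406) taxid=538120 ;;
    *)        taxid=unknown ;;
  esac
  printf '  <TSeq_taxid>%s</TSeq_taxid>\n' "$taxid"
}

# Sample input: a line may hold more than one ID
printf 'CAA42669 V00181\nAH002406\n' >list_id.txt

while read -r line; do
  for acc in $line; do        # unquoted on purpose: one iteration per ID
    printf '%s\t' "$acc"      # a real tab, unlike bash's echo -n "$acc\t"
    fetch_xml "$acc" |
    grep TSeq_taxid |
    cut -d '>' -f 2 |
    cut -d '<' -f 1 |
    tr -d '\n'
    echo
  done
done <list_id.txt >new_ids.txt  # one redirection for everything the loop prints

cat new_ids.txt
```

Run it in a scratch directory, since it overwrites list_id.txt and new_ids.txt. Because the redirection sits on the done line, new_ids.txt is opened (and truncated) once for the whole loop; use >>new_ids.txt there instead if you want to append across runs.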

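This could probably be simplified a great deal further, but without knowing exactly what the input file looks like or what curl returns, that is somewhat speculative: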

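# Turn spaces, tabs and newlines into one accession per line
tr -s ' \t\n' '\n' <list_id.txt |
while read -r acc; do
    curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=${acc}&rettype=fasta&retmode=xml" |
    awk -v acc="$acc" '/TSeq_taxid/ {
        split($0, a, /[<>]/); print acc "\t" a[3] }'
    sleep 0.25
done >new_ids.txt  # no <list_id.txt here: it would override the pipe from tr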
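The two text-processing steps of this shorter version can be tried offline. The sketch below feeds tr a sample input with IDs separated by spaces and a blank line, and feeds awk a single sample line. The assumption that efetch returns the TSeq_taxid element on its own line, and the sample values (taken from the question's output), are illustrative rather than real efetch output.

```shell
#!/bin/sh
# Offline check of the two text-processing steps in the shorter version.
# Assumption: efetch puts each TSeq_taxid element on its own line.

# Step 1: tr maps spaces, tabs and newlines to newlines and squeezes
# runs of them, leaving one accession per line.
printf 'CAA42669  V00181\n\nAH002406\n' | tr -s ' \t\n' '\n' >ids.txt

# Step 2: awk splits the matching line on < and >. For the sample line the
# pieces are "  ", "TSeq_taxid", "9913", "/TSeq_taxid" and "", so a[3]
# holds the taxid.
printf '  <TSeq_taxid>9913</TSeq_taxid>\n' |
awk -v acc="CAA42669" '/TSeq_taxid/ {
    split($0, a, /[<>]/); print acc "\t" a[3] }' >row.txt

cat ids.txt row.txt
```

If the element were ever split across lines or carried attributes, this split-based extraction would stop matching, which is the same caveat behind the "use a proper XML parser" comment in the first version.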
