Read lines from a file in the terminal, perform a web operation on each, and store each output as a separate file

Posted: 2019-04-13 05:10:01

Tags: shell unix curl awk

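I need to read the lines in input.txt, skip the lines that start with '>', read the line that follows, and use a web tool to get the output in FASTA format. I have written some code, but so far it still doesn't skip the '>' lines. I would also like a simpler way to name the outputs, as in the example given (output_1.fasta).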

 $i = 0 ; 
while read line:
if line: do curl -s -d "dna_sequence="$line"&output_format=fasta" https://web.expasy.org/cgi-bin/translate/dna2aa.cgi >> my_${line}.fasta; $i+1; done < 'input.txt'

input.txt
>A123
ATTGGGCCTTTT
>B1234
GGGCCCTTAAA

output_1.fasta
>A123
#entire output from the web server
GHHGGGSSSAAA

output_2.fasta
>B1234
HHJJKKLLLL
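Whichever approach is used, the core of the task is pairing each '>' header with the sequence that follows it. Below is a minimal awk sketch of just that parsing step, assuming standard FASTA (one header line followed by one or more sequence lines). The function name `fasta_pairs` is hypothetical and not part of any tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one "name<TAB>sequence" line per FASTA record.
# Header lines lose their leading '>'; wrapped sequence lines are joined.
fasta_pairs() {
  awk '
    { sub(/\r$/, "") }                      # tolerate CRLF line endings
    /^>/ {
      if (n++) print name "\t" seq          # flush the previous record
      name = substr($0, 2); seq = ""; next
    }
    NF { seq = seq $0 }                     # skip blank lines, join the rest
    END { if (n) print name "\t" seq }      # flush the last record
  ' "$@"
}
```

For the sample input.txt this prints `A123<TAB>ATTGGGCCTTTT` and `B1234<TAB>GGGCCCTTAAA`, which a loop such as `fasta_pairs input.txt | while IFS=$'\t' read -r name seq; do ...; done` could consume.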

3 Answers:

Answer 0 (score: 1)

Bash solution:

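#!/usr/bin/env bash
i=0
# Each non-header line arrives in $REPLY (IFS= and -r keep it verbatim)
while IFS=  read -r -d $'\n'
do
  ((i++))
  # One request per sequence; each response goes to its own numbered file
  curl -s -d "dna_sequence=${REPLY}&output_format=fasta" 'https://web.expasy.org/cgi-bin/translate/dna2aa.cgi' > "./output_${i}.fasta"
done < <( sed '/^>/d' "./input.txt" )   # sed deletes the '>' header lines
exit 0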
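One caveat with a `while read` loop like this: `read` returns non-zero on a final line that has no trailing newline, so the last sequence of such a file is silently skipped (GNU sed, for example, does not add the missing newline when it passes that line through). The sketch below is a filter that avoids this and also strips Windows CR characters; `sequence_lines` is a hypothetical name, and it is meant as a possible stand-in for the `sed '/^>/d'` call, not as part of the answer above.

```shell
#!/usr/bin/env bash
# Hypothetical filter: print every non-header, non-empty line from stdin,
# including a final line that lacks a trailing newline.
sequence_lines() {
  local line
  while IFS= read -r line || [[ -n $line ]]; do
    line=${line%$'\r'}                      # tolerate CRLF line endings
    [[ -z $line || $line == '>'* ]] && continue
    printf '%s\n' "$line"
  done
}
```

It could be plugged into the loop above as `done < <( sequence_lines < ./input.txt )`.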

Test:

$ cat ./input.txt
>A123
ATTGGGCCTTTT
>B1234
GGGCCCTTAAA
$ i=0
$ while IFS=  read -r -d $'\n'
> do
>   ((i++))
>   curl -s -d "dna_sequence=${REPLY}&output_format=fasta" 'https://web.expasy.org/cgi-bin/translate/dna2aa.cgi' > "./output_${i}.fasta"
> done < <( sed '/^>/d' "./input.txt" )
$ ls -1 ./output_*
./output_1.fasta
./output_2.fasta
$ cat ./output_1.fasta
> VIRT-65321:3'5' Frame 1
KRPN
> VIRT-65321:3'5' Frame 2
KGP
> VIRT-65321:3'5' Frame 3
KAQ
> VIRT-65321:5'3' Frame 1
IGPF
> VIRT-65321:5'3' Frame 2
LGL
> VIRT-65321:5'3' Frame 3
WAF
$ cat ./output_2.fasta
> VIRT-65327:3'5' Frame 1
FKG
> VIRT-65327:3'5' Frame 2
LRA
> VIRT-65327:3'5' Frame 3
-GP
> VIRT-65327:5'3' Frame 1
GPL
> VIRT-65327:5'3' Frame 2
GP-
> VIRT-65327:5'3' Frame 3
ALK

Answer 1 (score: 0)

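You're now approaching the level of complexity where bash is no longer the right tool, and IMO you should consider porting this to a more suitable scripting language. Also, you aren't escaping $line properly: what happens if $line contains &foo=bar? curl sends the data verbatim, so the server won't read it as part of dna_sequence; it will see a brand-new field named foo with the value bar. Here is a port to PHP: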

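#!/usr/bin/env php
<?php
// One curl handle, reused for every request
$ch = curl_init();
curl_setopt_array($ch, array(
    CURLOPT_URL => 'https://web.expasy.org/cgi-bin/translate/dna2aa.cgi',
    CURLOPT_RETURNTRANSFER => 1,   // return the response instead of printing it
    CURLOPT_ENCODING => ''         // accept any compression curl supports
));
// FILE_SKIP_EMPTY_LINES only takes effect together with FILE_IGNORE_NEW_LINES
foreach (file('input.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES) as $line) {
    $line = trim($line);
    // Skip blank lines and '>' header lines
    if (!strlen($line) || $line[0] === '>') {
        continue;
    }
    curl_setopt_array($ch, array(
        CURLOPT_POST => 1,
        // http_build_query() URL-encodes each value, so '&' or '=' stay inside dna_sequence
        CURLOPT_POSTFIELDS => http_build_query(array(
            'dna_sequence' => $line,
            'output_format' => 'fasta'
        ))
    ));
    $response = curl_exec($ch);
    if ($response === false) {
        fwrite(STDERR, curl_error($ch) . "\n");
        continue;
    }
    file_put_contents("my_{$line}.fasta", $response);
}
curl_close($ch);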
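The escaping problem raised in this answer can also be handled without leaving bash: curl's `--data-urlencode` option encodes a field's value before sending it (for example `curl -s --data-urlencode "dna_sequence=$line" -d output_format=fasta URL`). To show what that encoding does, here is a hand-rolled sketch; `urlencode` is a hypothetical helper that percent-encodes every byte outside the unreserved set `A-Z a-z 0-9 - . _ ~`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: percent-encode a string for use in a form field.
urlencode() {
  local LC_ALL=C s=$1 out='' c i   # C locale: walk the string byte by byte
  for (( i = 0; i < ${#s}; i++ )); do
    c=${s:i:1}
    case $c in
      [A-Za-z0-9.~_-]) out+=$c ;;                    # unreserved: keep as-is
      *) printf -v c '%%%02X' "'$c"; out+=$c ;;      # anything else: %XX
    esac
  done
  printf '%s\n' "$out"
}
```

With it, a value such as `ATTG&foo=bar` becomes `ATTG%26foo%3Dbar` and can no longer be mistaken for a separate field, e.g. `curl -s -d "dna_sequence=$(urlencode "$line")&output_format=fasta" URL`.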

Answer 2 (score: 0)

$ cat tst.sh
#!/usr/bin/env bash

i=0
while IFS= read -r line; do
    if [[ $line =~ ^\> ]]; then
        # Header line: start the next numbered file and copy the header into it
        outfile="output_$((++i)).fasta"
        printf '%s\n' "$line" > "$outfile"
    else
        # Sequence line: append the server's translation to the current file
        curl -s -d "dna_sequence=${line}&output_format=fasta" 'https://web.expasy.org/cgi-bin/translate/dna2aa.cgi' >> "$outfile"
    fi
done < input.txt
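This script sends each sequence line as its own request, so a record whose sequence is wrapped over several lines would get several separate translations appended to its file. Below is a sketch that joins a record's lines first and sends one request per record. The function names and the `FETCH` override (there only so the loop can be tried offline with a stand-in command) are hypothetical; the default `translate_seq` assumes the same expasy endpoint and form fields used above.

```shell
#!/usr/bin/env bash
# Default fetch command: POST one sequence to the translate tool and print
# the response. --data-urlencode escapes characters such as & and =.
translate_seq() {
  curl -s --data-urlencode "dna_sequence=$1" -d 'output_format=fasta' \
    'https://web.expasy.org/cgi-bin/translate/dna2aa.cgi'
}

# Write output_<n>.fasta: the header line first, then the fetched translation.
# FETCH (hypothetical override) names the command called with the sequence.
write_record() {  # $1 = record number, $2 = header line, $3 = joined sequence
  { printf '%s\n' "$2"; "${FETCH:-translate_seq}" "$3"; } > "output_$1.fasta"
}

# Read FASTA on stdin; join each record's sequence lines, one file per record.
split_fasta() {
  local line header='' seq='' i=0
  while IFS= read -r line || [[ -n $line ]]; do
    line=${line%$'\r'}                          # tolerate CRLF line endings
    if [[ $line == '>'* ]]; then
      if [[ -n $header ]]; then
        write_record "$((++i))" "$header" "$seq"
      fi
      header=$line seq=''
    else
      seq+=$line
    fi
  done
  if [[ -n $header ]]; then
    write_record "$((++i))" "$header" "$seq"   # flush the last record
  fi
}
```

Run as `split_fasta < input.txt`; with `FETCH` unset, each record is sent to the web service once.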

$ ./tst.sh

$ cat output_1.fasta
>A123
> VIRT-92094:3'5' Frame 1
KRPN
> VIRT-92094:3'5' Frame 2
KGP
> VIRT-92094:3'5' Frame 3
KAQ
> VIRT-92094:5'3' Frame 1
IGPF
> VIRT-92094:5'3' Frame 2
LGL
> VIRT-92094:5'3' Frame 3
WAF

$ cat output_2.fasta
>B1234
> VIRT-92247:3'5' Frame 1
FKG
> VIRT-92247:3'5' Frame 2
LRA
> VIRT-92247:3'5' Frame 3
-GP
> VIRT-92247:5'3' Frame 1
GPL
> VIRT-92247:5'3' Frame 2
GP-
> VIRT-92247:5'3' Frame 3
ALK