Shell script - selecting CSV rows by a float value

Time: 2014-04-10 11:02:55

Tags: shell csv cut

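I am seeing strange behavior when reading a CSV file and selecting the rows whose float value in a given column meets a threshold.

Here is an excerpt of the input file.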

ben@truc:$ head summary.fasta.csv
scf7180000753635;170043549;XP_001849446.1;27.72;184;2e-13;74.7
scf7180000753636;340728919;XP_003402759.1;25.78;322;8e-19;93.6
scf7180000753642;328716306;XP_003245892.1;33.51;191;7e-27;119
scf7180000753642;512919417;XP_004929373.1;43.18;132;1e-23;108
scf7180000753642;512914080;XP_004928052.1;40.16;127;5e-21;94.7
scf7180000753664;328696819;XP_003240139.1;37.99;179;2e-23;107
scf7180000753664;328696819;XP_003240139.1;26.67;30;2e-23;25.4
scf7180000753664;328703138;XP_003242103.1;31.65;218;1e-20;99.4
scf7180000753669;383855900;XP_003703448.1;68.92;74;2e-23;102
scf7180000753669;380030611;XP_003698937.1;72.06;68;3e-22;99.8

Here is my shell script:

#!/bin/sh
echo "extracting the values"
# prepare output files
echo "" > "40_sequence_identity.csv"
echo "" > "60_sequence_identity.csv"
echo "" > "80_sequence_identity.csv"
while read -r line; do
    #debug: check if line is correctly read
    echo $line
    #attribute each CSV column value to a variable
    query=`echo $line | cut -d ';' -f1`
    gi=`echo $line | cut -d ';' -f2`
    refseq=`echo $line | cut -d ';' -f3`
    seq_identity=`echo $line | cut -d ';' -f4`
    align_length=`echo $line | cut -d ';' -f5`
    evalue=`echo $line | cut -d ';' -f6`
    score=`echo $line |  -d ';' -f7`

    #debug: check if cut command is OK
    echo "seqidentity:"$seq_identity
    # test float value of column 4; if it is at or above a threshold, append the line to the matching file
    if [ $( echo "$seq_identity >= 40" | bc ) ]; then
        echo "$line" >> "40_sequence_identity.csv"
    fi
    if [ $( echo "$seq_identity >= 60" | bc ) ]; then
        echo "$line" >> "60_sequence_identity.csv"
    fi
    if [ $( echo "$seq_identity >= 80" | bc ) ]; then
        echo "$line" >> "80_sequence_identity.csv"
    fi
done < "summary.fasta.csv" 
echo "DONE!"

Here is the strange output.

extracting the values
scf7180000753635;170043549;XP_001849446.1;27.72;184;2e-13;74.7
./create_project_directories.sh: 1: ./create_project_directories.sh: -d: not found
seqidentity:27.72
scf7180000753636;340728919;XP_003402759.1;25.78;322;8e-19;93.6
./create_project_directories.sh: 1: ./create_project_directories.sh: -d: not found
seqidentity:25.78
scf7180000753642;328716306;XP_003245892.1;33.51;191;7e-27;119
./create_project_directories.sh: 1: ./create_project_directories.sh: -d: not found
seqidentity:33.51
scf7180000753642;512919417;XP_004929373.1;43.18;132;1e-23;108
./create_project_directories.sh: 1: ./create_project_directories.sh: -d: not found
seqidentity:43.18
scf7180000753642;512914080;XP_004928052.1;40.16;127;5e-21;94.7
./create_project_directories.sh: 1: ./create_project_directories.sh: -d: not found
seqidentity:40.16
scf7180000753664;328696819;XP_003240139.1;37.99;179;2e-23;107
./create_project_directories.sh: 1: ./create_project_directories.sh: -d: not found
seqidentity:37.99
scf7180000753664;328696819;XP_003240139.1;26.67;30;2e-23;25.4
./create_project_directories.sh: 1: ./create_project_directories.sh: -d: not found
seqidentity:26.67

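First, the 3 output files (blast_summary_superior_40_sequence_identity.csv ...) contain all the rows, as if the tests had no effect. Second, the file parsing seems fine, but the strange message "-d: not found" comes out of nowhere. It appears just before the echo that displays the value of $seq_identity, so it is probably related to the cut commands.

Any idea why I get this output? The commands work when I run them by hand in the console, but not when I run the whole script.

Thanks for your help.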

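1 answer:

Answer 0 (score: 1)

You get the error "-d: not found" because the command on line 17 is incomplete. The word cut is missing after the pipe, so the shell tries to run -d as a command: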

score=`echo $line |  -d ';' -f7`

So it should be:

score=$(echo "$line" | cut -d ';' -f7)
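
The missing cut explains the error message, but probably not the first symptom (every row landing in all three files). A likely cause is the test itself: bc prints 1 or 0 for the comparison, and [ given a single argument only checks that the string is non-empty, so [ 1 ] and [ 0 ] both succeed. Comparing the bc output to 1, as in [ "$(echo "$seq_identity >= 40" | bc)" -eq 1 ], should make the thresholds take effect. The sketch below shows the pitfall and one possible alternative that lets awk do the float comparison instead; the helper names ge and filter_rows are made up for illustration and are not part of the original script.

```shell
#!/bin/sh
# With a single argument, [ only checks that the string is non-empty,
# so both "[ 1 ]" and "[ 0 ]" succeed. This is why every row matched.
if [ 0 ]; then echo "[ 0 ] is true"; fi

# ge A B: exit status 0 when A >= B as floating-point numbers.
# Hypothetical helper; awk does the comparison instead of bc.
ge() {
    awk -v a="$1" -v b="$2" 'BEGIN { exit !(a + 0 >= b + 0) }'
}

# filter_rows THRESHOLD: read ';'-separated rows on stdin and print those
# whose 4th column (sequence identity) is >= THRESHOLD.
# Hypothetical helper; assumes column 4 always holds a plain number.
filter_rows() {
    awk -F ';' -v t="$1" '$4 + 0 >= t + 0'
}
```

With these definitions, the loop body could use "if ge "$seq_identity" 40; then ...", or the whole loop could be replaced by one call per threshold, for example filter_rows 40 < summary.fasta.csv > 40_sequence_identity.csv. The second form also avoids spawning seven cut processes per row.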