Question

I'm writing a bash script to modify and summarize information using grep and sed, but it hangs.

    #!/bin/bash

    # This script extracts some basic information
    # from text files and prints it to screen.
    #
    # Usage: ./myscript.sh </path/to/text-file>


    #Extract lines starting with ">@HWI"

    ONLY=`grep -v ^\>@HWI`

    #replaces A and G with R in lines

    ONLYR=`sed -e s/A/R/g -e s/G/R/g $ONLY`

    grep R $ONLYR | wc -l

Answer 1

The right way to write a shell script that does what you seem to be trying to do is:

    awk '
        !/^>@HWI/ {
            gsub(/[AG]/,"R")
            if (/R/) {
                ++cnt
            }
        }
        END { print cnt+0 }
    ' "$@"

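Put that in your myscript.sh file and execute it the same way you do today.

To be clear: most of the code above is an awk script. The shell part is only the first and last lines, where the shell invokes awk and passes the input file names to it.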
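As a quick way to try this out, here is a minimal sketch that wraps the same awk program in a shell function and runs it on a small made-up input. The function name count_r_lines and the sample data are illustrative assumptions, not part of the question.

```shell
#!/bin/bash

# Hypothetical helper wrapping the awk program; the name is illustrative.
count_r_lines() {
    awk '
        !/^>@HWI/ {
            gsub(/[AG]/, "R")   # replace every A or G with R in the current line
            if (/R/) {
                ++cnt           # count lines that now contain at least one R
            }
        }
        END { print cnt+0 }     # cnt+0 prints 0 rather than an empty string
    ' "$@"
}

# Made-up sample input: two header lines and three data lines.
sample=$(mktemp)
printf '%s\n' '>@HWI-1' 'ACGT' '>@HWI-2' 'CCTT' 'GGCC' > "$sample"

count_r_lines "$sample"   # prints 2 (ACGT and GGCC contain A or G; CCTT does not)
rm -f "$sample"
```

Because the file names are forwarded with "$@", the same function also accepts several files at once, or reads standard input when no file is given.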

If you want to have intermediate variables, you can create and print them like this:

    awk '
        !/^>@HWI/ {
            only = $0
            onlyR = only
            gsub(/[AG]/,"R",onlyR)
            print "only:", only
            print "onlyR:", onlyR
            # test onlyR, since gsub() changed onlyR rather than $0
            if (onlyR ~ /R/) {
                ++cnt
            }
        }
        END { print cnt+0 }
    ' "$@"

The above will run robustly, portably, and efficiently on all UNIX systems.

Answer 2

First, as @fedorqui commented: you are not giving grep an input source on which to perform the line matching.
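To illustrate the point: when grep is given no file operand it reads standard input, so inside a command substitution the script simply waits on the terminal. The sketch below passes the file name explicitly; the function name non_header_lines is a hypothetical helper, not something from the question.

```shell
#!/bin/bash

# Hypothetical helper: print the lines of a file that do NOT start with ">@HWI".
# The file name is passed to grep explicitly, so grep never waits on stdin.
non_header_lines() {
    local input_file=$1
    grep -v '^>@HWI' -- "$input_file"
}

# Without a file operand, grep reads standard input instead. That is fine
# when something is piped in, but hangs when stdin is an interactive terminal:
#   only=$(grep -v '^>@HWI')            # waits for keyboard input
#   only=$(grep -v '^>@HWI' < "$1")     # reads the file via redirection
#   only=$(grep -v '^>@HWI' "$1")       # reads the file as an operand
```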

Second, there are some issues in your script that will lead to unwanted behavior later on, once you start manipulating the data:

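Store the matching lines in an array, or in a file that you read the values from afterwards. The variable ONLY is not the right data structure for the task.
By convention, environment variables (PATH, EDITOR, SHELL, ...) and internal shell variables (BASH_VERSION, RANDOM, ...) are fully capitalized. All other variable names should be lowercase. Since variable names are case-sensitive, this convention avoids accidentally overriding environment and internal variables.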
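The naming convention is easiest to appreciate with a collision. The following is a small sketch, assuming a script that picks the uppercase name PATH for "path to my data file"; the function name lookup_after_clobber, the variable data_path, and the directory are made up for illustration.

```shell
#!/bin/bash

# Hypothetical demo: reuse the name PATH for "path to my data file".
# The function body is a subshell (parentheses), so the damage stays contained.
lookup_after_clobber() (
    PATH="/nonexistent/dir"          # clobbers the command search path
    if command -v grep >/dev/null 2>&1; then
        echo "grep found"
    else
        echo "grep not found"
    fi
)

lookup_after_clobber                 # the shell can no longer locate grep

# A lowercase name cannot collide with PATH, so command lookup keeps working.
data_path="/nonexistent/dir"
command -v grep >/dev/null 2>&1 && echo "grep still found"
```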

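Here is a better version of your script that takes these points into account. One question remains open, namely what you were trying to do in the last line, grep R $ONLYR | wc -l: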

    #!/bin/bash

    # This script extracts some basic information
    # from text files and prints it to screen.
    #
    # Usage: ./myscript.sh </path/to/text-file>

    input_file=$1

    # Read lines not matching the provided regex, from $input_file.
    # Note: no backslash before ">"; inside quotes, GNU grep would read "\>"
    # as an end-of-word anchor rather than a literal ">".
    mapfile -t only < <(grep -v '^>@HWI' "$input_file")

    # Replace A and G with R in each line
    for ((i = 0; i < ${#only[@]}; i++)); do
        only[i]="${only[i]//[AG]/R}"
    done

    # DEBUG
    printf '%s\n' "Here are the lines, after replace:"
    printf '%s\n' "${only[@]}"

    # I'm not sure what you were trying to do here. Am I guessing right that you wanted
    # to count the number of R's in ALL lines?
    # grep R $ONLYR | wc -l
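
Depending on which reading is intended, the last step could be sketched in one of two ways. Both functions below are hypothetical helpers that read the already-substituted lines from standard input: the first counts lines containing at least one R (what grep R | wc -l would report), the second counts every R character.

```shell
#!/bin/bash

# Count the lines that contain at least one R (same result as `grep R | wc -l`).
count_lines_with_r() {
    grep -c 'R'
}

# Count every single R character across all lines.
count_r_chars() {
    tr -cd 'R' | wc -c | tr -d ' '   # tr -d ' ' strips the padding some wc versions add
}

# Usage with the array from the script above:
#   printf '%s\n' "${only[@]}" | count_lines_with_r
#   printf '%s\n' "${only[@]}" | count_r_chars
printf '%s\n' 'RCRT' 'CCTT' 'RRCC' | count_lines_with_r   # 2 lines contain an R
printf '%s\n' 'RCRT' 'CCTT' 'RRCC' | count_r_chars        # 4 R characters in total
```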
