Question

I wrote this small program in shell:

#!/bin/bash

#***************************************************************
# Synopsis:
# Read from an inputfile each line, which has the following format:
#
# llnnn nnnnnnnnnnnnllll STRING lnnnlll n nnnn nnnnnnnnn nnnnnnnnnnnnnnnnnnnn ll ll   
#
# where:
# n is a <positive int>
# l is a <char> (no special chars)
# the last set of ll ll  could be:
#   - NV 
#   - PV 
#
# Ex:
# AVO01  000060229651AVON FOOD OF ARKHAM C A  S060GER   0  1110  000000022  00031433680006534689  NV  PV
#
# The program should check, for each line of the file, the following:
# I) If the nnn of character llnnn (beginning the line) is numeric,
#    that is, <int>
# II) If the character ll ll is NV (just one set of ll) then
#    copy that line in an outputfile, and add one to a counter. 
# III) If the character ll ll is PV (just one set of ll) then
#     copy that line in an outputfile, and add one to a counter.
# 
# NOTICE: could be just one ll. Ex: [...] NV [...]
#                                   [...] PV [...] 
#         or both Ex: [...] NV PV [...] 
#
#
# Execution (after generating the executable):
# ./myShellprogram inputfile outputfileNOM outputfilePGP
#***************************************************************


# Check the number of arguments that could be passed.
if [[ $# -ne 3 ]]; then
    echo "Error...must be: myShellprogram <inputfile> <outputfileNOM> <outputfilePGP>"
    exit 1
fi

#Inputfile: is in position 1 on the ARGS
inputfile=$1 
#OutputfileNOM: is in position 2 on the ARGS
outputfileNOM=$2
#OutputfilePGP: is in position 3 on the ARGS
outputfilePGP=$3

#Main variables. Change if needed. 
# Flags that could appear in the <inputfile>
#
# ATTENTION!!!: notice that there is a white space
# before the characters, this is important when using
# the regular expression in the conditional:
# if [[  $line =~ $NOM ]]; then [...] 
#
# If the white space is NOT there it would match things like:
# ABCNV ... which is wrong!!
NOM=" NV"
PGP=" PV"
#Counters of occurrences
countNOM=0;
countPGP=0;


#Check if the files exists and have the write/read permissions
if [[ -r $inputfile && -w $outputfileNOM && -w $outputfilePGP ]]; then
    #Read all the lines of the file.
    while read -r line  
        do
            code=${line:3:2} #Store the numeric code: 2 chars from offset 3 (e.g. "01" in "AVO01")

            #Check if the code is numeric
            if [[ $code =~ ^[0-9]+$ ]] ; then

                #Check if the actual line has the NOM flag
                if [[  $line =~ $NOM ]]; then
                    echo "$line" >> "$outputfileNOM"
                    (( ++countNOM ))
                fi  

                #Check if the actual line has the PGP flag
                if [[  $line =~ $PGP ]]; then
                    echo "$line" >> "$outputfilePGP"
                    (( ++countPGP ))
                fi

            else
              echo "$code is not numeric"
              exit 1

            fi      

        done < "$inputfile"

    echo "COUNT NOM $countNOM"
    echo "COUNT PGP $countPGP"
else
    echo "FILE: $inputfile does not exist or does not have read permissions"
    echo "FILE: $outputfileNOM does not exist or does not have write permissions"
    echo "FILE: $outputfilePGP does not exist or does not have write permissions"
fi

I have a few questions:

I) When I do this:

 if [[ -r $inputfile && -w $outputfileNOM && -w $outputfilePGP ]]; then
 [...]
 else
     echo "FILE: $inputfile does not exist or does not have read permissions"
     echo "FILE: $outputfileNOM does not exist or does not have write permissions"
     echo "FILE: $outputfilePGP does not exist or does not have write permissions"
 fi

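In the else branch I would like to print something else, namely the right message. For example, if "$outputfileNOM" does not have write permission, print only that error. BUT I don't want to write a lot of if/else. Ex: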

if [[ -r $inputfile ]]; then
[...]
if [[ -w $outputfileNOM ]]; then
[...]
else
  For the READ permission, and the other else for the WRITE

Is there a way to do this without nesting, while keeping it readable?

II) About:

 if [[ -r $inputfile && -w $outputfileNOM && -w $outputfilePGP ]]

Would it be OK to use the flag "-x" instead of -r or -w? I'm not clear on the meaning of:

-x FILE
          FILE exists and execute (or search) permission is granted

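III) Notice the ATTENTION label in my code. I noticed that there are several possibilities, e.g. white space before, after, or both before and after the flag. I trust the consistency of the input files, but if they change, the script will blow up. What can I do in this case? Is there an elegant way to manage it? (Exceptions?)

Thank you very much!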

Answer 1

I've been bitten by the =~ operator before.

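In principle I would tell you to quote the argument (i.e. ... =~ "$NOM"), but starting with bash 3.2 there is a special behavior with =~ "". The link, which is rather wordy, says:

o Quoting the string argument to the [[ command's =~ (regexp) operator now forces string matching, as with the other pattern-matching operators.

and

E14) Why does quoting the pattern argument to the regular expression matching conditional operator (=~) cause regexp matching to stop working?

In versions of bash prior to bash-3.2, the effect of quoting the regular expression argument to the [[ command's =~ operator was not specified. The practical effect was that double-quoting the pattern argument required backslashes to quote special pattern characters, which interfered with the backslash processing performed by double-quoted word expansion and was inconsistent with how the == shell pattern matching operator treated quoted characters.

In bash-3.2, the shell was changed to internally quote characters in single- and double-quoted string arguments to the =~ operator, which suppresses the special meaning of the characters special to regular expression processing ('.', '[', '\', '(', ')', '*', '+', '?', '{', '|', '^', and '$') and forces them to be matched literally. This is consistent with how the '==' pattern matching operator treats quoted portions of its pattern argument.

Since the treatment of quoted string arguments was changed, several issues have arisen, chief among them the problem of white space in pattern arguments and the differing treatment of quoted strings between bash-3.1 and bash-3.2. Both problems may be solved by using a shell variable to hold the pattern. Since word splitting is not performed when expanding shell variables in all operands of the [[ command, this allows users to quote patterns as they wish when assigning the variable, then expand the values to a single string that may contain whitespace. The first problem may be solved by using backslashes or any other quoting mechanism to escape the white space in the patterns.

You might consider something along the lines of NOM="[ ]NV". (Note that I have not tested this.)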
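
To make the quoting rule above concrete, here is a minimal sketch, assuming bash 3.2 or later with default compatibility settings. The helper names regex_match and literal_match are made up for this illustration; the only difference between them is whether the pattern variable is quoted on the right-hand side of =~.

```shell
#!/bin/bash
# Assumes bash 3.2 or later with default compatibility settings.
# regex_match and literal_match are hypothetical helper names.

# Unquoted expansion: the variable's value is used as a regular expression.
regex_match()   { [[ $1 =~ $2 ]]; }
# Quoted expansion: the value is matched as a literal string.
literal_match() { [[ $1 =~ "$2" ]]; }

NOM='[ ]NV'     # bracket expression matching one space, followed by NV
line='AVO01  000060229651AVON FOOD  NV  PV'

regex_match   "$line" "$NOM" && echo "regex: flag found"
literal_match "$line" "$NOM" || echo "literal: no match, the line does not contain the text [ ]NV"
```

Keeping the pattern in a variable and expanding it unquoted is the approach the quoted FAQ text recommends; the bracket expression is just one way to carry the space inside the pattern.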

Answer 2

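OK, thanks to those who helped me. Following their suggestions, I'll answer my own question:

Regarding:

I) Although this solution uses conditionals, it is quite elegant: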

#File error string
estr='ERROR: file %s does not exist or does not have %s permission.\n'

#Check if the files exist and have the write/read permissions
[ -r "$inputfile" ] || { printf "$estr" "<$inputfile>" "read" >&2; exit 1; }
[ -w "$outputfileNOM" ] || { printf "$estr" "<$outputfileNOM>" "write" >&2; exit 1; }
[ -w "$outputfilePGP" ] || { printf "$estr" "<$outputfilePGP>" "write" >&2; exit 1; }

Note the ; after the exit!
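
The same idea can be folded into a small function so that each check stays on one line and only the failing file is reported. This is only a sketch: check_file is a hypothetical helper name, not something from the original script, and it assumes the output files already exist, as the -w tests above do.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original script): test one file with
# one operator (-r, -w, ...) and report only the check that failed.
check_file() {
    local op=$1 file=$2 perm=$3
    if ! test "$op" "$file"; then
        printf 'ERROR: <%s> does not exist or does not have %s permission.\n' \
            "$file" "$perm" >&2
        return 1
    fi
}

# Intended usage inside a script: stop at the first failing check.
# check_file -r "$inputfile"     read  || exit 1
# check_file -w "$outputfileNOM" write || exit 1
# check_file -w "$outputfilePGP" write || exit 1
```

Returning a status instead of calling exit inside the function leaves the decision to the caller, which also makes the helper easy to try out in an interactive shell.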

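II) From the chmod manual:

The letters rwxXst select file mode bits for the affected users: read (r), write (w), execute (or search for directories) (x)...

From Wikipedia (File system permissions):

The read permission, which grants the ability to read a file. When set for a directory, this permission grants the ability to read the names of files in the directory (but not to find out any further information about them such as contents, file type, size, ownership, permissions, etc.).

The write permission, which grants the ability to modify a file. When set for a directory, this permission grants the ability to modify entries in the directory. This includes creating files, deleting files, and renaming files.

The execute permission, which grants the ability to execute a file. This permission must be set for executable binaries (for example, a compiled C++ program) or shell scripts (for example, a Perl program) in order to allow the operating system to run them. When set for a directory, this permission grants the ability to traverse its tree in order to access files or subdirectories, but not see the content of files inside the directory (unless read is set).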
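
As a rough illustration of why -x is not a substitute for -r or -w on data files, the sketch below prints which of the three tests succeed for the current user. perm_flags is a hypothetical helper name; the temporary file and directory come from mktemp, which is assumed to be available.

```shell
#!/bin/bash
# perm_flags is a hypothetical helper: it prints which of the -r, -w and -x
# tests succeed for the current user on the given path.
perm_flags() {
    local path=$1 flags=''
    [[ -r $path ]] && flags+=r
    [[ -w $path ]] && flags+=w
    [[ -x $path ]] && flags+=x
    printf '%s\n' "${flags:-none}"
}

f=$(mktemp)      # regular file, created without any execute bit
d=$(mktemp -d)   # directory, searchable by its owner
perm_flags "$f"  # a plain data file normally fails the -x test
perm_flags "$d"  # on a directory, x means it can be searched (entered)
chmod u+x "$f"
perm_flags "$f"  # now -x succeeds as well
rm -rf "$f" "$d"
```

For the script in the question, the input file needs to be readable and the output files writable, so -r and -w are the tests that describe what the script actually does with them.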

III) Thanks to @dmckee for the link and to turtle.

# ATTENTION!!!: notice the \< and \> surrounding
# the characters, this is important when using
# the regular expression in the conditional:
# if [[  $line =~ $NOM ]]; then [...]
#
# If those characters are NOT there it would match things like:
# ABCNV ... which is wrong!!
# They (the \< and \>) indicate that the 'NV' can't be 
# contained in another word.
NOM='\<NV\>'
PGP='\<PV\>'
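
The \< and \> word-boundary markers are a GNU extension to regular expressions, so they may not be available in every system's regex library. A possible alternative, sketched here with a hypothetical has_flag helper, spells out the boundary using only POSIX ERE constructs: the flag must be preceded by the start of the line or white space, and followed by white space or the end of the line.

```shell
#!/bin/bash
# has_flag is a hypothetical helper. It succeeds when FLAG appears in LINE as
# a whole whitespace-delimited word, using only POSIX ERE syntax.
has_flag() {
    local line=$1 flag=$2
    local re="(^|[[:space:]])${flag}([[:space:]]|\$)"
    [[ $line =~ $re ]]
}

line='AVO01  000060229651AVON FOOD  S060GER   0  1110  NV  PV'
has_flag "$line" NV && echo "NV present"
has_flag "$line" PV && echo "PV present"
has_flag 'ABCNV 123' NV || echo "ABCNV is not the NV flag"
```

Because the pattern accepts any amount of surrounding white space, or none at the end of the line, it does not depend on whether the flag is written with a space before it, after it, or both.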

Shell program code: regexp and file handling
