Problem comparing numeric values when parsing a CSV file column by column

Time: 2018-12-11 04:18:59

Tags: bash

I have a CSV file with the following columns:

Year,113 Cause Name,Cause Name,State,Deaths,Age-adjusted Death Rate

Here are some sample rows from the file:

2016,Malignant neoplasms (C00-C97),Cancer,Missouri,12696,167
2015,Malignant neoplasms (C00-C97),Cancer,Missouri,12965,173.4
2014,Malignant neoplasms (C00-C97),Cancer,Missouri,13067,177.7
2013,Malignant neoplasms (C00-C97),Cancer,Missouri,12955,179.4
2012,Malignant neoplasms (C00-C97),Cancer,Missouri,12919,182.3

I'm trying to build a CSV parser in Bash that takes arguments from the user and displays the rows that match them. Here is my code so far:

#!/bin/sh

# set up the arguments
for i in "$@"
do
case $i in
    -y=*|--year=*)
    YEAR="${i#*=}"
    shift # past argument=value
    ;;
    -c=*|--cause=*)
    CAUSE="${i#*=}"
    shift # past argument=value
    ;;
    -s=*|--state=*)
    STATE="${i#*=}"
    shift # past argument=value
    ;;
    -d=*|--deaths=*)
    DEATHS="${i#*=}"
    shift # past argument=value
    ;;
    -ad=*|--age_adjusted=*)
    AGE_ADJUSTED="${i#*=}"
    shift # past argument=value
    ;;
    *)
          # unknown option
    ;;
esac
done

# print out the values of the passed arguments
echo $YEAR
echo $CAUSE
echo $STATE
echo $DEATHS
echo $AGE_ADJUSTED

# read the file, segregating value in each column
while IFS='' read -r year cause1 cause2 state deaths age_adj; do
    if [ -z "$DEATHS" ]; then                       # user did not pass a "number of deaths" argument
        if [ -z "$AGE_ADJUSTED" ]; then             # user also did not pass an age "adjusted death rate" argument
            echo "$year $cause1 $cause2 $state $deaths $age_adj" | grep "$YEAR" | grep "$CAUSE" | grep "$STATE"
        else                                        # user passed an age "adjusted death rate" argument, check against that value
            if [[ $age_adj -ge $AGE_ADJUSTED ]]; then
                echo "$year $cause1 $cause2 $state $deaths $age_adj" | grep "$YEAR" | grep "$CAUSE" | grep "$STATE"
            fi
        fi
    else                                            # user passed a "number of deaths" argument
        if [ -z "$AGE_ADJUSTED" ]; then             # user did not pass an "age adjusted death rate" argument
            echo "$year $cause1 $cause2 $state $deaths $age_adj" | grep "$YEAR" | grep "$CAUSE" | grep "$STATE"
        else                                        # user passed both "number of deaths" and "age adjusted death rate" arguments         
            if [[ $deaths -ge $DEATHS &&  $age_adj -ge $AGE_ADJUSTED ]]; then
                echo "$year $cause1 $cause2 $state $deaths $age_adj" | grep "$YEAR" | grep "$CAUSE" | grep "$STATE"
            fi
        fi
    fi    
done < "$1"

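The problem occurs when I try to compare the deaths column ($deaths) with the value passed as an argument ($DEATHS), and the "age-adjusted death rate" column ($age_adj) with its argument ($AGE_ADJUSTED). Instead of applying the comparison, the script prints every row that matches the other arguments (if any were passed).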

Any help is appreciated. Thanks in advance.

I pass the arguments in this format:

./main.sh -y=2015 -d=50000 <additional arguments if I want to> ./file.csv

1 answer:

Answer 0 (score: 1)

Use awk:

YEAR="2015"
CAUSE=""
STATE=""
DEATHS=""
AGE_ADJUSTED=""

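# Columns: $1 Year, $2 113 Cause Name, $3 Cause Name, $4 State, $5 Deaths, $6 Age-adjusted Death Rate
awk \
    -vFS=, -vOFS=, \
    -vYEAR="$YEAR" \
    -vCAUSE="$CAUSE" \
    -vSTATE="$STATE" \
    -vDEATHS="$DEATHS" \
    -vAGE_ADJUSTED="$AGE_ADJUSTED" \
'{
    if (length(YEAR) != 0) {
        if ($1 != YEAR) {
            next;
        }
    }
    if (length(CAUSE) != 0) {
        if ($3 != CAUSE) {        # short cause name, e.g. Cancer
            next;
        }
    }
    if (length(STATE) != 0) {
        if ($4 != STATE) {
            next;
        }
    }
    # Deaths and rate are minimums (like -ge in the question); "+ 0" forces a numeric comparison
    if (length(DEATHS) != 0) {
        if ($5 + 0 < DEATHS + 0) {
            next;
        }
    }
    if (length(AGE_ADJUSTED) != 0) {
        if ($6 + 0 < AGE_ADJUSTED + 0) {
            next;
        }
    }
    print
}' file.csv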

A live version is available on tutorialspoint.

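  1. I think the awk script is pretty simple. For each variable with a non-zero length, it checks the corresponding column of the file against that value; if the row fails the check, it skips to the next line with next. If every check passes (or its variable is empty), it prints the current line with print.
  2. -vVAR=VAL sets an awk variable for use inside the script. -vFS=, and -vOFS=, set awk's input and output field separators.
  3. -y=*|--year=*): for portability and readability, I recommend following the POSIX utility conventions and/or the GNU argument syntax. Just use GNU getopt (my preference) or bash's getopts builtin (widely used, but it does not support long options).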
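  4. for i in "$@"; do ... shift; ...: the shift does not change the list the loop iterates over. "$@" is expanded once, before the loop starts, so i still takes every original argument. The shifts do remove arguments from the positional parameters, which is why "$1" after the loop is the file name, but only when every option comes before the file and is recognized. Mixing the two mechanisms is fragile. I prefer while (($#)); do ... shift; done, or simply for i; do ... done.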
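  5. while IFS='' read -r is normally used to read whole lines without splitting them. The IFS variable controls how read splits a line into variables: read reads input up to the delimiter given by -d (a newline by default), then splits what it read on any of the characters found in IFS. With IFS='' nothing is split, so the whole line ends up in year and the other variables stay empty. You meant while IFS=, read -r ...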
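Point 3 can be sketched with bash's getopts builtin. Because getopts only understands single-letter options, the question's two-letter -ad cannot be kept; -a below is an assumed replacement, and parse_args and FILE are hypothetical names. Values follow the POSIX convention of a separate word (-y 2015 instead of -y=2015).

```shell
#!/bin/bash
# Hypothetical parser: parse_args [-y YEAR] [-c CAUSE] [-s STATE] [-d DEATHS] [-a RATE] FILE
# -a stands in for the question's -ad, because getopts only supports single-letter options.
parse_args() {
    local opt OPTIND=1                # reset OPTIND so the function can be called more than once
    YEAR= CAUSE= STATE= DEATHS= AGE_ADJUSTED=
    while getopts 'y:c:s:d:a:' opt; do
        case $opt in
            y) YEAR=$OPTARG ;;
            c) CAUSE=$OPTARG ;;
            s) STATE=$OPTARG ;;
            d) DEATHS=$OPTARG ;;
            a) AGE_ADJUSTED=$OPTARG ;;
            *) echo "usage: main.sh [-y YEAR] [-c CAUSE] [-s STATE] [-d DEATHS] [-a RATE] FILE" >&2
               return 1 ;;
        esac
    done
    shift $((OPTIND - 1))             # drop the parsed options; the file name is now $1
    FILE=$1
}
```

In main.sh this would be called as parse_args "$@" || exit 1, after which the script reads "$FILE" and the filter variables, e.g. for ./main.sh -y 2015 -d 50000 ./file.csv.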
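If you would rather keep the -y=2015 style, the while (($#)); do ... shift; done loop from point 4 can be sketched as follows. parse_eq_args is a hypothetical name, and treating any non-option argument as the CSV file (while rejecting unknown options instead of silently ignoring them) is an assumption about how the script should behave.

```shell
#!/bin/bash
# Hypothetical parser for the question's -y=2015 / --year=2015 style.
parse_eq_args() {
    YEAR= CAUSE= STATE= DEATHS= AGE_ADJUSTED= FILE=
    while (($#)); do                          # loop while positional parameters remain
        case $1 in
            -y=*|--year=*)          YEAR=${1#*=} ;;
            -c=*|--cause=*)         CAUSE=${1#*=} ;;
            -s=*|--state=*)         STATE=${1#*=} ;;
            -d=*|--deaths=*)        DEATHS=${1#*=} ;;
            -ad=*|--age_adjusted=*) AGE_ADJUSTED=${1#*=} ;;
            -*) echo "unknown option: $1" >&2; return 1 ;;
            *)  FILE=$1 ;;                    # a non-option argument is taken as the CSV file
        esac
        shift                                 # always advance, so the loop only ever looks at $1
    done
}
```

Because each iteration shifts exactly once and inspects only "$1", the file name no longer has to be the last argument.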
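Point 5 explains why the question's loop never saw separate columns. Two more details contribute to the symptom: when only -d is passed, the question's code takes the branch that prints without comparing deaths at all, and bash's -ge only handles integers, so a rate such as 173.4 cannot be compared with it. Below is a minimal pure-shell sketch of the read loop, assuming the six-column layout from the question, numeric Deaths and rate columns, and no quoted commas inside fields. filter_rows is a hypothetical helper; it compares deaths with the shell's integer test and hands the decimal rate comparison to awk.

```shell
#!/bin/bash
# Hypothetical helper (not from the question):
#   filter_rows FILE YEAR STATE MIN_DEATHS MIN_RATE
# An empty filter argument means "do not filter on this column".
filter_rows() {
    local file=$1 want_year=$2 want_state=$3 min_deaths=$4 min_rate=$5
    local year cause1 cause2 state deaths age_adj
    # IFS=, splits each line on commas; -r keeps backslashes literal.
    # The "|| [ -n ... ]" part also processes a last line that lacks a trailing newline.
    while IFS=, read -r year cause1 cause2 state deaths age_adj || [ -n "$year" ]; do
        [ "$year" = "Year" ] && continue                          # skip the header row
        [ -n "$want_year" ] && [ "$year" != "$want_year" ] && continue
        [ -n "$want_state" ] && [ "$state" != "$want_state" ] && continue
        # Deaths are whole numbers, so the shell's integer test is enough.
        [ -n "$min_deaths" ] && [ "$deaths" -lt "$min_deaths" ] && continue
        # The rate can be a decimal such as 173.4, which [ ... -ge ... ] rejects,
        # so awk does that comparison (exit status 0 means "keep the row").
        if [ -n "$min_rate" ] &&
           ! awk -v a="$age_adj" -v b="$min_rate" 'BEGIN { exit !(a + 0 >= b + 0) }'; then
            continue
        fi
        printf '%s\n' "$year,$cause1,$cause2,$state,$deaths,$age_adj"
    done < "$file"
}
```

For example, filter_rows ./file.csv 2015 "" 50000 "" corresponds to the question's -y=2015 -d=50000 query; with the sample rows it prints nothing, since no row has 50000 or more deaths.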