Comparing CSV data on Linux

Time: 2014-03-06 18:15:48

Tags: linux bash csv debian

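I'm new to Linux and I'm trying to manipulate some data in a bash script. I've tried many solutions without success. I have three conditions, and I got lost among all the commands: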

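  1. Check whether xx:xx:xx:xx:xx (column 3) already exists in the file.

  2. If it does, compare the time (column 1) of all lines containing the same xx:xx:xx:xx:xx.

  3. If the times are equal, compare the signal strength (column 2) and return the line with the lowest value.

Data file (CSV):

    Mar 6 2014 17h29h43, -55, xx:xx:xx:xx:xx (This line has to be removed)
    Mar 6 2014 17h29h43, -38, xx:xx:xx:xx:xx
    Mar 6 2014 17h29h44, -60, yy:yy:yy:yy:yy
    

Desired result: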

    Mar 6 2014 17h29h43, -38, xx:xx:xx:xx:xx (=> lowest value for xx:xx:xx:xx 17h29h43)
    Mar 6 2014 17h29h44, -60, yy:yy:yy:yy:yy
    

1 Answer:

Answer 0: (score: 0)

You should probably do this in Perl or Python, but I'm bored, so why not:

#!/usr/bin/env bash

filename=$1

#read input file and store into associative array (requires bash 4+)
declare -A arr
while IFS=',' read -r date value string ; do
    #Change time from 10h00h00 to 10:00:00
    dt=$(sed 's/\([0-9]\+\)h/\1:/g' <<< "$date")
    #Get rid of leading spaces
    string=$(tr -d ' ' <<< "$string")
    value=$(tr -d ' ' <<< "$value")
    #Convert datetime to unix epoch
    epoch=$(date --date="$dt" +%s)
    #echo "$epoch,$string"

    #Create unique key for array
    key="$epoch,$string"

    #Check if key already exists if so compare
    #values and keep the highest    
    ov=-10000
    [ ${arr["$key"]+abc} ] && ov=${arr["$key"]}

    if [[ "$value" -gt "$ov" ]]; then 
        arr["$key"]="$value"
    fi
done < "$filename"

#Display output, sort on epoch and then remove the epoch from the output
for i in "${!arr[@]}"; do
    #Split key into epoch and string
    epoch=$(awk -F, '{print $1}' <<< "$i")
    string=$(awk -F, '{print $2}' <<< "$i")

    #Convert date back to long format
    dt=$(date -d@"$epoch" '+%h %-d %Y %Hh%Mh%S')

    echo "$epoch $dt, ${arr[$i]} $string"
done | sort -h | sed 's/^[0-9]\+ //'
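
The existence check `[ ${arr["$key"]+abc} ]` in the loop relies on the `${parameter+word}` expansion, which yields `word` only when the array element is set, even if its value is empty. Here is a minimal sketch of that idiom in isolation, assuming bash 4 or later. The array name `signal`, its keys, and the helper `check` are made up for illustration and are not part of the script:

```shell
#!/usr/bin/env bash
# Sketch of the ${array[key]+word} existence test (bash 4+).
# "signal", its keys, and "check" are illustrative names only.
declare -A signal=( [k1]="-38" [k2]="" )

check() {
    # ${signal[$1]+set} expands to "set" if the key exists, else to nothing
    if [[ -n ${signal[$1]+set} ]]; then
        echo "present"
    else
        echo "absent"
    fi
}

check k1   # key with a value
check k2   # key with an empty value still counts as set
check k3   # key that was never assigned
```

The three calls print `present`, `present` and `absent`. This is why the script can tell a key it has not seen yet apart from one that is already stored.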

Input file (data file):

Mar 6 2014 17h29h43, -55, xx:xx:xx:xx:xx
Mar 6 2014 17h29h43, -38, xx:xx:xx:xx:xx
Mar 6 2014 17h29h44, -60, yy:yy:yy:yy:yy
Mar 9 2014 07h29h44, -6, 11:22:33:44:55
Mar 9 2014 07h29h44,  6, 22:33:44:55:66
Mar 9 2014 07h29h44,  3, 22:33:44:55:66
Mar 1 2014 14h33h04, -60, anythingreally

Output:

Mar 1 2014 14h33h04, -60 anythingreally
Mar 6 2014 17h29h43, -38 xx:xx:xx:xx:xx
Mar 6 2014 17h29h44, -60 yy:yy:yy:yy:yy
Mar 9 2014 07h29h44, -6 11:22:33:44:55
Mar 9 2014 07h29h44, 6 22:33:44:55:66
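
For comparison, the same selection can be sketched as a single awk pass that prints the input lines untouched, so the comma before the third column survives. This is an alternative sketch under stated assumptions, not the script above. `best_signal` is a hypothetical function name. Like the script, it keeps the highest value for each combination of time and third column. Unlike the script, it prints lines in order of first appearance and does not sort by time.

```shell
#!/usr/bin/env bash
# Alternative sketch: keep, per (time, identifier) pair, the line with the
# highest signal value. "best_signal" is a hypothetical helper name.
best_signal() {
    awk -F',' '
    {
        id = $3
        gsub(/[ \t]/, "", id)        # ignore padding around column 3
        key = $1 SUBSEP id           # time plus identifier names the group
        val = $2 + 0                 # numeric value of column 2
        if (!(key in best) || val > best[key]) {
            if (!(key in best)) order[++n] = key   # remember first appearance
            best[key] = val
            line[key] = $0           # keep the original line as written
        }
    }
    END { for (i = 1; i <= n; i++) print line[order[i]] }
    ' "$@"
}

# Demo with the three lines from the question, read from standard input
best_signal <<'EOF'
Mar 6 2014 17h29h43, -55, xx:xx:xx:xx:xx
Mar 6 2014 17h29h43, -38, xx:xx:xx:xx:xx
Mar 6 2014 17h29h44, -60, yy:yy:yy:yy:yy
EOF
```

The demo prints the `-38` line for `xx:xx:xx:xx:xx` followed by the `-60` line for `yy:yy:yy:yy:yy`. The function also accepts file names as arguments and passes them straight to awk.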