Group by in shell to get the max date in a file

Time: 2017-10-11 06:06:30

Tags: shell unix ksh

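From the file below, I want to select only the lines that have the latest LAST_UPDATE time for a given stock.

For example, there are 3 lines for stock TCS, and I want to print only the one with the highest LAST_UPDATE value.

Any help is much appreciated.

Input file: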

LAST_UPDATE,Stock,YOUR_PRICE,MY_PRICE 

04:19:44.314,INFY,146.766,146.7669

05:00:07.405,TCS,2452.21,2453.8296

06:05:25.306,TATA,0,1320.0611

06:05:27.184,TATA,0,1320.0611

07:00:04.426,TCS,2463.8,2463.8037

07:00:08.022,TCS,2463.8,2463.8037

Expected output:

LAST_UPDATE,Stock,YOUR_PRICE,MY_PRICE

04:19:44.314,INFY,146.766,146.7669

06:05:27.184,TATA,0,1320.0611

07:00:08.022,TCS,2463.8,2463.8037

2 Answers:

Answer 0 (score: 0)

Here you go:

Script:

#!/bin/ksh
tempfile="stocktemp"
mkdir -p "$tempfile"
# split the input into one temp file per stock, named after the stock symbol
# (blank lines land in the hidden file .txt, which the glob below skips)
while IFS= read -r line; do
        stock=$(echo "$line" | cut -d "," -f 2)
        echo "$line" >> "$tempfile/$stock.txt"
done < inputfile
# Remove the file generated from the header line of inputfile
rm -f "$tempfile/Stock.txt"
# in all the stock files ...
for file in "$tempfile"/*; do
        # (Init a comparator below any real timestamp)
        time=-1
        final=""
        # ... we compare the time between the lines
        while IFS= read -r line; do
                # convert the HH:MM:SS.mmm first field to milliseconds since midnight:
                # ((HH*60 + MM)*60 + SS)*1000 + mmm
                comp=$(echo "$line" | cut -d "," -f 1 | awk -F'[:.]' '{ printf "%d\n", (($1*60+$2)*60+$3)*1000+$4 }')
                # keep the line with the greatest timestamp seen so far
                if [ "$comp" -gt "$time" ]; then
                        time=$comp
                        final=$line
                fi
        done < "$file"
        echo "$final"
done
rm -rf "$tempfile"

Test:

Will /home/will # ./script.ksh
04:19:44.314,INFY,146.766,146.7669
06:05:27.184,TATA,0,1320.0611
07:00:08.022,TCS,2463.8,2463.8037

Not the cleanest, but it works. If you want the results in a file, change the echo $final line to echo $final >> output.txt.
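The comparison in the script above depends on turning a timestamp into a single number, and the value has to be nested as (HH*60+MM)*60+SS. The flat form HH*60+MM*60+SS, which is what sed s/:/*60+/g produces, maps 05:00:07 to 307 but 04:19:44 to 1424, so the earlier time would compare as larger. Below is a minimal sketch of the nested conversion as a shell function. The name to_ms is hypothetical. The function assumes the HH:MM:SS[.mmm] format from the question and counts milliseconds, so two updates within the same second still compare correctly.

```shell
#!/bin/sh
# Hypothetical helper: convert an HH:MM:SS[.mmm] timestamp to milliseconds
# since midnight, nesting the units correctly: ((HH*60 + MM)*60 + SS)*1000 + mmm.
# A missing .mmm part counts as 0 milliseconds.
to_ms() {
    echo "$1" | awk -F'[:.]' '{ printf "%d\n", (($1 * 60 + $2) * 60 + $3) * 1000 + $4 }'
}
```

For example, to_ms 07:00:08.022 prints 25208022. The resulting integers can be compared directly with [ ... -gt ... ].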

Answer 1 (score: 0)

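Assumptions:

  • an awk solution is acceptable
  • the input file may not (already) be sorted by timestamp
  • the output is sorted alphabetically by stock/symbol name (special case: the 'Stock' line is always printed first)
  • blank lines are skipped/ignored in the output (otherwise the solution can be edited to add blank lines between output lines)

One possible awk solution: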

$ cat find_last.awk
$2=="Stock" { print ; next }            # print "Stock" line when we find it; skip "NF==4" processing by going to next line in file

NF==4       { lastline[$2]=$0 }         # if field count (NF) = 4 then store latest line for $2=symbol in associative array;
                                        # has added benefit that it ignores blank lines

END { n = asorti(lastline, x)           # sort our array indices (aka symbol names) with asorti(), a GNU awk extension; 'n' = count of indices; x[] array of indices

      for ( i=1 ; i<=n; i++ ) {         # loop through our list of n array indices (aka symbol names)

          print lastline[x[i]]          # print the (last/greatest) line for a stock/symbol
      }
    }
  • END { ... }: executed (once) after the input file has been processed
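Because asorti() exists only in GNU awk, the script above will not run under other awks. Below is a minimal sketch of a portable variant for any POSIX awk that also removes the need to pre-sort the input. It keeps the greatest timestamp per symbol by plain string comparison and leaves the ordering by symbol to sort. Assumptions: LAST_UPDATE is the zero-padded HH:MM:SS.mmm format from the question, all timestamps are from one day, and the header is the first line of the file. The function name latest_per_stock is hypothetical.

```shell
#!/bin/sh
# Hypothetical portable variant of find_last.awk: works with any POSIX awk
# (no asorti()) and does not require the input to be sorted first.
# Zero-padded HH:MM:SS.mmm strings compare chronologically as plain strings.
latest_per_stock() {
    awk -F, '
        NR == 1 { print; next }                   # header line: print it first
        NF == 4 && (!($2 in best) || $1 > best[$2]) {
            best[$2] = $1                         # newest timestamp for this symbol
            line[$2] = $0                         # ...and the line it came from
        }
        END { for (s in line) print line[s] }     # for-in order is unspecified
    ' "$1" | {
        IFS= read -r header                       # pass the header through untouched
        printf '%s\n' "$header"
        LC_ALL=C sort -t, -k2,2                   # order the rest by symbol name
    }
}
```

Usage: latest_per_stock infile. On the sample input this should print the same four lines as the expected output, even if the data lines are shuffled.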

Our sample input file (including the blank lines from the original question):

$ cat infile
LAST_UPDATE,Stock,YOUR_PRICE,MY_PRICE

04:19:44.314,INFY,146.766,146.7669

05:00:07.405,TCS,2452.21,2453.8296

06:05:25.306,TATA,0,1320.0611

06:05:27.184,TATA,0,1320.0611

07:00:04.426,TCS,2463.8,2463.8037

07:00:08.022,TCS,2463.8,2463.8037

The awk script in action:

$ sort infile | awk -F, -f find_last.awk
LAST_UPDATE,Stock,YOUR_PRICE,MY_PRICE
04:19:44.314,INFY,146.766,146.7669
06:05:27.184,TATA,0,1320.0611
07:00:08.022,TCS,2463.8,2463.8037
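  • sort infile | awk ...: sorts the input file by timestamp and pipes the output to the awk command
  • -F,: sets the input field separator to a comma (,)
  • -f find_last.awk: reads the awk commands from the file named find_last.awk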
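Two details of this pipeline are easy to miss. First, sorting whole lines byte-wise is also a chronological sort only because every data line starts with a zero-padded HH:MM:SS.mmm timestamp. Second, after sorting, blank lines come first and the header line comes last, because the digits 0-9 precede L in ASCII. A test like NR==1 would therefore not find the header in the sorted stream, whereas matching on $2=="Stock" does. Below is a small sketch to check the ordering. The helper name sorted_input is hypothetical. LC_ALL=C forces byte-wise ordering, since other locales may collate differently.

```shell
#!/bin/sh
# Hypothetical helper: byte-wise sort of the raw input lines.
# Expected order: blank lines, then data lines by timestamp, then the header.
sorted_input() {
    LC_ALL=C sort "$1"
}
```

Usage: sorted_input infile. The last line printed should be the LAST_UPDATE header.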