I have two files: I need to get the record count and checksum from one file and compare them with the other file

Time: 2017-03-23 08:07:22

Tags: bash shell hadoop hive hdfs

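I have two files. One is sg_fx_cur_rates.csv; I need to get the record count and checksum from that file and compare them with the values stored in the other file, sg_fx_cur_mapping_20170221.tok

First file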

head -10 sg_fx_cur_mapping_20170221.csv
UNIQUE IDENTIFIER AC CODE LONGNAME RISK FACTOR IDENTIFIER INSTRUMENT TYPE QUOTED CURRENCY BASE CURRENCY GLOBAL RATE LOCALE MXG_CURRENCY MXG_PIPSIZE MXG_LOCALE
SC.1000010374 FX_AED*USD_SPOT_GBL FX_AED*USD_SPOT_GBL  FX_SPOT AED USD 1 UK USD-AED  UK
SC.1000010375 FX_AMD*USD_SPOT_GBL FX_AMD*USD_SPOT_GBL  FX_SPOT AMD USD 1 UK
SC.1000010376 FX_ANG*USD_SPOT_GBL FX_ANG*USD_SPOT_GBL  FX_SPOT ANG USD 1 UK USD-ANG  UK
SC.1000010376 FX_ANG*USD_SPOT_GBL FX_ANG*USD_SPOT_GBL  FX_SPOT ANG USD 1 UK USD-ANG  SG
SC.1000010376 FX_ANG*USD_SPOT_GBL FX_ANG*USD_SPOT_GBL  FX_SPOT ANG USD 1 UK USD-ANG  US
SC.1000010377 FX_AOA*USD_SPOT_GBL FX_AOA*USD_SPOT_GBL  FX_SPOT AOA USD 1 UK USD-AOA  UK
SC.1000010377 FX_AOA*USD_SPOT_GBL FX_AOA*USD_SPOT_GBL  FX_SPOT AOA USD 1 UK USD-AOA  SG
SC.1000010378 FX_ARS*USD_SPOT_GBL FX_ARS*USD_SPOT_GBL  FX_SPOT ARS USD 1 UK USD-ARS  UK
SC.1000010380 FX_BBD*USD_SPOT_GBL FX_BBD*USD_SPOT_GBL  FX_SPOT BBD USD 1 UK USD-BBD  UK

Second file

cat sg_fx_cur_mapping_20170221.tok
CHECKSUM|0b4e6c5935c39ae311dd477e216892d5
RECORDCOUNT|00000000681

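1 answer:

Answer 0 (score: 0)

Since we don't have many clues (which checksum algorithm is being used?), here is one option, assuming the checksum is MD5: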

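> cat checker.sh
#!/bin/bash
# Usage: checker.sh DATA_FILE TOK_FILE
# Assumes an MD5 checksum and a RECORDCOUNT zero-padded to 11 digits, as in the sample .tok file.
tmp=$(mktemp) || exit 1
trap 'rm -f "$tmp"' EXIT   # remove the temp file however the script ends
echo "CHECKSUM|$(md5sum "$1" | cut -d' ' -f1)" > "$tmp"
printf 'RECORDCOUNT|%011d\n' "$(wc -l < "$1")" >> "$tmp"
# comm -12 prints only the lines common to both files; both lines must match.
if [ "$(comm -12 <(sort "$tmp") <(sort "$2") | wc -l)" -eq 2 ]
then
  echo "Files are equal"
else
  echo "Files are different"
fi
exit 0   # 'return' is only valid inside a function or a sourced script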

And use it like this:

> checker.sh sg_fx_cur_mapping_20170221.csv sg_fx_cur_mapping_20170221.tok
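
If it helps to see which of the two values disagrees, the same check can be sketched as a function that reads the expected values out of the .tok file and compares them one at a time. This is only a sketch under assumptions: `verify_tok` is a hypothetical name, the checksum is assumed to be MD5, and the optional third argument (number of header lines to leave out of the count) exists because the question does not say whether RECORDCOUNT includes the header row.

```shell
#!/bin/bash
# verify_tok DATA_FILE TOK_FILE [HEADER_LINES]
# Hypothetical helper: assumes MD5 checksums and a numeric RECORDCOUNT.
verify_tok() {
  local data=$1 tok=$2 header_lines=${3:-0}
  local want_sum want_count got_sum got_count status=0

  # Pull the expected values out of the KEY|VALUE lines of the .tok file.
  want_sum=$(awk -F'|' '$1 == "CHECKSUM" { print $2 }' "$tok")
  want_count=$(awk -F'|' '$1 == "RECORDCOUNT" { print $2 }' "$tok")
  if [[ ! $want_count =~ ^[0-9]+$ ]]; then
    echo "No numeric RECORDCOUNT found in $tok"
    return 2
  fi

  got_sum=$(md5sum "$data" | cut -d' ' -f1)
  got_count=$(( $(wc -l < "$data") - header_lines ))

  if [ "$got_sum" != "$want_sum" ]; then
    echo "CHECKSUM mismatch: expected $want_sum, got $got_sum"
    status=1
  fi
  # 10# forces base 10 so the zero-padded count is not parsed as octal.
  if [ "$got_count" -ne "$((10#$want_count))" ]; then
    echo "RECORDCOUNT mismatch: expected $((10#$want_count)), got $got_count"
    status=1
  fi
  return "$status"
}
```

After sourcing the function, `verify_tok sg_fx_cur_mapping_20170221.csv sg_fx_cur_mapping_20170221.tok 1` would run the check with the header row excluded from the count; drop the trailing `1` if the header is counted. Comparing the count numerically means the zero padding in the .tok file does not matter.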