Question

I have two strings (parsed from CSV), each with ~200 columns. I need to compare them and determine which columns differ. For example:

str1file1="a,b,c,d,e,f,pp,qq"
str2file2="a,b,c,d,x,f,pp,qq"

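I need to get the column number, 5, and the corresponding values as output, for example: 5 e x. Since I need to compare millions of such strings, speed is key. An actual record: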

0x0009aeef,xyz,wert,57116,192.168.17.1,45320,192.168.17.2,45320,ctty,lkipop,1408477403,1408477403,,1408477722,1408477403,1408477718,2,0,5,98,0,3055925732,0,0,0,0,15756,15732,24,0,0,0,0,0,0,0.68,23,0,1,23,15776,0.00,15270,459,1,0,0,0,0,0,0,0,0,0,5.755,1408477403,1408477718,2,0,7,98,0,112988428,0,0,0,0,15776,15742,34,0,0,0,0,0,0,8.32,33,0,1,33,15756,0.01,15555,185,0,0,0,0,0,0,0,0,0,0,3.077,-0,-0,-12,-11,-23,-36,-31,-39,22,35,19,28,,,,,1.8,2.4,2.2,2.6,1.8,2.4,2.2,2.5,37,49,45,52,36,48,44,51,15625,107,891,5.60,12528,3204,14430,1312,723,2.65,13291,2451

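0x0009aeef is the primary key (column 1), but there is no guarantee that both files have the same number of entries (rows). I sort on the primary key and use cut to create temporary files containing the required columns (~135). A 'while read' loop then reads the first temporary file and greps for the matching line in the second temporary file. If grep fails, either the key or a value differs, hence the awk on key and value. Any better approach is much appreciated. Here is the current code: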

sort --field-separator=',' --key=1 $csv1 | cut -d "," -f1,...134 | tr -d '\t' > file1
sort --field-separator=',' --key=1 $csv2 | cut -d "," -f1,...134 | tr -d '\t' > file2
while IFS= read -r line; do
      sl=$(grep -- "$line" file2)
      if [ "$line" != "$sl" ]; then
         rec=$(echo "$line" | awk -F, '{ print $1 }')
         slId=$(grep -- "$rec" file2 | awk -F, '{ print $1 }')
         if [ "$rec" = "$slId" ]; then
               : # validation failed, primary key found
         else
               : # prim key not found
         fi
      else
         : # all is well
      fi
done < file1

Answer 1

If speed is key, I would consider using mawk to parse the CSV files, or updating the post with sample files so that we can offer a better solution.
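As a rough illustration of the awk route, here is a minimal sketch that compares two files by primary key in a single pass. It is not taken from the question: the function name compare_csv is hypothetical, and it assumes simple CSV (no quoted fields containing commas), a unique key in column 1, and that the first file fits in memory. It uses only standard awk features, so it should behave the same under mawk.

```shell
#!/usr/bin/env bash
# compare_csv FILE1 FILE2   (hypothetical helper, not from the original post)
# Prints "key column old new" for every differing column, plus a note for
# keys that exist in only one of the two files.
compare_csv() {
    awk -F, '
        # First file: remember each full record under its primary key (column 1).
        NR == FNR { rec[$1] = $0; next }

        # Second file: this key never appeared in the first file.
        !($1 in rec) { print $1 ": key not found in first file"; next }

        {
            seen[$1] = 1
            if (rec[$1] == $0) next          # identical records, nothing to report
            n = split(rec[$1], old, ",")     # fields of the first-file record
            if (NF > n) n = NF               # cover the longer of the two records
            for (i = 2; i <= n; i++)
                if (old[i] != $i) print $1, i, old[i], $i
        }

        # Keys from the first file that the second file never mentioned.
        END {
            for (k in rec)
                if (!(k in seen)) print k ": key not found in second file"
        }
    ' "$1" "$2"
}
```

Called as compare_csv "$csv1" "$csv2", this needs no sorting and no temporary files, and it avoids starting a grep process for every line. If memory is a concern, sorting both files and walking them in step would be the alternative.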

Using Bash:

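# Split each comma-separated string into an array of fields.
IFS=, read -r -a line <<<"$str1"
IFS=, read -r -a line2 <<<"$str2"
# Walk the indices of the first array and print each pair of differing fields.
for i in "${!line[@]}"; do
    if [[ ${line[i]} != "${line2[i]}" ]]; then
        echo -e "${line[i]}\n${line2[i]}"
    fi
done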

Output:

e
x
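The question also asks for the column number next to the two values. A small variation on the same idea could look like the sketch below; the function name diff_columns is hypothetical, and it assumes plain comma-separated fields with no quoting.

```shell
#!/usr/bin/env bash
# diff_columns STR1 STR2   (hypothetical helper name)
# Prints "column value1 value2" for every column that differs (1-based).
diff_columns() {
    local -a a b
    local i n
    IFS=, read -r -a a <<<"$1"
    IFS=, read -r -a b <<<"$2"
    # Iterate over the longer of the two records so extra columns are reported.
    n=${#a[@]}
    (( ${#b[@]} > n )) && n=${#b[@]}
    for (( i = 0; i < n; i++ )); do
        # ${x[i]-} expands to an empty string when the column is missing.
        if [[ "${a[i]-}" != "${b[i]-}" ]]; then
            printf '%d %s %s\n' "$((i + 1))" "${a[i]-}" "${b[i]-}"
        fi
    done
}

diff_columns "a,b,c,d,e,f,pp,qq" "a,b,c,d,x,f,pp,qq"   # prints: 5 e x
```

A pure Bash loop like this is convenient for a handful of records, but for millions of rows a single awk process is likely to be considerably faster.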
