Intersection of three files

Date: 2018-03-18 21:09:18

Tags: linux bash rar

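My internet connection is poor, and I have tried three times to download the same rar archive (> 500 MB). Every copy came out corrupted, but I hope I can build a fourth, uncorrupted file from the intersection of the three corrupted ones. I am not comfortable with diff or comm and do not know whether they can do what I want. Thanks for your help!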

1 answer:

Answer 0 (score: 0)

Following chepner's idea from the comments, here is how I would reconstruct the probable original.

#!/bin/bash

FILE1="$1"
FILE2="$2"
FILE3="$3"
RESFILE="$4"

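# Print the 1-based offset, counted from byte $3, of the first difference
# between files $1 and $2. If they agree up to the end of the shorter file,
# print one past the number of bytes remaining there.
diff_at() {
  local pos
  # GNU cmp prints "A B differ: byte N, line M"; N is counted after the skip
  pos=$(cmp -i "$3" "$1" "$2" | sed -n 's/.* differ: [a-z]* \([0-9]*\),.*/\1/p')
  if [ -z "$pos" ]; then
    pos=$(du -b "$1" "$2" | cut -f 1 | sort -n | head -n1) # min() of the sizes
    pos=$((pos - $3 + 1))
  fi
  echo "$pos"
}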

max() {
  if [ $# -eq 0 ]; then # Degenerate case
    echo ""
  elif [ $# -eq 1 ]; then # Base case
    echo $(($1 + 0)) # Strings are interpreted as 0
  else
    local V=$1 M
    shift
    M=$(max "$@")
    echo $((V > M ? V : M))
  fi
}

# Shallow check of arguments
if [ $# -ne 4 ]; then
  echo "Provide three files to compare and a file to write output into."
  exit 1
fi

# If a single file is strictly larger than both others, its extra bytes
# cannot be checked against any other copy
S1=$(du -b "$FILE1" | cut -f 1)
S2=$(du -b "$FILE2" | cut -f 1)
S3=$(du -b "$FILE3" | cut -f 1)
if (( (S1 > S2 && S1 > S3) || (S2 > S1 && S2 > S3) || (S3 > S1 && S3 > S2) )); then
  echo "$0: Unable to reconstruct original file." >&2
  exit 1
fi
FILESIZE=$(max $S1 $S2 $S3)

# The idea is that of extracting and appending the common part
truncate -s 0 "$RESFILE" # This will overwrite existing output
BIAS=0
while [ $BIAS -lt $FILESIZE ]; do
  I12=$(diff_at "$FILE1" "$FILE2" $BIAS)
  I23=$(diff_at "$FILE2" "$FILE3" $BIAS)
  I31=$(diff_at "$FILE3" "$FILE1" $BIAS)

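  # Unreconstructible: all three files disagree at the same byte. Equal
  # indices alone are not enough, since they also occur when the files agree
  # up to the end, so confirm that FILE1 and FILE2 really differ from here.
  if [ $I12 -eq $I23 -a $I12 -eq $I31 ] && ! cmp -s -i $BIAS "$FILE1" "$FILE2"; then
    echo "$0: Unable to reconstruct original file."
    exit 1
  fi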

  # Biggest common part
  MAXBYTE=$(max $I12 $I23 $I31)

  # Exclude the file with wrong byte
  if [ $I12 -eq $MAXBYTE -o $I31 -eq $MAXBYTE ]; then
    tail -c+$(($BIAS + 1)) "$FILE1" | head -c $(($MAXBYTE - 1)) >> "$RESFILE"
  else
    tail -c+$(($BIAS + 1)) "$FILE2" | head -c $(($MAXBYTE - 1)) >> "$RESFILE"
  fi

  # Update position
  BIAS=$(($BIAS + $MAXBYTE - 1))
done
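
The loop above depends on how cmp numbers bytes when -i is given. As far as I can tell, GNU cmp counts the reported byte from the first byte after the skipped prefix, not from the start of the file, which is why the script adds the result to BIAS. The snippet below is a minimal sketch for checking this on your own system; first_diff is a hypothetical helper name and is not part of the script.

```bash
#!/bin/bash
# first_diff SKIP FILE_A FILE_B (hypothetical helper, not part of the script):
# print the byte number cmp reports for the first difference after skipping
# SKIP bytes in both files; print nothing if no difference is found.
first_diff() {
  cmp -i "$1" "$2" "$3" | sed -n 's/.* differ: [a-z]* \([0-9]*\),.*/\1/p'
}

tmp=$(mktemp -d)
printf 'abcdefghij' > "$tmp/a"
printf 'abcdeXghij' > "$tmp/b"   # differs from a at byte 6

first_diff 0 "$tmp/a" "$tmp/b"   # counted from the start of the file
first_diff 3 "$tmp/a" "$tmp/b"   # counted from the first byte after the skip
rm -r "$tmp"
```

With GNU cmp the two calls should print 6 and 3. If the second call also prints 6 on your system, the offsets are absolute and the BIAS arithmetic would need adjusting.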

To handle "guessing" (the case where all three bytes at the same position differ), you would need to modify the script.
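
Before deciding how to guess, it helps to know where guessing would be needed. The following is a rough sketch, assuming the three copies have the same size and that cmp supports -l; three_way_conflicts is a hypothetical name. It lists every offset at which all three copies hold different bytes, which are the positions that cannot be settled by majority.

```bash
#!/bin/bash
# three_way_conflicts A B C (hypothetical helper): print the 1-based offsets
# at which three equally sized files hold three different byte values.
three_way_conflicts() {
  local a="$1" b="$2" c="$3"
  # cmp -l prints one line per differing byte, starting with its offset.
  # An offset is a conflict only if it appears in all three pairwise lists;
  # if just one copy is wrong there, it appears in exactly two of them.
  {
    cmp -l "$a" "$b" | awk '{print $1}'
    cmp -l "$b" "$c" | awk '{print $1}'
    cmp -l "$a" "$c" | awk '{print $1}'
  } | sort -n | uniq -c | awk '$1 == 3 {print $2}'
}
```

An empty result means every differing byte still has two copies that agree on it.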

I think there are better options than the above from many points of view, so please treat it as quick and dirty.
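
One possible alternative, sketched here under assumptions rather than taken from the answer, is a byte-wise majority vote: start from the first copy and, at each offset where it disagrees with the second, let the third copy break the tie. The function name majority_merge and its argument order are made up for illustration, and the sketch assumes three copies of identical size plus cmp -l, dd and od.

```bash
#!/bin/bash
# majority_merge A B C OUT (hypothetical helper): write to OUT the byte-wise
# majority of three equally sized files. Returns non-zero if some offset has
# three different bytes; A's byte is kept there.
majority_merge() {
  local a="$1" b="$2" c="$3" out="$4"
  local list pos va vb vc unresolved=0
  list=$(mktemp)
  cp -- "$a" "$out"
  # cmp -l prints one line per differing byte: 1-based offset, then the byte
  # of A and the byte of B in octal. Where A and B agree, A is in the majority.
  cmp -l "$a" "$b" > "$list"
  while read -r pos va vb; do
    vc=$(dd if="$c" bs=1 skip=$((pos - 1)) count=1 2>/dev/null | od -An -t o1 | tr -d ' \n')
    # A leading 0 makes the shell read the digits as an octal number
    if [ "$((0$vc))" -eq "$((0$va))" ]; then
      continue    # A and C agree, so B is the odd one out
    elif [ "$((0$vc))" -eq "$((0$vb))" ]; then
      # B and C agree: patch that single byte in OUT
      printf '%b' "\\0$vb" | dd of="$out" bs=1 seek=$((pos - 1)) conv=notrunc 2>/dev/null
    else
      echo "no majority at byte $pos" >&2
      unresolved=$((unresolved + 1))
    fi
  done < "$list"
  rm -f "$list"
  [ "$unresolved" -eq 0 ]
}
```

It could be called as majority_merge copy1.rar copy2.rar copy3.rar merged.rar, where the file names are placeholders. Calling dd once per differing byte is slow when the copies differ in many places, so this only suits files with sparse corruption.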