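My internet connection is very poor, and I have tried three times to download the same rar archive (> 500 MB). Every copy is corrupted, but I hope I can build a fourth, uncorrupted file from the parts on which the three corrupted copies agree. I am not comfortable with diff or comm and do not know whether they can do what I want. Thanks for your help!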
Answer 0 (score: 0)
Following chepner's idea from the comments, here is how I would reconstruct a probable original.
#!/bin/bash
FILE1="$1"
FILE2="$2"
FILE3="$3"
RESFILE="$4"
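# Print the offset, counted from byte $3 + 1, of the first byte at which
# files $1 and $2 differ (GNU cmp numbers bytes from the skipped position)
diff_at() {
    local pos
    # Take the number before ", line N" so that file names containing
    # spaces cannot shift the field
    pos=$(LC_ALL=C cmp -i "$3" "$1" "$2" | sed -n 's/.* \([0-9][0-9]*\), line [0-9][0-9]*$/\1/p')
    if [ -z "$pos" ]; then
        # No difference: one past the end of the shorter file, same base
        pos=$(($(du -b "$1" "$2" | cut -f 1 | sort -n | head -n1) - $3 + 1)) # min()
    fi
    echo "$pos"
}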
max() {
    if [ $# -eq 0 ]; then # Degenerate case
        echo ""
    elif [ $# -eq 1 ]; then # Base case
        echo $(($1 + 0)) # Strings are interpreted as 0
    else
        local v=$1 m
        shift
        m=$(max "$@")
        echo $((v > m ? v : m))
    fi
}
# Shallow check of arguments
if [ $# -ne 4 ]; then
    echo "Provide three files to compare and a file to write output into." >&2
    exit 1
fi
# If one of them is larger, you cannot compare its (probable) correctness
S1=$(du -b "$FILE1" | cut -f 1)
S2=$(du -b "$FILE2" | cut -f 1)
S3=$(du -b "$FILE3" | cut -f 1)
if (( (S1 > S2 && S1 > S3) || (S2 > S1 && S2 > S3) || (S3 > S2 && S3 > S1) )); then
    echo "$0: Unable to reconstruct original file." >&2
    exit 1
fi
FILESIZE=$(max "$S1" "$S2" "$S3")
# The idea is that of extracting and appending the common part
truncate -s 0 "$RESFILE" # This will overwrite existing output
BIAS=0
while [ "$BIAS" -lt "$FILESIZE" ]; do
    I12=$(diff_at "$FILE1" "$FILE2" "$BIAS")
    I23=$(diff_at "$FILE2" "$FILE3" "$BIAS")
    I31=$(diff_at "$FILE3" "$FILE1" "$BIAS")
    # Biggest common part
    MAXBYTE=$(max "$I12" "$I23" "$I31")
    # All three offsets equal: either the files agree up to the end, or
    # all of them differ at the same byte and there is no majority
    if [ "$I12" -eq "$I23" ] && [ "$I12" -eq "$I31" ]; then
        if [ $((BIAS + MAXBYTE - 1)) -lt "$FILESIZE" ]; then
            # Unreconstructible, aka all of them differ at the same byte
            echo "$0: Unable to reconstruct original file." >&2
            exit 1
        fi
    fi
    # Exclude the file with wrong byte
    if [ "$I12" -eq "$MAXBYTE" ] || [ "$I31" -eq "$MAXBYTE" ]; then
        tail -c +$((BIAS + 1)) "$FILE1" | head -c $((MAXBYTE - 1)) >> "$RESFILE"
    else
        tail -c +$((BIAS + 1)) "$FILE2" | head -c $((MAXBYTE - 1)) >> "$RESFILE"
    fi
    # Update position
    BIAS=$((BIAS + MAXBYTE - 1))
done
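A note on the offsets: the script assumes that `cmp -i N` numbers bytes starting from the skipped position rather than from the start of the file, which is how GNU cmp behaves as far as I know. A minimal sketch to check this on your own system, using throwaway files (the `first_diff` helper and the sample contents are made up for illustration):

```shell
# first_diff A B SKIP: hypothetical helper, prints the offset reported by
# `cmp -i SKIP A B`, or nothing if the files agree after SKIP bytes.
first_diff() {
    LC_ALL=C cmp -i "$3" "$1" "$2" | sed -n 's/.* \([0-9][0-9]*\), line [0-9][0-9]*$/\1/p'
}

tmp=$(mktemp -d)
printf 'abcdefgh' > "$tmp/a"
printf 'abcdeXgh' > "$tmp/b"   # differs from a at absolute byte 6

first_diff "$tmp/a" "$tmp/b" 0   # nothing skipped: counted from the start
first_diff "$tmp/a" "$tmp/b" 4   # first four bytes skipped
```

If the second call prints a smaller number than the first, counting starts after the skipped bytes and the `BIAS` arithmetic in the script holds. If both calls print the same number, your cmp reports absolute offsets and the arithmetic would need adjusting.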
To handle "guessing" (the case where all three bytes at the same position differ), you will need to modify the script.
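One way to make such a guess explicit is a small voting helper. This is only a sketch and not part of the script above: `vote_byte` is a made-up name, it takes the three byte values as decimal numbers, and falling back to the first copy when nothing agrees is an arbitrary choice.

```shell
# vote_byte B1 B2 B3: hypothetical helper; prints the value that at least
# two of the three copies agree on. When all three differ there is no
# majority, so it prints B1 as a guess and returns status 1.
vote_byte() {
    if [ "$1" -eq "$2" ] || [ "$1" -eq "$3" ]; then
        echo "$1"
    elif [ "$2" -eq "$3" ]; then
        echo "$2"
    else
        echo "$1"   # guess: no two copies agree
        return 1
    fi
}
```

The non-zero status lets the caller log the position of every guessed byte, so that the result can be checked afterwards, for example against the archive's own integrity test.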
I think there are better options than the above from many points of view, so treat this as quick and dirty.
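One possible alternative, offered only as a sketch and not necessarily what was meant above, is to let `cmp -l` list every differing byte in a single pass and patch a copy of the first file. The helper name `patch_from_majority` is made up, and the sketch assumes all three files have the same size.

```shell
# patch_from_majority F1 F2 F3 OUT: hypothetical helper. Starts from a copy
# of F1 and, at each offset where F1 and F2 differ, takes F2's byte if F3
# agrees with F2. Assumes the three files have the same size.
patch_from_majority() {
    local f1=$1 f2=$2 f3=$3 out=$4 pos b1 b2 b3
    cp -- "$f1" "$out" || return 1
    # cmp -l prints one line per differing byte: the 1-based decimal
    # offset, then the byte in F1 and the byte in F2 as octal numbers
    cmp -l "$f1" "$f2" | while read -r pos b1 b2; do
        # Same byte in F3, as octal (a leading 0 marks octal in $(( )))
        b3=$(tail -c +"$pos" "$f3" | head -c 1 | od -An -to1 | tr -d ' ')
        if [ $((0$b3)) -eq $((0$b2)) ]; then
            # F2 and F3 agree, so F1 holds the odd byte: overwrite it
            printf '%b' "\\0$b2" |
                dd of="$out" bs=1 seek=$((pos - 1)) conv=notrunc 2>/dev/null
        elif [ $((0$b3)) -ne $((0$b1)) ]; then
            echo "no majority at byte $pos, keeping the byte from $f1" >&2
        fi
    done
}
```

Offsets where F1 and F2 agree never show up in the `cmp -l` listing, so a corrupted byte in F3 alone is outvoted without any work. The cost is one `tail | head | od` pipeline per differing byte, which should be acceptable when corruption is sparse.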