Question

I have these files:

NotRequired.txt (contains the lines that need to be deleted)
Need2CleanSED.txt (large file, needs cleaning)
Need2CleanGRP.txt (large file, needs cleaning)

Contents:

more NotRequired.txt
[abc-xyz_pqr-pe2_123]
[lon-abc-tkt_1202]
[wat-7600-1_414]
[indo-pak_isu-5_761]

I am reading the file above and trying to delete those lines from Need2Clean???.txt with SED and GREP, but without success.

myFile="NotRequired.txt"

while IFS= read -r HKline

do

  sed -i '/$HKline/d' Need2CleanSED.txt

done < "$myFile"


myFile="NotRequired.txt"

while IFS= read -r HKline

do

  grep -vE \"$HKline\" Need2CleanGRP.txt > Need2CleanGRP.txt

done < "$myFile"

It looks like something is going wrong with the variable and the [] characters.

Answer 1

What you're doing is extremely inefficient and error prone. Just do this:

grep -vF -f NotRequired.txt Need2CleanGRP.txt > tmp &&
mv tmp Need2CleanGRP.txt

Thanks to grep -F, the above treats each line of NotRequired.txt as a string rather than a regexp, so you don't have to worry about escaping RE metachars like [. You also don't need to wrap it in a shell loop: that one command removes all the undesirable lines in a single execution of grep.
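To see the fixed-string behaviour on something small, here is a minimal sketch. It assumes bash and a scratch directory; the contents written to Need2CleanGRP.txt are made up for the demo, and only the two pattern lines come from the question.

```shell
#!/usr/bin/env bash
# Sketch: build throwaway sample files, then filter with fixed-string patterns.
set -eu
workdir=$(mktemp -d)   # scratch directory so no real files are touched
cd "$workdir"

# Patterns to drop, copied from the question's NotRequired.txt.
printf '%s\n' '[abc-xyz_pqr-pe2_123]' '[lon-abc-tkt_1202]' > NotRequired.txt

# Made-up contents for the file to clean (an assumption for this demo).
printf '%s\n' \
  'keep this line' \
  'drop [abc-xyz_pqr-pe2_123] please' \
  'abc-xyz_pqr-pe2_123 without brackets stays' \
  '[lon-abc-tkt_1202]' > Need2CleanGRP.txt

# -F: patterns are literal strings; -v: keep non-matching lines;
# -f: read one pattern per line from the given file.
grep -vF -f NotRequired.txt Need2CleanGRP.txt > tmp &&
mv tmp Need2CleanGRP.txt

cat Need2CleanGRP.txt
```

Note that -F still matches anywhere in a line, so the third sample line survives only because it lacks the brackets.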

By the way, never do command file > file: the shell sets up the > file redirection, which truncates file, before command starts, so command finds file already empty when it tries to read it. Always do command file > tmp && mv tmp file instead.
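One way to package the tmp-then-mv pattern is sketched below. The function name filter_in_place and the sample data are hypothetical, not part of the answer. The sketch also handles one corner case: grep exits with status 1 when it prints no lines at all, so with a plain && the mv would be skipped if every line of the file got removed.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the answer): filter_in_place PATTERN_FILE TARGET_FILE
filter_in_place() {
  local patterns=$1 target=$2 tmp status
  # Create the temporary file next to the target so mv stays on one filesystem.
  tmp=$(mktemp "${target}.XXXXXX") || return 1
  grep -vF -f "$patterns" "$target" > "$tmp"
  status=$?
  # grep exits 0 when it printed lines, 1 when it printed none, 2 or more on error.
  if [ "$status" -le 1 ]; then
    mv "$tmp" "$target"
  else
    rm -f "$tmp"
    return "$status"
  fi
}

# Demo on made-up data in a scratch directory.
cd "$(mktemp -d)"
printf '%s\n' '[wat-7600-1_414]' > NotRequired.txt
printf '%s\n' 'alpha' 'beta [wat-7600-1_414]' > Need2CleanGRP.txt
filter_in_place NotRequired.txt Need2CleanGRP.txt
cat Need2CleanGRP.txt
```

mktemp creates its file readable and writable by the owner only, so restore the permissions afterwards if the original file's mode matters.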

Answer 2

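Your assumption is correct. The [...] construct matches any one character from that set, so you have to put a \ in front of the brackets ("escape" them). The simplest way is to do that in the original file: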

sed -i -e 's:\[:\\[:' -e 's:\]:\\]:' "${myFile}"

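If you would rather not do that, you can put the sed command at the point where the file is fed into the loop (this uses bash process substitution):

done < <(sed -e 's:\[:\\[:' -e 's:\]:\\]:' "${myFile}")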

Finally, you can run sed on each HKline variable:

HKline=$( printf '%s\n' "$HKline" | sed -e 's:\[:\\[:' -e 's:\]:\\]:' )
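Putting the per-line variant back into the question's loop might look like the sketch below. It assumes GNU sed (for -i without a suffix argument) and made-up file contents. It also switches the delete command to double quotes, because inside single quotes the shell passes the literal text $HKline to sed instead of the variable's value. Only [ and ] are escaped here; patterns containing other metacharacters such as . * or / would need more work.

```shell
#!/usr/bin/env bash
set -eu
cd "$(mktemp -d)"   # scratch directory with made-up sample data

myFile="NotRequired.txt"
printf '%s\n' '[abc-xyz_pqr-pe2_123]' '[indo-pak_isu-5_761]' > "$myFile"
printf '%s\n' 'first [abc-xyz_pqr-pe2_123]' 'second' 'a b c' > Need2CleanSED.txt

while IFS= read -r HKline; do
  # Escape [ and ] so sed treats them literally (other metacharacters are NOT handled).
  escaped=$(printf '%s\n' "$HKline" | sed -e 's:\[:\\[:g' -e 's:\]:\\]:g')
  # Double quotes let the shell substitute the value of $escaped into the sed script.
  sed -i "/$escaped/d" Need2CleanSED.txt
done < "$myFile"

cat Need2CleanSED.txt
```

This still starts one sed process per pattern line and rewrites the target each time, so it is slow when both files are large.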

Answer 3

Try GNU sed:

sed -Ez 's/\n/\|/g;s!\[!\\[!g;s!\]!\\]!g; s!(.*).!/\1/d!' NotRequired.txt| sed -Ef - Need2CleanSED.txt

Two sed processes are chained by a shell pipe. The first one slurps NotRequired.txt all at once (sed -z), replaces each \n with |, and escapes the [ and ] metacharacters as \[ and \]. The second one reads that result as its script (-f -) and applies it to the input file, i.e. Need2CleanSED.txt. Output of the first process:

/\[abc-xyz_pqr-pe2_123\]|\[lon-abc-tkt_1202\]|\[wat-7600-1_414\]|\[indo-pak_isu-5_761\]/d

Add the -u (unbuffered) option if you want sed to read input and flush output in small amounts instead of in large batches.
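The pipeline above prints the cleaned text to standard output and leaves Need2CleanSED.txt untouched. A minimal sketch of an in-place variant follows, assuming GNU sed and made-up sample data. It saves the generated script to a temporary file (the variable name script is just for illustration) and then passes it with -f together with -i.

```shell
#!/usr/bin/env bash
set -eu
cd "$(mktemp -d)"   # scratch directory with made-up sample data

printf '%s\n' '[lon-abc-tkt_1202]' '[wat-7600-1_414]' > NotRequired.txt
printf '%s\n' 'x [lon-abc-tkt_1202]' 'y' '[wat-7600-1_414] z' 'lon-abc-tkt_1202' > Need2CleanSED.txt

# Step 1: turn the pattern list into a single /a|b|.../d command, as in the answer.
script=$(mktemp)
sed -Ez 's/\n/\|/g;s!\[!\\[!g;s!\]!\\]!g; s!(.*).!/\1/d!' NotRequired.txt > "$script"

# Step 2: apply that script in place (-i) using extended regexps (-E).
sed -E -i -f "$script" Need2CleanSED.txt
rm -f "$script"

cat Need2CleanSED.txt
```

As in the original command, the last substitution drops the final character of the slurped text, so NotRequired.txt is expected to end with a newline.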
