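Need to clean a file using sed or grep

Time: 2019-04-16 11:23:41

Tags: bash sed grep

I have these files:

  • NotRequired.txt (contains the lines that need to be removed)
  • Need2CleanSED.txt (large file that needs cleaning)
  • Need2CleanGRP.txt (large file that needs cleaning)

Contents: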

more NotRequired.txt
[abc-xyz_pqr-pe2_123]
[lon-abc-tkt_1202]
[wat-7600-1_414]
[indo-pak_isu-5_761]

I am reading the file above and trying to delete the matching lines from Need2Clean???.txt with sed and grep, but without success.

myFile="NotRequired.txt"
while IFS= read -r HKline
do
  sed -i '/$HKline/d' Need2CleanSED.txt
done < "$myFile"

myFile="NotRequired.txt"
while IFS= read -r HKline
do
  grep -vE \"$HKline\" Need2CleanGRP.txt > Need2CleanGRP.txt
done < "$myFile"

It looks like the variable and the [] characters are causing the problem.

3 Answers:

Answer 0: (score: 3)

What you're doing is extremely inefficient and error prone. Just do this:

grep -vF -f NotRequired.txt Need2CleanGRP.txt > tmp &&
mv tmp Need2CleanGRP.txt

Thanks to grep -F, the above treats each line of NotRequired.txt as a string rather than a regexp, so you don't have to worry about escaping RE metachars like [. You also don't need to wrap it in a shell loop: that one command removes all the undesirable lines in a single execution of grep.
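As a small illustration of the command above, the following sketch runs it in a scratch directory. The two pattern lines come from the question; the contents of Need2CleanGRP.txt are invented for the example, since the question does not show them.

```shell
# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Lines to remove (taken from the question's NotRequired.txt).
printf '%s\n' '[abc-xyz_pqr-pe2_123]' '[lon-abc-tkt_1202]' > NotRequired.txt

# Hypothetical contents for the file to be cleaned.
printf '%s\n' \
  '[abc-xyz_pqr-pe2_123] drop me' \
  '[keep-this_001] keep me' \
  '[lon-abc-tkt_1202] drop me too' > Need2CleanGRP.txt

# -F: fixed strings, -v: print non-matching lines, -f: read patterns from a file.
grep -vF -f NotRequired.txt Need2CleanGRP.txt > tmp &&
mv tmp Need2CleanGRP.txt

cat Need2CleanGRP.txt
```

Two caveats worth keeping in mind: grep exits with status 1 when it prints no lines, so if every line is removed the mv is skipped and the original file is left as it was; and an empty line in NotRequired.txt acts as an empty pattern, which matches every line.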

Never do command file > file, by the way: the shell performs the > file redirection before it runs command, so file is emptied before command gets a chance to read it. Always do command file > tmp && mv tmp file instead.
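The difference between the two forms can be seen with a throwaway file. This is only a sketch; the file names data.txt, unsafe.txt and safe.txt are made up for the demonstration.

```shell
# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '%s\n' 'one' 'two' 'three' > data.txt
cp data.txt unsafe.txt
cp data.txt safe.txt

# Unsafe: the redirection truncates unsafe.txt before grep ever reads it.
# Some grep versions also report an error here, hence the "|| true".
grep -v 'two' unsafe.txt > unsafe.txt || true

# Safe: write to a temporary file, then replace the original only on success.
grep -v 'two' safe.txt > tmp && mv tmp safe.txt

echo "unsafe.txt size in bytes: $(wc -c < unsafe.txt)"
echo "safe.txt contents:"
cat safe.txt
```

After this runs, unsafe.txt is empty, while safe.txt holds the lines one and three.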

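Answer 1: (score: 0)

Your assumption is correct. The [...] construct matches any single character in the set, so you have to escape the brackets by preceding them with \. The simplest way is to do this in the original file:

sed -i -e 's:\[:\\[:' -e 's:\]:\\]:' "${myFile}"

If you would rather not modify the file, you can run the sed command at the point where the file is fed into the loop:

done < <(sed -e 's:\[:\\[:' -e 's:\]:\\]:' "$myFile")

Finally, you can apply sed to each HKline variable:

HKline=$(printf '%s\n' "$HKline" | sed -e 's:\[:\\[:' -e 's:\]:\\]:')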
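Putting the per-variable escaping together with the question's loop might look like the sketch below. Note that the question's sed script is in single quotes, so the shell never expands $HKline there; the sketch uses double quotes. The variable name escaped and the contents of Need2CleanSED.txt are assumptions made for the example, and it writes through a temporary file instead of using sed -i so that it does not depend on a particular sed implementation.

```shell
# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Lines to remove (taken from the question's NotRequired.txt).
printf '%s\n' '[abc-xyz_pqr-pe2_123]' '[wat-7600-1_414]' > NotRequired.txt

# Hypothetical contents for the file to be cleaned.
printf '%s\n' \
  '[abc-xyz_pqr-pe2_123] a' \
  '[other_1] b' \
  '[wat-7600-1_414] c' > Need2CleanSED.txt

myFile="NotRequired.txt"
while IFS= read -r HKline
do
  # Escape [ and ] so sed treats them literally, not as a bracket expression.
  escaped=$(printf '%s\n' "$HKline" | sed -e 's:\[:\\[:g' -e 's:\]:\\]:g')
  # Double quotes let the shell expand $escaped inside the sed script.
  sed "/$escaped/d" Need2CleanSED.txt > tmp && mv tmp Need2CleanSED.txt
done < "$myFile"

cat Need2CleanSED.txt
```

This only escapes the brackets. Patterns containing other metacharacters such as . or *, or the / delimiter, would need further escaping, and the loop still rewrites the large file once per pattern.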

Answer 2: (score: 0)

Try GNU sed:

sed -Ez 's/\n/\|/g;s!\[!\\[!g;s!\]!\\]!g; s!(.*).!/\1/d!' NotRequired.txt| sed -Ef - Need2CleanSED.txt

The shell pipe chains two sed processes together.
The first one, sed -z, slurps NotRequired.txt all at once, replaces each \n with |, escapes the [ and ] metacharacters as \[ and \], and wraps the result in /.../d. The second process reads that as its script (-f -) and applies it to the input file, i.e. Need2CleanSED.txt. Output of the first process:

/\[abc-xyz_pqr-pe2_123\]|\[lon-abc-tkt_1202\]|\[wat-7600-1_414\]|\[indo-pak_isu-5_761\]/d

Add the -u (unbuffered) option if you want sed to flush its output more often instead of writing it in batches.
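To inspect the generated script before applying it, the pipeline can be split into two steps, as in this sketch. It assumes GNU sed (for -z), stores the script in a variable named script, which is not part of the original answer, and uses invented contents for Need2CleanSED.txt.

```shell
# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Lines to remove (taken from the question's NotRequired.txt).
printf '%s\n' '[abc-xyz_pqr-pe2_123]' '[wat-7600-1_414]' > NotRequired.txt

# Hypothetical contents for the file to be cleaned.
printf '%s\n' \
  '[abc-xyz_pqr-pe2_123] a' \
  '[other_1] b' \
  '[wat-7600-1_414] c' > Need2CleanSED.txt

# Step 1 (GNU sed only): build one /pat1|pat2/d command from the pattern file.
script=$(sed -Ez 's/\n/\|/g;s!\[!\\[!g;s!\]!\\]!g; s!(.*).!/\1/d!' NotRequired.txt)
printf 'generated script: %s\n' "$script"

# Step 2: apply the generated script; the input file itself is not modified.
result=$(sed -E "$script" Need2CleanSED.txt)
printf '%s\n' "$result"
```

As in the original one-liner, the cleaned text goes to standard output, so it still has to be redirected to a temporary file and moved into place if Need2CleanSED.txt should be updated.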