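I've imported thousands of text files, each of which contains a piece of text I want to remove.
It isn't a fixed block of text, though, but a pattern: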
<!--
# Translator(s):
#
# username1 <email1>
# username2 <email2>
# usernameN <emailN>
#
-->
When the block is present, it lists one or more users together with their email addresses.
Answer 0 (score: 1)
This sed solution might work:
sed '/^<!--/,/^-->/{/^<!--/{h;d};H;/^-->/{x;/^<!--\n# Translator(s):\n#\(\n# [^<]*<[^>]\+>\)\+\n#\n-->$/!p};d}' file
Another option (perhaps a better solution?):
sed '/^<!--/{:a;N;/^-->/M!ba;/^<!--\n# Translator(s):\n#\(\n# \w\+ <[^>]\+>\)\+\n#\n-->/d}' file
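This collects the lines from one starting with <!-- to one starting with -->, then pattern-matches against the whole collection: the second line must be "# Translator(s):", the third line "#", the fourth and any further lines must have the form "# username <email address>", the second-to-last line must be "#", and the last line "-->". If it matches, the whole collection is deleted; otherwise it is printed as normal.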
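To see the second command in action, here is a minimal sketch that wraps it in a shell function and runs it on a small in-memory sample. It assumes GNU sed (\w, \+ and the M address flag are GNU extensions) and writes the group quantifier as \)\+, since a bare + after \) in a basic regular expression is a literal plus sign rather than a repetition. The function name strip_translator_blocks and the sample lines are made up for illustration.

```shell
#!/bin/sh
# strip_translator_blocks (hypothetical name): the second sed command from this
# answer wrapped in a function. Requires GNU sed: \w, \+ and the M address
# flag are GNU extensions. Reads the named files, or stdin if none are given.
strip_translator_blocks() {
  # On a line starting with <!--, keep appending lines (N) until some line in
  # the pattern space starts with -->, then delete the whole collection if it
  # has the "# Translator(s):" shape; otherwise it is printed as usual.
  sed '/^<!--/{:a;N;/^-->/M!ba;/^<!--\n# Translator(s):\n#\(\n# \w\+ <[^>]\+>\)\+\n#\n-->/d}' "$@"
}

# Usage on a made-up sample: the translator block disappears, while the
# unrelated comment and the surrounding text are kept.
printf '%s\n' 'Intro line' '<!--' '# Translator(s):' '#' \
  '# alice <alice@example.com>' '# bob <bob@example.com>' '#' '-->' \
  '<!--' '# Some other comment' '-->' 'Body text' |
  strip_translator_blocks
```

Because the whole block is in the pattern space when the test runs, an ordinary HTML comment that does not have the translator shape passes through untouched.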
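Answer 1 (score: 1)
I have another small awk program that does the job in just a few lines of code. It can be used to remove a text pattern from a file; the start and stop regular expressions can be changed as needed. Note that, as written, it removes every <!-- ... --> block, not only the translator blocks.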
# This block is a range pattern: it matches every line from one containing
# '<!--' through the next line containing '-->' (both included). awk still
# reads one line at a time, so $0 holds only the current line of the range.
# Run with: awk -f remove_email.awk yourfile
# The if statement is not needed to accomplish the task, but may be useful:
# if uncommented, it prints "Found an email and removed that!" for each line
# in the range that contains an '@'.
# The 'next' command discards the current record and starts processing the
# next record from the top of the program, so lines in the range are never
# printed.
/<!--/, /-->/ {
#if( $0 ~ /@/ ){
# print "Found an email and removed that!"
#}
next
}
# This rule prints every line that was not consumed by the range block above
# to standard output.
1 {
print
}
Save the code in remove_email.awk and run it with: awk -f remove_email.awk yourfile
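Since the start and stop regular expressions are the only task-specific part, they can also be passed in from the command line. A minimal sketch, assuming a POSIX shell and any POSIX awk: both regexes are handed over with -v and used as a range pattern of the form $0 ~ start, $0 ~ stop. The helper name remove_range is hypothetical.

```shell
#!/bin/sh
# remove_range (hypothetical helper): delete every line from a match of the
# start regex through the next match of the stop regex, inclusive.
# Usage: remove_range START_RE STOP_RE [file...]   (stdin if no files)
remove_range() {
  start_re=$1
  stop_re=$2
  shift 2
  awk -v start="$start_re" -v stop="$stop_re" '
    # Range pattern built from dynamic regexes; lines inside it are skipped.
    $0 ~ start, $0 ~ stop { next }
    # Every other line is printed unchanged.
    { print }
  ' "$@"
}

# Same behaviour as remove_email.awk: drop every <!-- ... --> block.
printf '%s\n' 'before' '<!--' '# Translator(s):' '-->' 'after' |
  remove_range '<!--' '-->'
```

One caveat of -v: awk processes backslash escapes in the assigned value, so a regex containing a backslash has to be written with it doubled.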
Answer 2 (score: 0)
For this task you need lookahead, which is usually done with a parser.
Another solution, though not an efficient one, would be:
sed 's/-->/&\n/;s/<!--/\n&/' file | awk 'BEGIN {RS = ""} !/# Translator/'
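A minimal sketch of this pipeline as a shell function, with the awk filter set to drop (rather than keep) the paragraphs that contain "# Translator". It assumes GNU sed, which accepts \n in the replacement text; the function name strip_by_paragraph is hypothetical. One trade-off of awk's paragraph mode: with RS = "", runs of blank lines act as record separators, so blank lines already present in the input do not survive.

```shell
#!/bin/sh
# strip_by_paragraph (hypothetical name): turn each <!-- ... --> block into its
# own paragraph, then drop the paragraphs that contain "# Translator".
# Requires GNU sed for \n in the replacement text.
# sed: blank line after every -->, blank line before every <!--.
# awk: RS = "" is paragraph mode, so each block is one record.
strip_by_paragraph() {
  sed 's/-->/&\n/;s/<!--/\n&/' "$@" |
    awk 'BEGIN { RS = "" } !/# Translator/'
}

printf '%s\n' 'Intro line' '<!--' '# Translator(s):' '#' \
  '# alice <alice@example.com>' '#' '-->' \
  '<!--' '# Some other comment' '-->' 'Body text' |
  strip_by_paragraph
```

Paragraph mode is what supplies the "lookahead" here: awk sees each block as a whole record before deciding whether to print it.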
HTH Chris
Answer 3 (score: 0)
perl -i.orig -00 -pe 's/<!--\s+#\s*Translator.*?\s-->//gs' file1 file2 file3
Answer 4 (score: -1)
If I understand your question correctly, here is my solution. Save the following in a file named remove_blocks.awk:
# On seeing the start of a block, mark it
/<!--/ {
state = "block_started"
}
# At the end of the block, if the block does not contain email, print
# out the whole block.
/^-->/ {
if (!block_contains_user_email) {
for (i = 0; i < count; i++) {
print saved_line[i];
}
print
}
count = 0
block_contains_user_email = 0
state = ""
next
}
# Inside a block: save the lines and wait until the end of the block
# to decide whether to print them
state == "block_started" {
saved_line[count++] = $0
if (NF>=3 && $3 ~ /@/) {
block_contains_user_email = 1
}
next
}
# For everything else, print the line
1
Assuming your text is in data.txt (or spread across many files):
awk -f remove_blocks.awk data.txt
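The above command prints everything in the text file except the blocks that contain a user email address. It detects an address by looking for an '@' in the third field of a line, so a block that only has placeholders such as <email1> (no '@') would be kept.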