Question

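I have a text file that is a dictionary of more than 80,000 words. I need to parse it, but first I have to clean it up so that it can be parsed later. Can a regular expression match two newlines instead of one? That is, can I search the whole file for two blank lines in a row rather than a single one? Each new word in the dictionary is followed by two blank lines.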

The whole file is formatted like this:

English : Pyramid of the Cerebellum

Section: Medical

Translation: ...

Description: ...


English: Pyramid

Section: General

Translation: ...

Description: ...

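As you can see, each word is followed by two blank lines, so I want to find every place with two or more blank lines in a row and replace it using AWK. Is that possible?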

I want the output to look like this:

English : Pyramid of the Cerebellum

Section: Medical

Translation: ...

Description: ...

English: Pyramid

Section: General

Translation: ...

Description: ...

Answer 1

A very quick way is to use awk:

awk 'BEGIN{RS="";ORS="\n\n"}1' /path/to/your/file > /path/to/new/file

How this works:

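awk processes its input as records (by default, one line each), and you define what a record is through the record separator RS. If RS is set to the empty string, awk treats any run of one or more blank lines as the record separator. ORS is the output record separator: it determines what is printed after each record. Here it is set to two newline characters, so every record is followed by exactly one blank line. Finally, the statement 1 is shorthand for {print $0}, which prints the current record followed by ORS.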
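To see paragraph mode in action, here is a small sketch that runs the same command on a throwaway sample. The sample text and the temporary file are made up for illustration and only mimic the layout from the question.

```shell
# Build a tiny sample in the layout of the question (contents are hypothetical).
sample=$(mktemp)
printf 'English: A\n\nSection: X\n\n\nEnglish: B\n\nSection: Y\n' > "$sample"

# RS="" puts awk in paragraph mode: any run of blank lines separates records.
# ORS="\n\n" writes exactly one blank line after every record.
result=$(awk 'BEGIN{RS="";ORS="\n\n"}1' "$sample")

printf '%s\n' "$result"
rm -f "$sample"
```

The double blank line between the Section: X and English: B lines comes out as a single blank line, while the single blank lines elsewhere are left alone. Note that the real command also ends the output with one blank line, because ORS is printed after the last record too.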

Answer 2

Could you please try the following.

awk '!/^$/{flag=""} /^$/{flag++} flag==2 && /^$/{next} 1'  Input_file

Explanation: here is the same code with a comment on each line.

awk '
!/^$/{              ##Check whether the line is NOT empty; if so, do the following.
  flag=""           ##Reset the variable flag.
}                   ##Close this block.
/^$/{               ##Check whether the line is empty; if so, do the following.
  flag++            ##Increment flag by 1 (counts consecutive empty lines).
}                   ##Close this block.
flag==2 && /^$/{    ##If flag is 2 and the line is empty, do the following.
  next              ##next is a built-in awk keyword; it skips all remaining statements for this line.
}                   ##Close this block.
1                   ##1 prints the current line (every line that was not skipped).
' Input_file        ##Input_file is the name of the input file.
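One thing to be aware of: the test flag==2 drops only the second empty line of a run, so a run of three or more empty lines would still leave extras behind. The question only shows runs of two, so that may never matter. If it does, a small variation along these lines should work; the function name squeeze_blank and the sample input are made up for this sketch.

```shell
# Hypothetical helper: squeeze every run of empty lines down to a single one.
squeeze_blank() {
  awk '
    !/^$/ { blanks = 0 }   # non-empty line: reset the counter
    /^$/  { blanks++ }     # empty line: count it
    blanks >= 2 { next }   # second or later empty line in a run: drop it
    1                      # print every line that was not dropped
  ' "$@"
}

# Three empty lines between A and B, one between B and C.
result=$(printf 'A\n\n\n\nB\n\nC\n' | squeeze_blank)
printf '%s\n' "$result"
```

The only change from the answer above is the comparison (>= instead of ==) and the counter being reset to 0 rather than to an empty string, which behaves the same in a numeric test.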

Answer 3

You can use the following awk command:

awk '!NF&&!n{print;n=1}NF{print;n=0}' your_text_file
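The answer gives no explanation, so here is my reading of it as a commented sketch. NF is the number of fields on the current line, which is zero for an empty line (and also for a line containing only whitespace); n is a flag that records whether a blank line has already been printed for the current run. The wrapper name collapse_blank and the sample input are made up for illustration.

```shell
# Hypothetical wrapper around the same one-liner, spread out with comments.
collapse_blank() {
  awk '
    !NF && !n { print; n = 1 }   # first blank line of a run: print it and set the flag
    NF        { print; n = 0 }   # line with content: print it and clear the flag
  ' "$@"
}

# Two blank lines after the first entry, one after the second.
result=$(printf 'English: A\n\n\nEnglish: B\n\nEnglish: C\n' | collapse_blank)
printf '%s\n' "$result"
```

Any further blank line in the same run matches neither pattern, so it is silently skipped. That means runs of any length collapse to one blank line.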

How do I match two newlines (\n) instead of one in a regular expression?