Printing a block of lines between matching patterns

Time: 2017-09-12 10:42:51

Tags: awk sed

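I have a bunch of email archives that follow the pattern below. I am trying to split them into individual emails, each of which starts at a From: line and ends just before the next From: line.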

From: Bikram Suwal veekram@gmail.com
To: John Doe johndoe@gmail.com
Subject: Greetings
Date: 04/05/1990 10:30 PM

Hello.
World
This is a test email.


From: Bikram Suwal veekram@gmail.com
To: John Doe johndoe@gmail.com
Subject: New Greetings
Date: 04/05/1990 10:30 PM

Yet another test email

Using awk, I tried awk '/From/{p=1} p; /Date/{exit}' text.txt, which gives:

From: Bikram Suwal veekram@gmail.com
To: John Doe johndoe@gmail.com
Subject: Greetings
Date: 04/05/1990 10:30 PM

How can I modify the awk command to get:

From: Bikram Suwal veekram@gmail.com
To: John Doe johndoe@gmail.com
Subject: Greetings
Date: 04/05/1990 10:30 PM

Hello.
World
This is a test email.

Thanks in advance.

5 answers:

Answer 0: (score: 2)

For your current input, the following GNU awk expression is sufficient:

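awk -v RS="\n\n\n+" 'NR==1{ print; exit }' file

Output:

From: Bikram Suwal veekram@gmail.com
To: John Doe johndoe@gmail.com
Subject: Greetings
Date: 04/05/1990 10:30 PM

Hello.
World
This is a test email.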

Answer 1: (score: 2)

If you are not bothered by the trailing blank lines, then either of these will do:

$ awk '/^From:/{count++} count==1' infile 

$ awk -v mail_no=1 '/^From:/{count++}count==mail_no; count>mail_no{exit}' infile 

Example:

$ awk '/^From:/{count++} count==1' infile 
From: Bikram Suwal veekram@gmail.com
To: John Doe johndoe@gmail.com
Subject: Greetings
Date: 04/05/1990 10:30 PM

Hello.
World
This is a test email.


$ awk '/^From:/{count++} count==2' infile 
From: Bikram Suwal veekram@gmail.com
To: John Doe johndoe@gmail.com
Subject: New Greetings
Date: 04/05/1990 10:30 PM

Yet another test email

To remove the trailing blank lines, I prefer to use tac, as shown below:

$ awk -v mail_no=1 '/^From:/{count++}count==mail_no;count>mail_no{exit} ' infile | tac | awk 'NF{found=1}found' | tac
From: Bikram Suwal veekram@gmail.com
To: John Doe johndoe@gmail.com
Subject: Greetings
Date: 04/05/1990 10:30 PM

Hello.
World
This is a test email.

Input:

$ cat infile
From: Bikram Suwal veekram@gmail.com
To: John Doe johndoe@gmail.com
Subject: Greetings
Date: 04/05/1990 10:30 PM

Hello.
World
This is a test email.


From: Bikram Suwal veekram@gmail.com
To: John Doe johndoe@gmail.com
Subject: New Greetings
Date: 04/05/1990 10:30 PM

Yet another test email


From: Bikram Suwal veekram@gmail.com
To: John Doe johndoe@gmail.com
Subject: New Greetings
Date: 04/05/1990 10:30 PM

Yet another test email

Answer 2: (score: 0)

Found a solution.

This will extract all the email addresses from the data:

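import java.util.ArrayList;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// data is the archive as one String; separated[1], separated[2], ... are the
// messages (separated[0] is whatever precedes the first "From:")
String[] separated = data.split("From:");

Pattern p = Pattern.compile("\\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\\.[A-Z]{2,4}\\b", Pattern.CASE_INSENSITIVE);
ArrayList<ArrayList<String>> buchEmails = new ArrayList<ArrayList<String>>();
for (int i = 0; i < separated.length; i++) {
    // a fresh list per message, so the addresses are not all pooled in one list
    ArrayList<String> emails = new ArrayList<String>();
    Matcher matcher = p.matcher(separated[i]);
    while (matcher.find()) {
        emails.add(matcher.group());
    }
    buchEmails.add(i, emails);
}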
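
For comparison, the same split on From: can be sketched in awk, which is what the question asks about. This is an illustrative sketch under stated assumptions, not part of the answer above: the mail_N.txt file names, the dir, out and n variables, and the temporary sample archive are all hypothetical. Unlike a plain split on the string "From:", the pattern is anchored to the start of a line, so a "From:" that appears in the middle of a body line does not start a new message.

```shell
# Sample archive in a scratch directory (contents taken from the question).
workdir=$(mktemp -d)
cat > "$workdir/archive.txt" <<'EOF'
From: Bikram Suwal veekram@gmail.com
To: John Doe johndoe@gmail.com
Subject: Greetings
Date: 04/05/1990 10:30 PM

Hello.
World
This is a test email.


From: Bikram Suwal veekram@gmail.com
To: John Doe johndoe@gmail.com
Subject: New Greetings
Date: 04/05/1990 10:30 PM

Yet another test email
EOF

# Write each message to its own file: mail_1.txt, mail_2.txt, ...
awk -v dir="$workdir" '
  /^From:/ {
    if (out != "") close(out)                     # finish the previous message
    out = sprintf("%s/mail_%d.txt", dir, ++n)     # start the next output file
  }
  out != "" { print > out }                       # copy lines once a message has begun
' "$workdir/archive.txt"

ls "$workdir"
```

Closing each file before opening the next keeps the number of simultaneously open files at one, which matters for large archives. The blank lines between messages end up at the tail of the preceding file.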

Answer 3: (score: 0)

If, as @veekram stated in one of the comments, the next "From:" is the delimiter, then:

 x = [None]*10

will do.
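
One way to treat the next From: line as the delimiter is sketched below. It is an assumption about how the idea could be written in awk, not this answer's own command; the first_mail function name, the seen counter, and the temporary sample file are hypothetical. Every line is printed until a second line starting with From: is reached, at which point awk exits.

```shell
# Sample input: two messages, each starting with a "From:" line.
file=$(mktemp)
cat > "$file" <<'EOF'
From: Bikram Suwal veekram@gmail.com
To: John Doe johndoe@gmail.com
Subject: Greetings
Date: 04/05/1990 10:30 PM

Hello.
World
This is a test email.


From: Bikram Suwal veekram@gmail.com
To: John Doe johndoe@gmail.com
Subject: New Greetings
Date: 04/05/1990 10:30 PM

Yet another test email
EOF

# first_mail FILE (hypothetical helper): print everything up to, but not
# including, the second line that starts with "From:".
first_mail() {
  awk '
    /^From:/ && seen++ { exit }   # seen++ is 0 on the first From:, non-zero afterwards
    { print }                     # otherwise copy the line through
  ' "$1"
}

first_mail "$file"
```

The blank lines that separate the two messages are part of the output here, since they come before the delimiter.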

Answer 4: (score: 0)

Why not use a program designed to parse email, such as formail? For example, to extract the first message:

<in.mbox formail -1 -f -ds

Output:

From: Bikram Suwal veekram@gmail.com
To: John Doe johndoe@gmail.com
Subject: Greetings
Date: 04/05/1990 10:30 PM

Hello.
World
This is a test email.