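I have a bunch of email archives that follow the pattern below. I am trying to split out each email, where an email starts at a From: line and ends just before the next From: line.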
From: Bikram Suwal veekram@gmail.com
To: John Doe johndoe@gmail.com
Subject: Greetings
Date: 04/05/1990 10:30 PM
Hello.
World
This is a test email.
From: Bikram Suwal veekram@gmail.com
To: John Doe johndoe@gmail.com
Subject: New Greetings
Date: 04/05/1990 10:30 PM
Yet another test email
I tried this awk command:
awk '/From/{p=1} p; /Date/{exit}' text.txt
It gives:
From: Bikram Suwal veekram@gmail.com
To: John Doe johndoe@gmail.com
Subject: Greetings
Date: 04/05/1990 10:30 PM
How can I modify the awk command to get:
From: Bikram Suwal veekram@gmail.com
To: John Doe johndoe@gmail.com
Subject: Greetings
Date: 04/05/1990 10:30 PM
Hello.
World
This is a test email.
Thanks in advance.
Answer 0 (score: 2)
For your current input, the following GNU awk expression is enough:
awk -v RS="\n\n\n+" 'NR==1{ print; exit }' file
This treats any run of two or more blank lines as the record separator and prints only the first record, that is, the first email.
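The RS-based command in this answer depends on awk treating a multi-character RS as a regular expression, which GNU awk does but POSIX does not guarantee. As a rough, hedged alternative, the sketch below counts consecutive blank lines instead. The function name first_mail is hypothetical, and it assumes (as that command does) that messages are separated by at least two blank lines. Unlike the RS version, it also treats whitespace-only lines as blank.

```shell
# first_mail FILE (hypothetical helper): print the first message, assuming
# messages are separated by two or more blank lines.
first_mail() {
    awk '
        # Blank line: remember it, and stop once two in a row have been seen.
        !NF { if (++blank >= 2) exit; next }
        # Non-blank line: first emit any single blank line that was held back.
        { while (blank > 0) { print ""; blank-- } print }
    ' "$1"
}
```

Usage: first_mail file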
Answer 1 (score: 2)
If you are not concerned about the trailing newline characters, then either of these will do:
$ awk '/^From:/{count++} count==1' infile
$ awk -v mail_no=1 '/^From:/{count++}count==mail_no; count>mail_no{exit}' infile
Example:
$ awk '/^From:/{count++} count==1' infile
From: Bikram Suwal veekram@gmail.com
To: John Doe johndoe@gmail.com
Subject: Greetings
Date: 04/05/1990 10:30 PM
Hello.
World
This is a test email.
$ awk '/^From:/{count++} count==2' infile
From: Bikram Suwal veekram@gmail.com
To: John Doe johndoe@gmail.com
Subject: New Greetings
Date: 04/05/1990 10:30 PM
Yet another test email
To remove the trailing newlines, I prefer to use tac, as follows:
$ awk -v mail_no=1 '/^From:/{count++}count==mail_no;count>mail_no{exit} ' infile | tac | awk 'NF{found=1}found' | tac
From: Bikram Suwal veekram@gmail.com
To: John Doe johndoe@gmail.com
Subject: Greetings
Date: 04/05/1990 10:30 PM
Hello.
World
This is a test email.
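The tac round trip can also be avoided by holding blank lines back inside awk and printing them only when a non-blank line follows. This is a minimal sketch of that idea, not part of the original answer. The function name print_mail and the variable pending are hypothetical, while mail_no and count are reused from the commands above.

```shell
# print_mail N FILE (hypothetical helper): print the N-th message without
# its trailing blank lines.
print_mail() {
    awk -v mail_no="$1" '
        /^From:/ { count++ }           # each From: line starts a new message
        count > mail_no { exit }       # past the wanted message: stop reading
        count == mail_no {
            if (NF) {
                # Non-blank line: flush the blank lines held back so far.
                printf "%s", pending
                pending = ""
                print
            } else {
                # Blank line: hold it until we know more text follows.
                pending = pending $0 "\n"
            }
        }
    ' "$2"
}
```

Usage: print_mail 1 infile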
Input:
$ cat infile
From: Bikram Suwal veekram@gmail.com
To: John Doe johndoe@gmail.com
Subject: Greetings
Date: 04/05/1990 10:30 PM
Hello.
World
This is a test email.
From: Bikram Suwal veekram@gmail.com
To: John Doe johndoe@gmail.com
Subject: New Greetings
Date: 04/05/1990 10:30 PM
Yet another test email
From: Bikram Suwal veekram@gmail.com
To: John Doe johndoe@gmail.com
Subject: New Greetings
Date: 04/05/1990 10:30 PM
Yet another test email
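If the goal is to separate every message rather than pick one, the same counter idiom from this answer can drive the output file name. The following is only a sketch under stated assumptions: the function name split_mails and the PREFIX<n>.txt naming scheme are hypothetical, and any lines before the first From: line are skipped.

```shell
# split_mails FILE PREFIX (hypothetical helper): write each message to
# PREFIX1.txt, PREFIX2.txt, ... in the order the messages appear.
split_mails() {
    awk -v prefix="$2" '
        /^From:/ {
            if (out != "") close(out)   # close the previous file first
            out = prefix (++n) ".txt"   # name of the file for this message
        }
        out != "" { print > out }       # copy the line into the current file
    ' "$1"
}
```

Usage: split_mails infile mail_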
Answer 2 (score: 0)
Found a solution.
This extracts all the email addresses from the data:
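import java.util.ArrayList;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Split on the "From:" delimiter; separated[0] is the text before the first "From:".
String[] separated = data.split("From:");
Pattern p = Pattern.compile("\\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\\.[A-Z]{2,4}\\b", Pattern.CASE_INSENSITIVE);
// buchEmails.get(i) holds the addresses found in separated[i].
ArrayList<ArrayList<String>> buchEmails = new ArrayList<ArrayList<String>>();
for (int i = 0; i < separated.length; i++) {
    // Use a fresh list per chunk so that addresses are not shared between entries.
    ArrayList<String> emails = new ArrayList<String>();
    Matcher matcher = p.matcher(separated[i]);
    while (matcher.find()) {
        emails.add(matcher.group());
    }
    buchEmails.add(emails);
}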
Answer 3 (score: 0)
If, as @veekram stated in one of the comments, the next "From:" is the delimiter, then:
x = [None]*10
will do.
Answer 4 (score: 0)
Why not use a program designed for parsing email, such as formail? For example, to extract the first message:
<in.mbox formail -1 -f -ds
Output:
From: Bikram Suwal veekram@gmail.com
To: John Doe johndoe@gmail.com
Subject: Greetings
Date: 04/05/1990 10:30 PM
Hello.
World
This is a test email.