About: mailbox (mbox format) email
Multi-message mail file: Inbox.mbox
From - Thu Mar 26 16:16:21 2015
From: Mail Delivery System <Mailer-Daemon@200.netwizz.com>
To: edge@notterribe.org
Subject: Mail delivery failed: returning message to sender
Message-Id: <E1Yb3yX-0004CB-QH@200.netwizz.com>
Date: Thu, 26 Mar 2015 02:21:17 -0700
Date: Thu, 26 Mar 2015 02:20:44 -0700
From: edge <edge@notterribe.org>
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Icedove/31.5.0
MIME-Version: 1.0
To: leasing@theedgehenderson.com
CC: etpmgr@movein.net, t.simmonds@movein.ne
Subject: Fwd: Today's Breach Of Our Security.
From - Fri Mar 27 12:00:00 2015
Desired pattern-matching order:
Date: Thu, 26 Mar 2015 02:21:17 -0700
From - Thu Mar 26 16:16:21 2015
From: Mail Delivery System <Mailer-Daemon@200.netwizz.com>
To: edge@notterribe.org
Message-Id: <E1Yb3yX-0004CB-QH@200.netwizz.com>
Subject: Mail delivery failed: returning message to sender
Desired final result:
Date: Thu; 26 Mar 2015 02:21:17 -0700;From - Thu Mar 26 16:16:21 2015;From: Mail Delivery System <Mailer-Daemon@200.netwizz.com>;To: edge@notterribe.org;Message-Id: <E1Yb3yX-0004CB-QH@200.netwizz.com>;Subject: Mail delivery failed: returning message to sender
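Goals:
* Each mail message in "Inbox.mbox" starts with "From ".
* Match only the first occurrence of "^Date: |^From |^From: |^To: |^Message-ID: |^Subject: " and print that line.
* Format the output as semicolon-separated CSV.
What I have tried: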
grep -a -E -i "^Date: |^From |^From: |^To: |^Message-ID: |^Subject: " Inbox.mbox
awk '/^Date: / || /^From / || /^From: / || /^To: / || /^Message-ID: / || /^Subject: /' Inbox.mbox
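Comment: the commands above gave me a good start. I am most familiar with awk and grep, so I would like to stick with them. The difficulty is printing the lines in the order I want and matching only the first occurrence of each newline-terminated line. Some messages contain binary data, which is why I use -a with grep.
Any help is much appreciated. Thanks.
Answer 0 (score: 0)
OK, so all you have is a Thunderbird mbox.
Here is what I came up with, named mbox2csv: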
#!/usr/bin/gawk -f
BEGIN {
    # initialize an empty array and set the "i" variable to 0
    i = split("", row, ":");
}
# awk does not have a "join"; n, k and result are local variables,
# so the loop no longer overwrites the global "i"
function join(array, sep,    n, k, result) {
    sep = sep ? sep : ";";
    n = length(array);
    result = array[0];
    for (k = 1; k < n; ++k) {
        result = result sep array[k];
    }
    return result;
}
# the keys you want to store ("Message-Id" appears in both spellings)
/^(From|Date|To|Message-I[Dd]|Subject):/ {
    row[i++] = $0;
}
# every time we match a mbox message separator
/^From / {
    # if there is data (not the first line)
    if (i > 0) {
        print join(row);
        # reinitialise the array and "i"
        i = split("", row, ":");
    }
}
# print the last message, which has no separator after it
END {
    if (i > 0) {
        print join(row);
    }
}
Then: mbox2csv INBOX > result.csv
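The script above prints the matched headers in the order they appear in the file and keeps every occurrence, while the question asks for a fixed order and only the first occurrence per message. A minimal sketch of that variant follows, assuming a POSIX awk. The function name mbox_first_headers and the awk variable names are hypothetical, chosen for illustration only.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original answer): prints one
# semicolon-separated row per message, with fields in a fixed order.
mbox_first_headers() {
    # LC_ALL=C makes awk treat the input as plain bytes (binary attachments)
    LC_ALL=C awk '
    function emit_row(   k, out) {
        if (!have) return
        out = val[1]
        for (k = 2; k <= n; k++) out = out ";" val[k]
        print out
    }
    function clear_row(   k) {
        for (k = 1; k <= n; k++) val[k] = ""
        have = 0
    }
    BEGIN {
        # lower-cased line prefixes, listed in the desired output order
        n = split("date: |from |from: |to: |message-id: |subject: ", key, "|")
    }
    # an mbox separator closes the previous message and starts a new one
    /^From / { emit_row(); clear_row() }
    {
        line = tolower($0)
        for (k = 1; k <= n; k++) {
            # keep only the first occurrence of each prefix
            if (val[k] == "" && index(line, key[k]) == 1) {
                val[k] = $0
                have = 1
                break
            }
        }
    }
    END { emit_row() }   # the last message has no separator after it
    ' "$@"
}
```

Usage would be something like: mbox_first_headers Inbox.mbox > result.csv. A header that is missing from a message leaves an empty field, so every row keeps six columns. Semicolons inside a header value are not quoted in this sketch.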
Big caveat: this does not account for line continuations, which are common in network headers, nor for escaped lines.
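One possible way to deal with the continuation part of that caveat is to unfold the headers before extracting them. The sketch below assumes the usual convention that a continuation line starts with a space or a tab and that the header section ends at the first blank line after the "From " separator. The function name unfold_headers is hypothetical, and escaped lines are still not handled.

```shell
#!/bin/sh
# Hypothetical pre-filter (not part of the original answer): joins folded
# header lines into one line each and passes message bodies through as-is.
unfold_headers() {
    LC_ALL=C awk '
    function emit() { if (have) { print buf; have = 0 } }
    # separator line: the header section of a new message follows
    /^From / { emit(); in_hdr = 1; print; next }
    # the first blank line ends the header section
    in_hdr && /^$/ { emit(); in_hdr = 0; print; next }
    # continuation line: append it to the pending header
    in_hdr && /^[ \t]/ {
        sub(/^[ \t]+/, " ")
        buf = buf $0
        have = 1
        next
    }
    # a new header line: print the pending one and start collecting again
    in_hdr { emit(); buf = $0; have = 1; next }
    { print }
    END { emit() }
    ' "$@"
}
```

The output can then be piped into the extraction step, for example: unfold_headers INBOX | mbox2csv > result.csv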
Edit: the code will be posted in a gist.