Hello again, stackoverflow!
I have a very large flat file and I want to export all records that match 2 different patterns. The problem is that the number of lines per record varies and the records run into each other. The last line of each record is Door ID and the first line is User:.
I am testing for an @ in the email address, and for a last login containing 'login time: 2013-08'. I need to export all the lines of each matching record, including the email address line and the last login line. Below are 2 samples. I have tried using awk like this:
awk '/login time: 2013-08/{e=0}/@ /{gsub("^.*@ ","",$0);e=1}{if(e==1){print}}' filename
Which of course failed...
So here is the sample data:
User: afshin@runners.org
First Name: Afshi
Last Name: Noghami
Is Delegated Admin: False
IP Whitelisted: False
Account Suspended: False
Must Change Password: False
Unique ID: 102209840259208897543
ID TPYE: Cx4
Creation Time: 2013-06-07T04:14:42.000Z
Last login time: Never
Path: /Members/Inactive
IMs:
Addresses:
Organizations:
Phones:
Relations:
Door IDs:
User: jjnalli@runners.org
First Name: JISS
Last Name: NALLIKUZHY
Is a Super Admin: False
Is Delegated Admin: False
Has Agreed to Terms: True
IP Whitelisted: False
Account Suspended: False
Must Change Password: False
Unique ID: 109765147242431344122
ID TYPE: Cx4
Mailbox setup: True
Included: False
Creation Time: 2013-06-07T03:32:52.000Z
Last login time: 2013-08-02T07:13:02.000Z
Path: /Members/Inactive
IMs:
Addresses:
Organizations:
Phones:
Relations:
Door IDs:
For each record that has a last login date, the desired output would look like this:
User: jjnalli@runners.org
First Name: JISS
Last Name: NALLIKUZHY
Is a Super Admin: False
Is Delegated Admin: False
Has Agreed to Terms: True
IP Whitelisted: False
Account Suspended: False
Must Change Password: False
Unique ID: 109765147242431344122
ID TYPE: Cx4
Mailbox setup: True
Included: False
Creation Time: 2013-06-07T03:32:52.000Z
Last login time: 2013-08-02T07:13:02.000Z
Answer 0 (score: 1)
Maybe something like this will work for you:
awk '$1=="User:",/login time: 2013-08/' file
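For what it's worth, that range pattern expands to roughly the following explicit form (an untested sketch, same assumptions about the input file):
awk '
$1 == "User:"           { p = 1 }   # start printing at each User: line
p                                   # print every line while the flag is set
/login time: 2013-08/   { p = 0 }   # the matching login line is the last one printed
' file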
Answer 1 (score: 1)
awk '/User:/{if(NR!=1){for(i=0;i<j;i++)print a[i]>"file"k;j=0;k++;}a[j++]=$0;next}{a[j++]=$0;}END{for(i=0;i<j;i++)print a[i]>"file"k}' i=0 k=1 grepper.txt
where grepper.txt contains the input data.
This splits the file into multiple files, each holding a single record (spanning multiple lines, of course).
Then grep the files and discard the ones you don't need, in a loop:
grep "login time: 2013-08" fileN && grep "User:" fileN | grep "@" || rm -f fileN
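For example, a rough bash version of that loop (untested, and assuming the split files are named file1, file2, ... by the awk command above):
for f in file[0-9]*; do
    # keep only files with a 2013-08 login and a User: line containing an @
    grep -q "login time: 2013-08" "$f" && grep "User:" "$f" | grep -q "@" || rm -f "$f"
done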
Answer 2 (score: 1)
Group each record from ^User through Door ID, rather than printing only the lines matching @.*login time: 20[0-9] ... I think I finally understand what you need.
Try this:
sed -ne '/^Door ID/!H;/^User:/h;/^Door ID/{x;G;/@.*login time: 20[0-9]/p}' file
This does what you asked for.
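In case the hold-space juggling is not obvious, here is the same script with each command on its own line and commented (GNU sed syntax; an untested sketch):
sed -ne '
# every line except the Door IDs line gets appended to the hold space
/^Door ID/!H
# a User: line overwrites the hold space, starting a fresh record
/^User:/h
# the Door IDs line closes the record: swap it into the pattern space,
# re-attach the Door IDs line itself, and print the record only if it
# contains an @ and a dated login time
/^Door ID/{x;G;/@.*login time: 20[0-9]/p}
' file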
Once each record has been assembled, you can even exclude all entries whose login time matches 2013-08:
sed -ne '/^Door ID/!H;/^User:/h;/^Door ID/{x;G;/@.*login time: 20[0-9]/{/login time: 2013-08/!p}}' file
Answer 3 (score: 1)
First, read each record into an array of fields:
BEGIN { FS = ": " }          # each line has fieldname and value
/^$/ { next }                # skip blank records
$1 == "User" {               # first field of new record
    delete fields            # delete current array
    fields[$1] = $2 }        # store field value in array
$1 == "Door IDs" {           # last field of current record
    fields[$1] = $2          # store field value in array
    do_process() }           # process current record
$1 != "User" &&              # fields between first ...
$1 != "Door IDs" {           # ... and last
    fields[$1] = $2 }        # store field value in array
Then do whatever you need to do. Here I print the User and Last login time fields, but you can do whatever processing you require:
function do_process() {
    print fields["User"], fields["Last login time"] }
Note that I haven't tested this code...
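If you want it to do the filtering from the question directly, a do_process() along these lines might work (again untested; note that awk's for-in loop does not preserve the original field order):
function do_process(    f) {
    # print the record only if the User field has an @ and the last login is in 2013-08
    if (fields["User"] ~ /@/ && fields["Last login time"] ~ /^2013-08/) {
        for (f in fields)            # field order is arbitrary here
            print f ": " fields[f]
        print ""                     # blank line between records
    }
}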
Edit: revised based on the comments below. I'm assuming that the User field always marks the start of a new record. Here is the revised version of the code that reads and stores the records:
BEGIN { FS = ": "            # each line has fieldname and value
        first = 1 }          # flag for first record
/^$/ { next }                # skip blank records
$1 == "User" {               # first field of new record
    if (!first && !done)     # no data the first time; skip
        do_process()         # process the previous record
    first = 0                # clear the first-record flag
    done = 0                 # new record not yet processed
    delete fields            # reset fields for new record
    fields[$1] = $2          # store field value in array
    next }
$1 == "Door IDs" {           # last field of current record
    fields[$1] = $2          # store field value in array
    do_process()             # process current record
    done = 1                 # mark it processed
    next }
/./ { fields[$1] = $2 }      # store field value in array
END { if (!first && !done)   # last record not yet processed
        do_process() }       # process last record
Then you can process the data however you like.
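For completeness, one way to run it (assuming the rules above plus a do_process() function are saved in a file called records.awk, a name used here purely for illustration):
awk -f records.awk yourflatfile
where yourflatfile stands in for the large flat file from the question.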