Hi, I have a file filtered down to the following format.
====================================================================
===== Usage
====================================================================
--------------------------------------------------------------------
----- Processes:
--------------------------------------------------------------------
PID PPID USER VSZ STAT COMMAND
1 0 admin 812 S init
2 0 admin 0 SW [kthreadd]
3 2 admin 0 SW [migration/0]
4 2 admin 0 SW [ksoftirqd/0]
5 2 admin 0 SW [watchdog/0]
146 1 admin 712 S /usr/sbin/in.tftpd -l -u nobody -s /etc/airespi
3442 1 admin 4640 S N /usr/sbin/snmpd udp:161,udp6:161 -a -p /var/run
So far I have tried:
#!/usr/bin/python
import re
logfile = open('diag.txt','r')
for line in logfile.xreadlines():
    if line.find('Processes') >=0 :
        line = logfile.next()
        line = logfile.next()
        if line.find('PID PPID USER VSZ STAT COMMAND') >= 0 :
            Headers = re.findall(r"[\w']+", line)
            print Headers
            line = logfile.next()
        else:
            exit
        while(line.find('APmgr info: apmgrinfo -a') == -1):
            #temp = re.findall('[/w]',line)
            print temp
            line = logfile.next()
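Here I read the file until a line matches Processes, then I skip one line. After that I put PID PPID USER VSZ STAT COMMAND into a list.
Now I read the following lines in a loop, and I want to put the fields of each line into a list as well. I tried [/w], but it did not split the line correctly.
I already have Perl code that does the matching, shown below: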
        until($nextline =~ m/\-\-\-\-\- APmgr info: apmgrinfo -a/){
            my @temp = ();
            if($nextline =~ m/\s*?(PID)\s*(PPID)\s*(USER)\s*(VSZ)\s*(STAT)\s*(COMMAND)/){
                push @Headers, $1,$2,$3,$4,$5,$6;
            }elsif($nextline =~ m/\s*(\d+)\s+(\d+)\s+([a-zA-Z_]+)\s+(\d+)\s+([a-zA-Z_]+)\s+(.*)$/){
                my %processes = ();
                @temp = split(/\s+/,$nextline);
                $processes{$Headers[0]} = $1;
                $processes{$Headers[1]} = $2;
                $processes{$Headers[2]} = $3;
                $processes{$Headers[3]} = $4;
                $processes{$Headers[4]} = $5;
                $processes{$Headers[5]} = $6;
                push @Process,\%processes;
            }
            $nextline = <INFILE>;
        }
        last;
    }
}###End of while loop###
Answer 0: (score: 1)
re.split should do this for you:
temp = re.split(r'[^\w\[\]\/\-:]+', line)
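In your case, it is better to compile the regex once and then use the compiled version:
re_line = re.compile(r'[^\w\[\]\/\-:]+')
while(line.find('APmgr info: apmgrinfo -a') == -1):
    temp = re_line.split(line, 6)
    print temp
    line = logfile.next()
Of course, you should refine the regex itself to fit your problem better. Mine is based only on your example.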
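As one possible refinement, here is a minimal sketch that avoids the regex entirely and splits each row on whitespace with a limit, so the command keeps its spaces. Note that the commented-out pattern '[/w]' in the question matches only the literal characters / and w; \w was presumably intended. The function name parse_processes and its end_marker parameter are made up for illustration, and the sketch assumes Python 3 and that separator lines start with - or =.

```python
def parse_processes(lines, end_marker='APmgr info: apmgrinfo -a'):
    """Return (headers, rows) for the table that follows the Processes banner."""
    headers, rows = None, []
    in_section = False
    for line in lines:
        if not in_section:
            # Ignore everything up to and including the "Processes" banner.
            in_section = 'Processes' in line
            continue
        if end_marker in line:
            break  # the next section starts here
        stripped = line.strip()
        if not stripped or stripped[0] in '-=':
            continue  # blank line or separator (assumed to start with - or =)
        if headers is None:
            headers = stripped.split()
            continue
        # maxsplit=5 yields at most 6 fields; the sixth keeps the full command.
        fields = stripped.split(None, 5)
        rows.append(dict(zip(headers, fields)))
    return headers, rows


sample = """\
--------------------------------------------------------------------
----- Processes:
--------------------------------------------------------------------
PID PPID USER VSZ STAT COMMAND
1 0 admin 812 S init
146 1 admin 712 S /usr/sbin/in.tftpd -l -u nobody -s /etc/airespi
--------------------------------------------------------------------
----- APmgr info: apmgrinfo -a
""".splitlines()

headers, rows = parse_processes(sample)
print(headers)
print(rows[1]['COMMAND'])
```

With a real file you would pass the open file object instead of sample, for example parse_processes(open('diag.txt')).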
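Answer 1: (score: 1)
Your Perl code can be simplified considerably.
There is no need to use a regex to capture your data, since the fields are simply separated by whitespace. split can therefore do everything you want much more cleanly. The only trick is recognizing that you want only 6 values, so you need to limit the number of fields the data is split into: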
use strict;
use warnings;
my @header;
my @processes;
while (<DATA>) {
    chomp;
    next if ! /^\s*\w/;
    if (! @header) {
        @header = split ' ';
    } else {
        my @data = split ' ', $_, 6;
        my %hash;
        @hash{@header} = @data;
        push @processes, \%hash;
    }
}
use Data::Dump;
dd \@processes;
__DATA__
====================================================================
===== Usage
====================================================================
--------------------------------------------------------------------
----- Processes:
--------------------------------------------------------------------
PID PPID USER VSZ STAT COMMAND
1 0 admin 812 S init
2 0 admin 0 SW [kthreadd]
3 2 admin 0 SW [migration/0]
4 2 admin 0 SW [ksoftirqd/0]
5 2 admin 0 SW [watchdog/0]
146 1 admin 712 S /usr/sbin/in.tftpd -l -u nobody -s /etc/airespi
3442 1 admin 4640 S N /usr/sbin/snmpd udp:161,udp6:161 -a -p /var/run
Output:
[
  {
    COMMAND => "init ",
    PID => 1,
    PPID => 0,
    STAT => "S",
    USER => "admin",
    VSZ => 812,
  },
  {
    COMMAND => "[kthreadd]",
    PID => 2,
    PPID => 0,
    STAT => "SW",
    USER => "admin",
    VSZ => 0,
  },
  {
    COMMAND => "[migration/0]",
    PID => 3,
    PPID => 2,
    STAT => "SW",
    USER => "admin",
    VSZ => 0,
  },
  {
    COMMAND => "[ksoftirqd/0]",
    PID => 4,
    PPID => 2,
    STAT => "SW",
    USER => "admin",
    VSZ => 0,
  },
  {
    COMMAND => "[watchdog/0]",
    PID => 5,
    PPID => 2,
    STAT => "SW",
    USER => "admin",
    VSZ => 0,
  },
  {
    COMMAND => "/usr/sbin/in.tftpd -l -u nobody -s /etc/airespi",
    PID => 146,
    PPID => 1,
    STAT => "S",
    USER => "admin",
    VSZ => 712,
  },
  {
    COMMAND => "N /usr/sbin/snmpd udp:161,udp6:161 -a -p /var/run",
    PID => 3442,
    PPID => 1,
    STAT => "S",
    USER => "admin",
    VSZ => 4640,
  },
]
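Note the last record in that output: STAT is "S" and COMMAND starts with "N", because the STAT column of that row ("S N") itself contains a space, so a plain whitespace split puts the "N" in the wrong field. If that matters, one option is to go back to a regex for the STAT column only. The following is a rough sketch in Python, not a definitive fix. It assumes STAT consists of one or more space-separated tokens made of uppercase letters and the symbols < and +, which holds for the sample but would misparse a command whose first word looks like such a token. The names ROW, FIELDS, and parse_row are made up for illustration.

```python
import re

FIELDS = ('PID', 'PPID', 'USER', 'VSZ', 'STAT', 'COMMAND')

# Assumption: STAT is one or more space-separated tokens built only from
# uppercase letters, '<' and '+' (for example "S", "SW", "S N").
ROW = re.compile(
    r'^\s*(\d+)\s+(\d+)\s+(\S+)\s+(\d+)\s+'
    r'([A-Z<+]+(?:\s+[A-Z<+]+)*)\s+(.*?)\s*$'
)


def parse_row(line):
    """Return a dict for one process row, or None if the line is not a row."""
    match = ROW.match(line)
    if match is None:
        return None
    return dict(zip(FIELDS, match.groups()))


row = parse_row('3442 1 admin 4640 S N /usr/sbin/snmpd udp:161,udp6:161 -a -p /var/run')
print(row['STAT'])     # S N
print(row['COMMAND'])
```

Header and separator lines return None, so they can be filtered out in the same loop. If the original file keeps its column alignment, slicing each row at the header's column offsets may be a more robust alternative.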