I have the following log:
01-01-2012 01:13:36 Blah blah : blah CustomerId:1234 downloaded Blah Size:5432 bytes Carrier:Company-A
01-01-2012 01:13:36 Blah blah : blah CustomerId:1237 downloaded Blah Size:5432 bytes Carrier:Company-B
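Can someone give me a regular expression that extracts the customer ID and the size, stores them in a list, and prints how much data each customer ID downloaded? I was able to do this in Python using search and a dictionary. I would like you to use a regular expression.
Answer 0 (score: 3)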
#!/usr/bin/env python3
import re

res = {}  # customer ID -> total bytes downloaded
with open("log.txt") as fh:
    for line in fh:
        m = re.search(r"CustomerId:([0-9]+).*Size:([0-9]+)", line)
        if m is None:
            continue  # skip lines that lack either field
        cid = int(m.group(1))
        siz = int(m.group(2))
        res[cid] = res.get(cid, 0) + siz

for cust, total in res.items():
    print("Customer ID %d - %d bytes" % (cust, total))
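One detail worth checking is the `.*` between the two fields: it is greedy, so on a line with more than one `Size:` field the pattern captures the last one, while the lazy form `.*?` captures the first. A small sketch of the difference follows; the sample line and its `Cache-Size` field are made up for illustration and do not come from the log above.

```python
import re

# Hypothetical line with two "Size:" fields, used only to show the difference.
line = "CustomerId:1234 downloaded Blah Size:5432 bytes Cache-Size:99 bytes"

# Greedy: ".*" runs to the end of the line, then backtracks to the LAST "Size:".
greedy = re.search(r"CustomerId:(\d+).*Size:(\d+)", line)

# Lazy: ".*?" expands one character at a time and stops at the FIRST "Size:".
lazy = re.search(r"CustomerId:(\d+).*?Size:(\d+)", line)

print(greedy.groups())  # ('1234', '99')
print(lazy.groups())    # ('1234', '5432')
```

On the two log lines in the question, which contain a single `Size:` field each, both forms capture the same values.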
Answer 1 (score: 1)
For this example, I used the two lines of input data you pasted, saved in the test input file data.txt:
Python:
import re

data = {}  # customer ID (str) -> total bytes
regex = re.compile(r'CustomerId:(\d+).*?Size:(\d+)')

with open('data.txt') as fh:
    for line in fh:
        m = regex.search(line)
        if m:  # skip lines that lack either field
            cust, size = m.groups()
            try:
                data[cust] += int(size)
            except KeyError:
                data[cust] = int(size)

print(data)
Output:
{'1234': 16296, '1237': 16296}
Perl:
use warnings;
use strict;
use Data::Dumper;

open my $fh, '<', 'data.txt' or die $!;

my %data;

while (my $line = <$fh>){
    if (my ($cust, $size) = $line =~ /CustomerId:(\d+).*?Size:(\d+)/){
        $data{$cust} += $size;
    }
}

print Dumper \%data;
Output:
$VAR1 = {
          '1234' => 16296,
          '1237' => 16296
        };
Answer 2 (score: 0)
Here is what I would do:
In [1]: import collections, re
In [2]: d = collections.defaultdict(list)
In [3]: string = "01-01-2012 01:13:36 Blah blah : blah CustomerId:1234 downloaded Blah Size:5432 bytes Carrier:Company-A\n01-01-2012 01:13:36 Blah blah : blah CustomerId:1237 downloaded Blah Size:5432 bytes Carrier:Company-B"
In [4]: for cust_id, sz in re.findall(r".*CustomerId\:(\d+).*Size:(\d+)", string):
   ...:     d[cust_id].append(sz)
   ...:
In [5]: d
Out[5]: defaultdict(list, {'1234': ['5432'], '1237': ['5432']})
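The session above stores each size as a string in a list. To get the per-customer total that the question asks for, the sizes can be converted to `int` and summed. A minimal sketch, where `total_bytes_per_customer` is a hypothetical helper name and the input is the two log lines from the question:

```python
import re
from collections import defaultdict

# The two log lines from the question, joined by a newline.
LOG = (
    "01-01-2012 01:13:36 Blah blah : blah CustomerId:1234 downloaded Blah "
    "Size:5432 bytes Carrier:Company-A\n"
    "01-01-2012 01:13:36 Blah blah : blah CustomerId:1237 downloaded Blah "
    "Size:5432 bytes Carrier:Company-B"
)

def total_bytes_per_customer(text):
    """Return {customer_id: total_bytes} for every matching line in text."""
    totals = defaultdict(int)
    # "." does not match a newline by default, so each match stays on one line.
    for cust_id, size in re.findall(r"CustomerId:(\d+).*Size:(\d+)", text):
        totals[cust_id] += int(size)
    return dict(totals)

for cust_id, total in total_bytes_per_customer(LOG).items():
    print("Customer ID %s - %d bytes" % (cust_id, total))
# Customer ID 1234 - 5432 bytes
# Customer ID 1237 - 5432 bytes
```

The customer IDs stay as strings here, as in the session above; wrap them in `int()` if numeric keys are preferred.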