Question

I have the following log:

01-01-2012 01:13:36 Blah blah : blah CustomerId:1234 downloaded Blah Size:5432 bytes Carrier:Company-A   
01-01-2012 01:13:36 Blah blah : blah CustomerId:1237 downloaded Blah Size:5432 bytes Carrier:Company-B

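Can someone give me a regular expression that extracts the customer ID and the size, saves them in a list, and prints the amount of data downloaded by each customer ID? I was able to do this in Python using search and dictionaries. Please use regular expressions in your answers.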

Answer 1

#!/usr/bin/python

import re

res = dict()

data = open("log.txt").readlines()

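for line in data:
    m = re.search("CustomerId:([0-9]+).*Size:([0-9]+)", line)
    if m is None:
        continue  # skip lines that do not contain both fields
    cid = int(m.group(1))
    siz = int(m.group(2))
    # dict.has_key() was removed in Python 3; "in" works in both versions
    if cid not in res:
        res[cid] = 0
    res[cid] += siz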

for cust in res.keys():
    # print() with a single formatted string runs on Python 2 and 3
    print("Customer ID %d - %d bytes" % (cust, res[cust]))
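As a minimal sketch of the same approach, collections.defaultdict(int) removes the need for the membership check before adding. The function name total_by_customer and the inline sample list are assumptions for illustration; they stand in for reading log.txt.

```python
import re
from collections import defaultdict

# Same two fields as above; compiled once and reused for every line.
PATTERN = re.compile(r"CustomerId:([0-9]+).*Size:([0-9]+)")

def total_by_customer(lines):
    """Return {customer_id: total_bytes} for the lines that match."""
    totals = defaultdict(int)  # missing keys start at 0
    for line in lines:
        m = PATTERN.search(line)
        if m:  # ignore lines without both fields
            totals[int(m.group(1))] += int(m.group(2))
    return dict(totals)

# The two log lines from the question, used here instead of a file.
sample = [
    "01-01-2012 01:13:36 Blah blah : blah CustomerId:1234 downloaded Blah Size:5432 bytes Carrier:Company-A",
    "01-01-2012 01:13:36 Blah blah : blah CustomerId:1237 downloaded Blah Size:5432 bytes Carrier:Company-B",
]

for cust, size in total_by_customer(sample).items():
    print("Customer ID %d - %d bytes" % (cust, size))
```

To run it against a real file, pass the open file object instead of the list, since the function only iterates over lines.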

Answer 2

For this example, I used the two pasted lines of input data in a data.txt input test file:

Python:

import re

data = {}
regex = re.compile(r'CustomerId:(\d+).*?Size:(\d+)')

with open('data.txt') as fh:
    for line in fh:
        m = regex.search(line)

        # search() returns None when a line lacks either field
        if m:

            cust = m.group(1)
            size = m.group(2)

            try:
                data[cust] += int(size) 
            except KeyError:
                data[cust] = int(size)

print(str(data))

Output:

{'1234': 16296, '1237': 16296}

Perl:

use warnings;
use strict;

use Data::Dumper;

open my $fh, '<', 'data.txt' or die $!;

my %data;

while (my $line = <$fh>){
    if (my ($cust, $size) = $line =~ /CustomerId:(\d+).*?Size:(\d+)/){
        $data{$cust} += $size;
    }
}

print Dumper \%data;

Output:

$VAR1 = {
      '1234' => 16296,
      '1237' => 16296
};
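One detail worth a quick illustration: the pattern in this answer uses the lazy `.*?`, while a greedy `.*` is also a common choice. On the log lines from the question both give the same result, because each line has a single Size field. The sketch below uses a hypothetical line with two Size fields, made up purely to show where the two would differ.

```python
import re

# Hypothetical line with two Size fields; not taken from the real log.
line = "CustomerId:1 downloaded Size:10 bytes, retry Size:20 bytes"

# Greedy .* runs to the end of the line and backtracks to the LAST "Size:".
greedy = re.search(r"CustomerId:(\d+).*Size:(\d+)", line)

# Lazy .*? expands one character at a time and stops at the FIRST "Size:".
lazy = re.search(r"CustomerId:(\d+).*?Size:(\d+)", line)

print(greedy.group(2))  # 20
print(lazy.group(2))    # 10
```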

Answer 3

Here is what I would do:

In [1]: import collections, re

In [2]: d = collections.defaultdict(list)

In [3]: string = "01-01-2012 01:13:36 Blah blah : blah CustomerId:1234 downloaded Blah Size:5432 bytes Carrier:Company-A\n01-01-2012 01:13:36 Blah blah : blah CustomerId:1237 downloaded Blah Size:5432 bytes Carrier:Company-B"

In [4]: for cust_id, sz in re.findall(r".*CustomerId\:(\d+).*Size:(\d+)", string):
    ...:     d[cust_id].append(sz)
    ...:

In [5]: d
Out[5]: defaultdict(list, {'1234': ['5432'], '1237': ['5432']})
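The session above stores the sizes as lists of strings. To get the amount downloaded per customer, which the question asks for, the sizes can be converted to integers and summed. This is a sketch under that assumption; the names log_text and totals are introduced here for illustration.

```python
import collections
import re

# The two log lines from the question, joined by a newline.
log_text = (
    "01-01-2012 01:13:36 Blah blah : blah CustomerId:1234 downloaded Blah Size:5432 bytes Carrier:Company-A\n"
    "01-01-2012 01:13:36 Blah blah : blah CustomerId:1237 downloaded Blah Size:5432 bytes Carrier:Company-B"
)

d = collections.defaultdict(list)

# "." does not match a newline by default, so each match stays on one line.
for cust_id, sz in re.findall(r"CustomerId:(\d+).*Size:(\d+)", log_text):
    d[cust_id].append(int(sz))  # store integers so they can be summed

# Collapse each customer's list of sizes into a single total.
totals = {cust_id: sum(sizes) for cust_id, sizes in d.items()}

for cust_id, total in totals.items():
    print("Customer ID %s - %d bytes" % (cust_id, total))
```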

Regular expression to extract only the CustomerId and data (bytes) and save them in a list?
