Regex to match numbers and redirect them to different output files

Date: 2016-03-23 15:55:43

Tags: python regex shell awk sed

This is my current log file:

event_1.log

a   0   3.2 1024    1   0   0
a   0   6.4 2048    2   0   0
le  0   9.6 2048    2   0   0
a   0   12.8    2048    2   0   0
le  0   12.8    2048    2   0   0
ll  0   19.6    2048    2   0   0
a   1   19.6    1024    1   0   0
a   1   22.4    3072    3   0   0
d   0   19.2    2048    2   0   0
le  1   22.4    2048    2   0   0
ll  1   22.8    2048    2   0   0
d   1   22.8    1024    1   0   0
a   0   26  2048    2   0   0

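Based on the second column, I need to create files named {second_column}.log. I only need to extract the lines whose first column equals a or d. The other lines (le and ll) should be skipped.

Here is my expected output: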

0.log

a   0   3.2 1024    1   0   0
a   0   6.4 2048    2   0   0
a   0   12.8    2048    2   0   0
d   0   19.2    2048    2   0   0
a   0   26  2048    2   0   0

1.log

a   1   19.6    1024    1   0   0
a   1   22.4    3072    3   0   0
d   1   22.8    1024    1   0   0

This is what I have tried, but I am clearly new to regex. I am open to other solutions (shell, sed, awk, etc.).

import re

input_file = open("event_1.log", "r")
output_file = open("column2.log", "w") # want this to be the name of the 2nd column  
for line in input_file:
    match_defines = re.match(r'\s*([a-z]+) ([0-9]+) ([0-9]+) ([0-9]+) ([0-9]+) ([0-9]+) ([0-9]+)', line)

    if match_defines.group(1) == 'a':
        newline1= "\ndef %s():\n    return %s" % (match_defines.group(1),match_defines.group(2))
        output_file.write(newline1)

    else:
        output_file.write(line)

Any help is greatly appreciated. Thanks.

3 answers:

Answer 0 (score: 2)

You can use pandas with a regex to check for a or d in the first column:

import pandas as pd

df = pd.read_table("event_1.log", header=None)

# Parentheses are required: & binds more tightly than ==
df[(df[1]==1) & (df[0].str.contains('^(a|d)$'))].to_csv('1.log')

#    0  1     2     3  4  5  6
#6   a  1  19.6  1024  1  0  0
#7   a  1  22.4  3072  3  0  0
#11  d  1  22.8  1024  1  0  0

df[(df[1]==0) & (df[0].str.contains('^(a|d)$'))].to_csv('0.log')

#Out[99]:
#    0  1     2     3  4  5  6
#0   a  0   3.2  1024  1  0  0
#1   a  0   6.4  2048  2  0  0
#3   a  0  12.8  2048  2  0  0
#8   d  0  19.2  2048  2  0  0
#12  a  0  26.0  2048  2  0  0

Data:

In [113]: df
Out[113]:
     0  1     2     3  4  5  6
0    a  0   3.2  1024  1  0  0
1    a  0   6.4  2048  2  0  0
2   le  0   9.6  2048  2  0  0
3    a  0  12.8  2048  2  0  0
4   le  0  12.8  2048  2  0  0
5   ll  0  19.6  2048  2  0  0
6    a  1  19.6  1024  1  0  0
7    a  1  22.4  3072  3  0  0
8    d  0  19.2  2048  2  0  0
9   le  1  22.4  2048  2  0  0
10  ll  1  22.8  2048  2  0  0
11   d  1  22.8  1024  1  0  0
12   a  0  26.0  2048  2  0  0

Answer 1 (score: 2)

grep -E '^(a|d)[[:space:]]' event_1.log | while IFS= read -r l; do i=$(printf '%s\n' "$l" | awk '{print $2}'); printf '%s\n' "$l" >> "${i}.log"; done
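
The pipeline above does two things: it keeps the lines whose first column is a or d, and it uses the second column as the output file name. As a rough Python sketch of those same two steps, assuming the columns are separated by any run of spaces or tabs, a single pattern can do both. The names LINE_RE and target_file are hypothetical and not part of the answer. Note that alternation needs a group such as (?:a|d); inside a bracket expression, | is a literal character.

```python
import re

# Hypothetical helper (not from the answers).
# (?:a|d)\s+ accepts only the whole tokens "a" or "d" as the first column,
# because whitespace must follow immediately; (\S+) captures the second
# column, much like awk's $2.
LINE_RE = re.compile(r"^(?:a|d)\s+(\S+)")


def target_file(line):
    """Return the output file name for a log line, or None to skip it."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    return match.group(1) + ".log"


sample = [
    "a   0   3.2 1024    1   0   0",
    "le  0   9.6 2048    2   0   0",
    "d   1   22.8    1024    1   0   0",
]
for line in sample:
    print(target_file(line))
# prints 0.log, None, 1.log (one per line)
```

Checking for None before calling group() matters here: re.match returns None for every skipped line, and calling a method on it raises AttributeError.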

Answer 2 (score: 1)

import re

with open("event_1.log") as f, open("0.log", "w+") as f0, open("1.log", "w+") as f1:
    for line in f:
        # ^[ad]\s+0\s : first column is exactly a or d, second column is 0
        result1 = re.findall(r"^[ad]\s+0\s.*$", line, re.S)
        # Same pattern with 1 as the second column
        result2 = re.findall(r"^[ad]\s+1\s.*$", line, re.S)
        if result1:
            f0.writelines(result1)
        if result2:
            f1.writelines(result2)

Or:

with open("event_1.log") as f,open("0.log","w+") as f0, open("1.log","w+") as f1:
    for line in f:
        sl=line.split()
        result0=sl[1]=="0" and (sl[0]=="a" or sl[0]=="d")
        result1=sl[1]=="1" and (sl[0]=="a" or sl[0]=="d")
        if result0:
            f0.write(line)
            print(line)
        if result1:
            f1.write(line)

With a list comprehension:

with open("event_1.log") as f,open("0.log","w+") as f0, open("1.log","w+") as f1:
    isad=lambda x:x.split()[0] in ['a','d'] and x.split()[1] in ['0','1']
    [f0.write(r) if r.split()[1]=='0' else f1.write(r) for r in f if isad(r)]
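
The snippets above hard-code 0.log and 1.log, while the question asks for one file per value of the second column. A minimal sketch of one way to generalize this, assuming whitespace-separated columns: open each output file lazily and let contextlib.ExitStack close them all at the end. The function name split_log and its out_dir and wanted parameters are made up for illustration.

```python
import os
from contextlib import ExitStack


def split_log(path, out_dir=".", wanted=("a", "d")):
    """Write each kept line of `path` to <out_dir>/<second column>.log.

    Hypothetical helper: lines are kept when their first column is in
    `wanted`. Returns the sorted list of second-column values seen.
    """
    handles = {}  # second-column value -> open output file
    with ExitStack() as stack, open(path) as src:
        for line in src:
            fields = line.split()
            # Skip blank or short lines and unwanted first columns (le, ll, ...)
            if len(fields) < 2 or fields[0] not in wanted:
                continue
            key = fields[1]
            if key not in handles:
                # Open on first use; existing files are overwritten ("w")
                out_path = os.path.join(out_dir, key + ".log")
                handles[key] = stack.enter_context(open(out_path, "w"))
            # Write the original line unchanged, including its newline
            handles[key].write(line)
    return sorted(handles)
```

With the sample log from the question, split_log("event_1.log") should create 0.log and 1.log in the current directory. Keeping one open handle per key avoids reopening a file for every line, at the cost of holding as many open files as there are distinct second-column values.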