This is my current log file:
event_1.log
a 0 3.2 1024 1 0 0
a 0 6.4 2048 2 0 0
le 0 9.6 2048 2 0 0
a 0 12.8 2048 2 0 0
le 0 12.8 2048 2 0 0
ll 0 19.6 2048 2 0 0
a 1 19.6 1024 1 0 0
a 1 22.4 3072 3 0 0
d 0 19.2 2048 2 0 0
le 1 22.4 2048 2 0 0
ll 1 22.8 2048 2 0 0
d 1 22.8 1024 1 0 0
a 0 26 2048 2 0 0
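Based on the second column, I need to create files named {second_column}.log. I only need to extract the lines whose first column is a or d. The other lines (le and ll) should be skipped.
Here is my expected output: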
0.log
a 0 3.2 1024 1 0 0
a 0 6.4 2048 2 0 0
a 0 12.8 2048 2 0 0
d 0 19.2 2048 2 0 0
a 0 26 2048 2 0 0
1.log
a 1 19.6 1024 1 0 0
a 1 22.4 3072 3 0 0
d 1 22.8 1024 1 0 0
This is what I have tried, but I am obviously new to regular expressions. I am open to other solutions (shell, sed, awk, etc.).
import re
input_file = open("event_1.log", "r")
output_file = open("column2.log", "w") # want this to be the name of the 2nd column
for line in input_file:
    match_defines = re.match(r'\s*([a-z]+) ([0-9]+) ([0-9]+) ([0-9]+) ([0-9]+) ([0-9]+) ([0-9]+)', line)
    if match_defines.group(1) == 'a':
        newline1= "\ndef %s():\n return %s" % (match_defines.group(1),match_defines.group(2))
        output_file.write(newline1)
    else:
        output_file.write(line)
Any help is greatly appreciated. Thanks.
Answer 0 (score: 2)
You can use pandas with a regex to check for a or d in the first column:
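import pandas as pd
# Split on any run of whitespace; the file has no header row.
df = pd.read_table("event_1.log", sep=r"\s+", header=None)
# True where the first column is exactly "a" or "d".
is_ad = df[0].str.contains(r'^(?:a|d)$')
# Parenthesize each comparison: & binds more tightly than ==.
df[(df[1]==1) & is_ad].to_csv('1.log', sep=' ', header=False, index=False)
# 0 1 2 3 4 5 6
#6 a 1 19.6 1024 1 0 0
#7 a 1 22.4 3072 3 0 0
#11 d 1 22.8 1024 1 0 0
df[(df[1]==0) & is_ad].to_csv('0.log', sep=' ', header=False, index=False)
#Out[99]:
# 0 1 2 3 4 5 6
#0 a 0 3.2 1024 1 0 0
#1 a 0 6.4 2048 2 0 0
#3 a 0 12.8 2048 2 0 0
#8 d 0 19.2 2048 2 0 0
#12 a 0 26.0 2048 2 0 0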
Data:
In [113]: df
Out[113]:
0 1 2 3 4 5 6
0 a 0 3.2 1024 1 0 0
1 a 0 6.4 2048 2 0 0
2 le 0 9.6 2048 2 0 0
3 a 0 12.8 2048 2 0 0
4 le 0 12.8 2048 2 0 0
5 ll 0 19.6 2048 2 0 0
6 a 1 19.6 1024 1 0 0
7 a 1 22.4 3072 3 0 0
8 d 0 19.2 2048 2 0 0
9 le 1 22.4 2048 2 0 0
10 ll 1 22.8 2048 2 0 0
11 d 1 22.8 1024 1 0 0
12 a 0 26.0 2048 2 0 0
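The two filters above hard-code the values 0 and 1 from the sample data. If the second column can take other values, one possible generalization is to group on that column and write one file per group. The following is only a sketch under a few assumptions: the helper names split_by_second_column and write_parts are hypothetical, the input is whitespace-separated, and every column is read as text so that a value such as 26 is written back unchanged.

```python
import pandas as pd


def split_by_second_column(source):
    """Return {"<second_column>.log": DataFrame} for rows whose first column is a or d."""
    # dtype=str keeps every value exactly as it appears in the log (no 26 -> 26.0).
    df = pd.read_csv(source, sep=r"\s+", header=None, dtype=str)
    kept = df[df[0].isin(["a", "d"])]
    # Column label 1 is the second column, because header=None numbers columns from 0.
    return {f"{key}.log": group for key, group in kept.groupby(1)}


def write_parts(parts):
    """Write each group to its file, space-separated, without header or index."""
    for name, group in parts.items():
        group.to_csv(name, sep=" ", header=False, index=False)
```

Calling write_parts(split_by_second_column("event_1.log")) should then produce 0.log and 1.log for the data shown in the question.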
Answer 1 (score: 2)
grep '^[ad][[:space:]]' event_1.log | while read -r l; do i=$(echo "$l" | awk '{print $2}'); echo "$l" >> "${i}.log"; done
Answer 2 (score: 1)
import re
with open("event_1.log") as f,open("0.log","w+") as f0, open("1.log","w+") as f1:
    for line in f:
        result1=re.findall(r"^[ad]\s+0\s.*$",line,re.S)
        result2=re.findall(r"^[ad]\s+1\s.*$",line,re.S)
        if result1:
            f0.writelines(result1)
        if result2:
            print(result2)
            f1.writelines(result2)
Or:
with open("event_1.log") as f,open("0.log","w+") as f0, open("1.log","w+") as f1:
    for line in f:
        sl=line.split()
        result0=sl[1]=="0" and (sl[0]=="a" or sl[0]=="d")
        result1=sl[1]=="1" and (sl[0]=="a" or sl[0]=="d")
        if result0:
            f0.write(line)
            print(line)
        if result1:
            f1.write(line)
With a list comprehension:
with open("event_1.log") as f,open("0.log","w+") as f0, open("1.log","w+") as f1:
    isad=lambda x:x.split()[0] in ['a','d'] and x.split()[1] in ['0','1']
    [f0.write(r) if r.split()[1]=='0' else f1.write(r) for r in f if isad(r)]
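The variants above open 0.log and 1.log explicitly, which works for the sample data but not for second-column values that are unknown in advance. A minimal sketch of one way to derive the file name from each line, using only the standard library, might look like this. The function name split_log and its keep and out_dir parameters are hypothetical and introduced purely for illustration; the sketch also assumes whitespace-separated fields and silently skips blank or single-field lines.

```python
import os
from contextlib import ExitStack


def split_log(input_path, keep=("a", "d"), out_dir="."):
    """Copy lines whose first field is in `keep` to <second_field>.log in out_dir.

    Returns the sorted list of file names that were written.
    """
    handles = {}
    # ExitStack closes every output file opened along the way, even on error.
    with ExitStack() as stack, open(input_path) as src:
        for line in src:
            fields = line.split()
            if len(fields) < 2 or fields[0] not in keep:
                continue  # skips le, ll, and malformed lines
            name = fields[1] + ".log"
            if name not in handles:
                path = os.path.join(out_dir, name)
                handles[name] = stack.enter_context(open(path, "w"))
            handles[name].write(line)
    return sorted(handles)
```

With the log from the question, split_log("event_1.log") should write 0.log and 1.log in the current directory. Opening each output file lazily means no empty files are created for values that never appear.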