I have a log file from which I want to remove some specific parts. Part of the log file is shown below:
I0216 10:18:04.720626 31559 solver.cpp:273] Solving
I0216 10:18:04.720630 31559 solver.cpp:274] Learning Rate Policy: step
I0216 10:18:05.242708 31559 solver.cpp:219] Iteration 0 (0 iter/s, 0.522037s/50 iters), loss = 1.60944
I0216 10:18:05.242750 31559 solver.cpp:238] Train net output #0: accuracy = 0
I0216 10:18:05.242763 31559 solver.cpp:238] Train net output #1: loss = 1.60944 (* 1 = 1.60944 loss)
I0216 10:18:05.242785 31559 sgd_solver.cpp:105] Iteration 0, lr = 1e-10
I0216 10:18:22.386440 31559 solver.cpp:219] Iteration 50 (2.91648 iter/s, 17.144s/50 iters), loss = 1.60944
I0216 10:18:22.386497 31559 solver.cpp:238] Train net output #0: accuracy = 0.643982
I0216 10:18:22.386509 31559 solver.cpp:238] Train net output #1: loss = 1.60944 (* 1 = 1.60944 loss)
I0216 10:18:22.386515 31559 sgd_solver.cpp:105] Iteration 50, lr = 1e-10
I0216 10:18:39.549926 31559 solver.cpp:219] Iteration 100 (2.91313 iter/s, 17.1637s/50 iters), loss = 1.60944
I0216 10:18:39.550071 31559 solver.cpp:238] Train net output #0: accuracy = 1
I0216 10:18:39.550087 31559 solver.cpp:238] Train net output #1: loss = 1.60944 (* 1 = 1.60944 loss)
I0216 10:18:39.550093 31559 sgd_solver.cpp:105] Iteration 100, lr = 1e-10
I0216 10:18:56.714752 31559 solver.cpp:219] Iteration 150 (2.91292 iter/s, 17.1649s/50 iters), loss = 1.60944
I0216 10:18:56.714824 31559 solver.cpp:238] Train net output #0: accuracy = 0.624222
I0216 10:18:56.714838 31559 solver.cpp:238] Train net output #1: loss = 1.60944 (* 1 = 1.60944 loss)
I0216 10:18:56.714845 31559 sgd_solver.cpp:105] Iteration 150, lr = 1e-10
I0216 10:19:13.893241 31559 solver.cpp:219] Iteration 200 (2.91059 iter/s, 17.1787s/50 iters), loss = 1.60944
I0216 10:19:13.893450 31559 solver.cpp:238] Train net output #0: accuracy = 1
I0216 10:19:13.893467 31559 solver.cpp:238] Train net output #1: loss = 1.60944 (* 1 = 1.60944 loss)
I0216 10:19:13.893473 31559 sgd_solver.cpp:105] Iteration 200, lr = 1e-10
I0216 10:19:31.094591 31559 solver.cpp:219] Iteration 250 (2.90674 iter/s, 17.2014s/50 iters), loss = 1.60944
I0216 10:19:31.094650 31559 solver.cpp:238] Train net output #0: accuracy = 0.61937
I0216 10:19:31.094662 31559 solver.cpp:238] Train net output #1: loss = 1.60944 (* 1 = 1.60944 loss)
I0216 10:19:31.094667 31559 sgd_solver.cpp:105] Iteration 250, lr = 1e-10
I0216 10:19:48.290045 31559 solver.cpp:219] Iteration 300 (2.90772 iter/s, 17.1956s/50 iters), loss = 1.60944
I0216 10:19:48.290187 31559 solver.cpp:238] Train net output #0: accuracy = 0.959229
I0216 10:19:48.290205 31559 solver.cpp:238] Train net output #1: loss = 1.60944 (* 1 = 1.60944 loss)
I0216 10:19:48.290210 31559 sgd_solver.cpp:105] Iteration 300, lr = 1e-10
I0216 10:20:05.504201 31559 solver.cpp:219] Iteration 350 (2.90457 iter/s, 17.2142s/50 iters), loss = 1.60944
I0216 10:20:05.504257 31559 solver.cpp:238] Train net output #0: accuracy = 0.772217
I0216 10:20:05.504271 31559 solver.cpp:238] Train net output #1: loss = 1.60944 (* 1 = 1.60944 loss)
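As you can see, some lines contain 31559 solver.cpp:219] Iteration
I want to change only these lines and leave every other line of the file untouched. For example, I want to go from this line: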
... solver.cpp:219] Iteration 14750 (2.9004 iter/s, 17.239s/50 iters), loss = 1.60934
to
... solver.cpp:219] Iteration 14750, loss = 1.60934
.
.
.
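In other words, I want to remove the substring (2.9004 iter/s, 17.239s/50 iters) from the lines that contain it, while all other lines stay unchanged.
Thanks.
I want to remove parts such as (2.8995 iter/s, 17.2444s/50 iters) from the lines that contain them. The length of this string can differ from line to line. The part starts with (, continues with a number (which may differ from one line to the next), then iter/s, followed by another number, and ends with iters).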
As @delca85 suggested, the pattern is as follows:
p = r"(\(\d*[.]?\d* iter/s\,\s\d*[.]?\d*)(s/[0-9]+)?(\siters\))"
Does anyone have a suggestion? Thanks in advance.
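Answer 0 (score: 1)
I made one extra assumption about the second part of the string, namely that it is a number followed by s/number. I hope I am not wrong; if I am, please let me know and I will be happy to find another solution.
Here is my suggestion: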
import re
# Two log lines joined into one string; only the first contains the timing part.
string = "I0216 11:42:50.047427 31559 solver.cpp:219] Iteration 14750 (2.9004 iter/s, 17.239s/50 iters), loss = 1.60934 I0216 11:42:50.047472 31559 solver.cpp:238] Train net output #0: accuracy = 1 "
# Raw string so the backslashes reach the regex engine unchanged.
p = r"\(\d*[.]?\d* iter/s, \d*[.]?\d*s/[0-9]+ iters\)"
pattern = re.compile(p)
for l in pattern.findall(string):
    print(l)
I hope this helps!
If s/50 is optional
If s/50 is optional in the second part of the string, you can use this solution:
import re
string = "I0216 11:42:50.047427 31559 solver.cpp:219] Iteration 14750 (2.9004 iter/s, 17.239s/50 iters), loss = 1.60934 I0216 11:42:50.047472 31559 solver.cpp:238] Train net output #0: accuracy = 1 "
string = string + "I0216 11:42:50.047427 31559 solver.cpp:219] Iteration 14750 (2.9004 iter/s, 17.239 iters), loss = 1.60934 I0216 11:42:50.047472 31559 solver.cpp:238] Train net output #0: accuracy = 1 "
# The middle group (s/[0-9]+) is optional, so both forms above are matched.
p = r"(\(\d*[.]?\d* iter/s,\s\d*[.]?\d*)(s/[0-9]+)?(\siters\))"
pattern = re.compile(p)
# With capturing groups, findall returns one tuple of groups per match.
for l in pattern.findall(string):
    print(''.join(l))
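The `''.join(l)` step is needed because `findall` returns one tuple per match when the pattern contains capturing groups. As a minimal sketch of an alternative (not part of the original answer), the optional `s/50` part can be written as a non-capturing group, so that `findall` yields the whole matched text directly. The names `TIMING` and `sample` are illustrative only, and the sample lines are shortened versions of the ones above.

```python
import re

# Assumption: same idea as the pattern above, with a non-capturing
# group (?:...) for the optional "s/<count>" part.
TIMING = re.compile(r"\(\d*\.?\d* iter/s, \d*\.?\d*(?:s/\d+)? iters\)")

# Shortened sample lines, one with "s/50" and one without.
sample = (
    "solver.cpp:219] Iteration 14750 (2.9004 iter/s, 17.239s/50 iters), loss = 1.60934\n"
    "solver.cpp:219] Iteration 14750 (2.9004 iter/s, 17.239 iters), loss = 1.60934\n"
)

# No capturing groups, so each element is the full matched substring.
matches = TIMING.findall(sample)
for match in matches:
    print(match)
```

With no capturing groups in the pattern, there is nothing to join back together afterwards.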
Open the file, read the lines, match the pattern, and replace the lines in the file
import fileinput
import re
import sys

p = r"(\(\d*[.]?\d* iter/s,\s\d*[.]?\d*)(s/[0-9]+)?(\siters\))"
pattern = re.compile(p)
# inplace=True redirects stdout into file.txt, so every line must be written back.
for line in fileinput.input("file.txt", inplace=True):
    for m in pattern.findall(line):
        string = ''.join(m)
        if string in line:
            line = line.replace(string, "")
    sys.stdout.write(line)
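A possible variant, sketched here under my own assumptions rather than taken from the answer, uses `re.sub` and also consumes the whitespace in front of the parenthesis, so that the result reads `Iteration 14750, loss = 1.60934` as asked for in the question. The helper names `strip_timing` and `strip_timing_in_file` are hypothetical, and the sketch writes to a separate output file instead of editing in place.

```python
import re

# Optional leading whitespace, then "(<number> iter/s, <number>[s/<count>] iters)".
TIMING = re.compile(r"\s*\(\d*\.?\d* iter/s, \d*\.?\d*(?:s/\d+)? iters\)")


def strip_timing(line):
    """Return the line without the timing part; other lines come back unchanged."""
    return TIMING.sub("", line)


def strip_timing_in_file(src_path, dst_path):
    """Copy src_path to dst_path, stripping the timing part from every line."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            dst.write(strip_timing(line))


if __name__ == "__main__":
    example = "solver.cpp:219] Iteration 14750 (2.9004 iter/s, 17.239s/50 iters), loss = 1.60934\n"
    print(strip_timing(example), end="")
```

Lines such as `Train net output #1: loss = 1.60944 (* 1 = 1.60944 loss)` are left alone, because their parenthesis is not followed by `iter/s`.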
Answer 1 (score: 0)
You can use the regular expression module (called "re"), which helps you isolate substrings quickly.
Here is the code:
import re
with open('your_file_with_correct_path') as file:
    file_content = file.read()
# The string you provided.
# There is no need for the string definition below, since you will use file_content.
# str = ' I0216 11:42:50.047427 31559 solver.cpp:219] Iteration 14750 (2.9004 iter/s, 17.239s/50 iters), loss = 1.60934 I0216 11:42:50.047472 31559 solver.cpp:238] Train net output #0: accuracy = 1'
sub_string = re.findall(r'\(\d+.*\)', file_content)
for element in sub_string:
    # add element to the file you want
    # save the file where you added the elements
    print(element)  # placeholder so the loop body is runnable
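sub_string will be a list of all the substrings that match the pattern passed as the first argument to findall.
I recommend looking at the various special characters used in regular expressions, as they are very useful for cleaning up strings.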
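To illustrate how the special characters in `\(\d+.*\)` behave, here is a small sketch. The first sample reuses two shortened lines from the question. The second line, with a trailing `(avg)`, is a made-up example that does not appear in the log; it is there only to show that the greedy `.*` runs to the last `)` on a line.

```python
import re

# Two shortened lines from the question's log.
sample = (
    "solver.cpp:219] Iteration 50 (2.91648 iter/s, 17.144s/50 iters), loss = 1.60944\n"
    "solver.cpp:238] Train net output #1: loss = 1.60944 (* 1 = 1.60944 loss)\n"
)

# "\(\d+" needs a digit right after "(", so "(* 1 = ..." is not matched.
loose = re.findall(r"\(\d+.*\)", sample)
print(loose)

# Hypothetical line (not from the log) with a second parenthesis.
tricky = "Iteration 50 (2.9 iter/s, 17s/50 iters), loss = 1.6 (avg)"

# Greedy ".*" extends to the last ")" on the line.
greedy = re.findall(r"\(\d+.*\)", tricky)
# Non-greedy ".*?" stops at the first ")".
lazy = re.findall(r"\(\d+.*?\)", tricky)
print(greedy)
print(lazy)
```

On the log shown in the question the loose pattern happens to pick out only the timing part, but a stricter pattern that spells out `iter/s` and `iters` is less likely to match something unintended.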
Thanks.