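How to remove a substring that follows a fixed pattern but varies in length?

Time: 2017-02-19 11:01:08

Tags: python python-2.7 python-3.x

I have a log file from which I want to remove some specific parts. Part of the log file is shown below: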

I0216 10:18:04.720626 31559 solver.cpp:273] Solving 
I0216 10:18:04.720630 31559 solver.cpp:274] Learning Rate Policy: step
I0216 10:18:05.242708 31559 solver.cpp:219] Iteration 0 (0 iter/s, 0.522037s/50 iters), loss = 1.60944
I0216 10:18:05.242750 31559 solver.cpp:238]     Train net output #0: accuracy = 0
I0216 10:18:05.242763 31559 solver.cpp:238]     Train net output #1: loss = 1.60944 (* 1 = 1.60944 loss)
I0216 10:18:05.242785 31559 sgd_solver.cpp:105] Iteration 0, lr = 1e-10
I0216 10:18:22.386440 31559 solver.cpp:219] Iteration 50 (2.91648 iter/s, 17.144s/50 iters), loss = 1.60944
I0216 10:18:22.386497 31559 solver.cpp:238]     Train net output #0: accuracy = 0.643982
I0216 10:18:22.386509 31559 solver.cpp:238]     Train net output #1: loss = 1.60944 (* 1 = 1.60944 loss)
I0216 10:18:22.386515 31559 sgd_solver.cpp:105] Iteration 50, lr = 1e-10
I0216 10:18:39.549926 31559 solver.cpp:219] Iteration 100 (2.91313 iter/s, 17.1637s/50 iters), loss = 1.60944
I0216 10:18:39.550071 31559 solver.cpp:238]     Train net output #0: accuracy = 1
I0216 10:18:39.550087 31559 solver.cpp:238]     Train net output #1: loss = 1.60944 (* 1 = 1.60944 loss)
I0216 10:18:39.550093 31559 sgd_solver.cpp:105] Iteration 100, lr = 1e-10
I0216 10:18:56.714752 31559 solver.cpp:219] Iteration 150 (2.91292 iter/s, 17.1649s/50 iters), loss = 1.60944
I0216 10:18:56.714824 31559 solver.cpp:238]     Train net output #0: accuracy = 0.624222
I0216 10:18:56.714838 31559 solver.cpp:238]     Train net output #1: loss = 1.60944 (* 1 = 1.60944 loss)
I0216 10:18:56.714845 31559 sgd_solver.cpp:105] Iteration 150, lr = 1e-10
I0216 10:19:13.893241 31559 solver.cpp:219] Iteration 200 (2.91059 iter/s, 17.1787s/50 iters), loss = 1.60944
I0216 10:19:13.893450 31559 solver.cpp:238]     Train net output #0: accuracy = 1
I0216 10:19:13.893467 31559 solver.cpp:238]     Train net output #1: loss = 1.60944 (* 1 = 1.60944 loss)
I0216 10:19:13.893473 31559 sgd_solver.cpp:105] Iteration 200, lr = 1e-10
I0216 10:19:31.094591 31559 solver.cpp:219] Iteration 250 (2.90674 iter/s, 17.2014s/50 iters), loss = 1.60944
I0216 10:19:31.094650 31559 solver.cpp:238]     Train net output #0: accuracy = 0.61937
I0216 10:19:31.094662 31559 solver.cpp:238]     Train net output #1: loss = 1.60944 (* 1 = 1.60944 loss)
I0216 10:19:31.094667 31559 sgd_solver.cpp:105] Iteration 250, lr = 1e-10
I0216 10:19:48.290045 31559 solver.cpp:219] Iteration 300 (2.90772 iter/s, 17.1956s/50 iters), loss = 1.60944
I0216 10:19:48.290187 31559 solver.cpp:238]     Train net output #0: accuracy = 0.959229
I0216 10:19:48.290205 31559 solver.cpp:238]     Train net output #1: loss = 1.60944 (* 1 = 1.60944 loss)
I0216 10:19:48.290210 31559 sgd_solver.cpp:105] Iteration 300, lr = 1e-10
I0216 10:20:05.504201 31559 solver.cpp:219] Iteration 350 (2.90457 iter/s, 17.2142s/50 iters), loss = 1.60944
I0216 10:20:05.504257 31559 solver.cpp:238]     Train net output #0: accuracy = 0.772217
I0216 10:20:05.504271 31559 solver.cpp:238]     Train net output #1: loss = 1.60944 (* 1 = 1.60944 loss)

As you can see, some lines contain 31559 solver.cpp:219] Iteration right after the timestamp.

I want to change only these lines and leave every other line in the file untouched. For example, this line should go from

... solver.cpp:219] Iteration 14750 (2.9004 iter/s, 17.239s/50 iters), loss = 1.60934

to

... solver.cpp:219] Iteration 14750, loss = 1.60934
.
.
.

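In other words, I want to remove the substring (2.9004 iter/s, 17.239s/50 iters) from every such line and leave the other lines unchanged. Thanks.

The part I want to remove looks like (2.8995 iter/s, 17.2444s/50 iters), and its length can differ from line to line. It starts with (, continues with a number (which may differ between lines), then iter/s, followed by another number, and ends with iters).

As @delca85 suggested, the pattern looks like this: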

p = "(\(\d*[.]?\d* iter/s\,\s\d*[.]?\d*)(s/[0-9]+)?(\siters\))"

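Does anyone have a suggestion? Thanks in advance.

2 Answers:

Answer 0: (score: 1)

I made one extra assumption about the second part of the string: that it is a number followed by s/number. I hope that is right; if not, let me know and I will be happy to find another solution.

Here is my suggestion: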

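import re

string = "I0216 11:42:50.047427 31559 solver.cpp:219] Iteration 14750 (2.9004 iter/s, 17.239s/50 iters), loss = 1.60934 I0216 11:42:50.047472 31559 solver.cpp:238]     Train net output #0: accuracy = 1"

# Raw string so the backslashes reach the regex engine unchanged.
p = r"\(\d*[.]?\d* iter/s, \d*[.]?\d*s/[0-9]+ iters\)"
pattern = re.compile(p)
# findall returns every substring that matches the whole pattern.
for l in pattern.findall(string):
    print(l)  # print(...) with one argument works on Python 2.7 and 3.x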

I hope this helps!
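The snippet above only lists the matching substrings. To actually delete them, as the question asks, one option is `re.sub`. The following is a minimal sketch under the same assumption about the `s/number` part; the names `TIMING` and `strip_timing` are made up for illustration, and the leading `\s*` is an addition so that the space before the parenthesis disappears too.

```python
import re

# Same shape as the pattern above, with an optional leading space so that
# "Iteration 14750 (...), loss" becomes "Iteration 14750, loss".
TIMING = re.compile(r"\s*\(\d*[.]?\d* iter/s, \d*[.]?\d*s/[0-9]+ iters\)")


def strip_timing(line):
    """Return the line with the '(x iter/s, ys/N iters)' part removed."""
    return TIMING.sub("", line)


line = ("I0216 11:42:50.047427 31559 solver.cpp:219] Iteration 14750 "
        "(2.9004 iter/s, 17.239s/50 iters), loss = 1.60934")
print(strip_timing(line))
```

Lines that do not contain the timing part, such as the `Train net output` lines, pass through unchanged because the pattern never matches them.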

When s/50 is optional
If s/50 is optional in the second part of the string, you can use this solution:

import re

string = "I0216 11:42:50.047427 31559 solver.cpp:219] Iteration 14750 (2.9004 iter/s, 17.239s/50 iters), loss = 1.60934 I0216 11:42:50.047472 31559 solver.cpp:238]     Train net output #0: accuracy = 1 "
string = string + "I0216 11:42:50.047427 31559 solver.cpp:219] Iteration 14750 (2.9004 iter/s, 17.239 iters), loss = 1.60934 I0216 11:42:50.047472 31559 solver.cpp:238]     Train net output #0: accuracy = 1 "
# The middle group (s/[0-9]+)? makes the "s/50" part optional.
p = r"(\(\d*[.]?\d* iter/s,\s\d*[.]?\d*)(s/[0-9]+)?(\siters\))"
pattern = re.compile(p)
# With capturing groups, findall yields one tuple per match, so join the pieces.
for l in pattern.findall(string):
    print(''.join(l))
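The `''.join(l)` step is needed because `findall` returns tuples of group contents whenever the pattern has capturing groups. As a rough comparison, the sketch below runs the grouped pattern next to a variant that uses a non-capturing group `(?:...)`, which returns whole matches directly. The variable names and the short `text` sample are illustrative only.

```python
import re

text = "(2.9004 iter/s, 17.239s/50 iters) and (2.9004 iter/s, 17.239 iters)"

# Pattern from the answer: three capturing groups, the middle one optional.
grouped = re.compile(r"(\(\d*[.]?\d* iter/s,\s\d*[.]?\d*)(s/[0-9]+)?(\siters\))")
# Variant: the optional part is non-capturing, so findall returns full matches.
flat = re.compile(r"\(\d*[.]?\d* iter/s,\s\d*[.]?\d*(?:s/[0-9]+)?\siters\)")

print(grouped.findall(text))  # list of 3-tuples, '' where s/NN is absent
print(flat.findall(text))     # list of complete matched strings
```

Both forms find the same two substrings; the non-capturing form simply avoids the join.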

Open the file, read its lines, match the pattern, and replace the match in the file

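import fileinput
import re
import sys

# The leading \s* also removes the space before "(", so the result reads
# "Iteration 14750, loss" rather than "Iteration 14750 , loss".
p = r"\s*\(\d*[.]?\d* iter/s,\s\d*[.]?\d*(s/[0-9]+)?\siters\)"
pattern = re.compile(p)
# inplace=True redirects stdout into file.txt, so every line must be written back.
for line in fileinput.input("file.txt", inplace=True):
    # Lines without a match are returned unchanged by sub().
    sys.stdout.write(pattern.sub("", line))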
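If overwriting the log in place feels risky, a possible alternative is to write the cleaned lines to a second file and to touch only the lines that carry the `solver.cpp:219] Iteration` marker from the question. This is a sketch, not a tested tool: `MARKER`, `clean_lines`, `clean_file` and both path arguments are hypothetical names chosen for the example.

```python
import re

TIMING = re.compile(r"\s*\(\d*[.]?\d* iter/s,\s\d*[.]?\d*(?:s/[0-9]+)?\siters\)")
MARKER = "solver.cpp:219] Iteration"  # only lines containing this are edited


def clean_lines(lines):
    """Yield each line, removing the timing part only on marker lines."""
    for line in lines:
        if MARKER in line:
            line = TIMING.sub("", line, count=1)
        yield line


def clean_file(src_path, dst_path):
    """Copy src_path to dst_path with the timing parts removed."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        dst.writelines(clean_lines(src))


sample = [
    "I0216 10:18:22.386440 31559 solver.cpp:219] Iteration 50 (2.91648 iter/s, 17.144s/50 iters), loss = 1.60944\n",
    "I0216 10:18:22.386509 31559 solver.cpp:238]     Train net output #1: loss = 1.60944 (* 1 = 1.60944 loss)\n",
    "I0216 10:18:22.386515 31559 sgd_solver.cpp:105] Iteration 50, lr = 1e-10\n",
]
for cleaned in clean_lines(sample):
    print(cleaned, end="")
```

Keeping the original file untouched makes it easy to compare the two files before deleting anything.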

Answer 1: (score: 0)

You can use the regular expression module (called "re"), which helps you isolate substrings quickly.

Here is the code:

import re

# Replace the placeholder with the real path to your log file.
with open('your_file_with_correct_path') as file:
    file_content = file.read()

#The string you provided
#No need to do the below string definition as you will use the file_content
#str = '   I0216 11:42:50.047427 31559 solver.cpp:219] Iteration 14750 (2.9004 iter/s, 17.239s/50 iters), loss = 1.60934 I0216 11:42:50.047472 31559 solver.cpp:238] Train net output #0: accuracy = 1'

# Collect every "(<digits>...)" substring; "." does not cross line breaks by default.
sub_string = re.findall(r'\(\d+.*\)', file_content)

for element in sub_string:
    #add element to the file you want
    pass

#save the file where you added the elements

sub_string will be a list of all substrings that match the pattern given as the first argument to findall.
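One thing to watch with this pattern is that `.*` is greedy: it runs to the last `)` on the line. The sketch below uses a constructed line (not taken from the log) with two parenthesised parts to show the difference between the greedy form and the lazy form `.*?`.

```python
import re

# Constructed example line with two parenthesised parts.
line = "Iteration 14750 (2.9004 iter/s, 17.239s/50 iters), loss = 1.60934 (* 1 = 1.60934 loss)"

greedy = re.findall(r"\(\d+.*\)", line)   # extends to the last ")" on the line
lazy = re.findall(r"\(\d+.*?\)", line)    # stops at the first ")" after the match start

print(greedy)
print(lazy)
```

On the log shown in the question each matching line has a single parenthesised part, so both forms should behave the same there; the lazy form is simply the safer default.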

I recommend reading up on the special characters used in regular expressions, since they are very useful for cleaning up strings.
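As a starting point for that reading, here is one way to spell out the timing pattern piece by piece with `re.VERBOSE`, which lets a pattern carry comments. This is an illustrative variant, not the pattern from either answer: it uses `\d+(?:\.\d+)?` for the numbers, which is slightly stricter, and the name `TIMING` is made up.

```python
import re

TIMING = re.compile(r"""
    \(              # literal opening parenthesis
    \d+(?:\.\d+)?   # a number such as 2.9004 or 0
    \ iter/s,\s     # a space, the text iter/s, a comma, one whitespace character
    \d+(?:\.\d+)?   # a number such as 17.239
    (?:s/\d+)?      # optional s/50 part
    \ iters\)       # a space, the text iters, and the closing parenthesis
""", re.VERBOSE)

match = TIMING.search("Iteration 50 (2.91648 iter/s, 17.144s/50 iters), loss = 1.60944")
print(match.group(0))
```

In verbose mode unescaped whitespace inside the pattern is ignored, which is why the literal spaces are written as `\ `.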

Thanks.