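I have a string from which I must remove timestamps and punctuation. I also have to remove all numbers, except that the responseCode value (400 in this case) must be kept. Wherever 400 appears, it should not be removed. I also want to remove all URLs and any file names ending in tar.gz.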
mystr = """sun aug 19 13:02:09 2018 I_am.98189: hello please connect to the local host:8080
sun aug 19 13:02:10 2018 hey.94289: hello not able to find the file
sun aug 19 13:02:10 2018 I_am.94289: Base url for file_transfer is: abc/vd/filename.tar.gz
mon aug 19 13:02:10 2018 how_94289: $var1={
'responseCode' = '400',
'responseDate' = 'Sun, 19 Aug 2018 13:02:08 ET',
'responseContent' = 'ABC' }
mon aug 20 13:02:10 2018 hello!94289: Error performing action, failed with error code [400]
"""
Expected result:
"I_am hello please connect to the local host
hello not able to find the file
Base url for file_transfer
var1
responseCode = 400
responseDate
responseContent = ABC
Error performing action, failed with error code 400
"
My solution for removing punctuation:
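# raw string so the backslash is taken literally
punctuations = r'''!=()-[]{};:'"\,<>.?@#$%^&*_~'''
no_punct = ""
for char in mystr:
    # keep only characters that are not in the punctuation set
    if char not in punctuations:
        no_punct = no_punct + char
# display the unpunctuated string
print(no_punct)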
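The character-by-character loop above can also be written with `str.translate`, which deletes a whole set of characters in one pass. This is only a minimal sketch: the helper name `strip_punctuation` and its `keep` parameter are made up for illustration, and it uses `string.punctuation` (ASCII punctuation) rather than the hand-written list above.

```python
import string


def strip_punctuation(text, keep=""):
    """Remove ASCII punctuation from text, except characters listed in keep.

    Hypothetical helper, not part of the original question.
    """
    to_delete = "".join(ch for ch in string.punctuation if ch not in keep)
    # str.maketrans("", "", chars) builds a table that deletes every char in chars
    return text.translate(str.maketrans("", "", to_delete))


sample = "'responseCode' = '400', failed with error code [400]"
print(strip_punctuation(sample, keep="="))
# responseCode = 400 failed with error code 400
```

Passing `keep="="` preserves the equals signs that appear in the expected result; the digits are untouched because this step only deals with punctuation.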
Answer 0 (score: 1)
Maybe:
import re

# patterns are applied in order; each match is replaced with an empty string
patterns = [r"\w{3} \w{3} \d{2} \d{2}:\d{2}:\d{2} \d{4}\s*",  # sun aug 19 13:02:10 2018
            r"\w{3}, \d{2} \w{3} \d{4} \d{2}:\d{2}:\d{2} \w{2}\s*",  # Sun, 19 Aug 2018 13:02:08 ET
            r":\s*([\da-zA-Z]+\/)+([a-zA-Z0-9\.]+)",  # URL
            r"([a-zA-Z_!]+)[\.!_]\d+:\s*",  # word[._!]number: followed by zero or more spaces
            r":\d+",  # port number such as :8080
            r"[/':,${}\[\]]"  # punctuation
            ]
s = mystr
for p in patterns:
    s = re.sub(p, '', s)
s = s.strip()
print(s)
Output:
hello please connect to the local host
hello not able to find the file
Base url for file_transfer is
var1=
responseCode = 400
responseDate =
responseContent = ABC
Error performing action failed with error code 400
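The patterns above are tailored to this particular sample. For the more general requirements in the question (drop every number except 400, drop anything ending in tar.gz), a negative lookahead is one possible approach. The sketch below rests on assumptions: the function names are hypothetical, only standalone numbers are removed (so `var1` survives, as in the expected result), and 400 is the only value to preserve.

```python
import re


def drop_numbers_except(text, keep="400"):
    """Remove standalone digit runs unless the run is exactly `keep`.

    Hypothetical helper. \b ... \b limits matches to whole numbers, and the
    negative lookahead rejects a run that is exactly `keep`.
    """
    pattern = rf"\b(?!{re.escape(keep)}\b)\d+\b"
    return re.sub(pattern, "", text)


def drop_targz_paths(text):
    """Remove any whitespace-delimited token that ends in .tar.gz."""
    return re.sub(r"\S+\.tar\.gz\b", "", text)


print(drop_numbers_except("host:8080 var1 error code [400]"))
# host: var1 error code [400]
print(repr(drop_targz_paths("Base url is: abc/vd/filename.tar.gz")))
```

Note that 4000 or 1400 would still be removed, because the lookahead only protects a run that is exactly 400. Leftover punctuation such as the trailing colon would still need a separate cleanup step.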