I have a string like this:
<p>Millions of people watch TV.</p><br/>https://sites.google.com/aaa-net.bb.cc/be-do-have/%E3%83%9B%E3%83%BC%E3%83%A0<br/><p>Good boy!</p><br/>
I want to remove this part:
https://sites.google.com/aaa-net.bb.cc/be-do-have/%E3%83%9B%E3%83%BC%E3%83%A0
and keep only:
<p>Millions of people watch TV.</p><br/><br/><p>Good boy!</p><br/>
My code so far:
mystring = '<p>Millions of people watch TV.</p><br/>https://sites.google.com/aaa-net.bb.cc/be-do-have/%E3%83%9B%E3%83%BC%E3%83%A0<br/><p>Good boy!</p><br/>'
How can I do this?
Answer 0 (score: 1)
You can use re.sub from the regular expression module (re):
import re
mystring = '<p>Millions of people watch TV.</p><br/>https://sites.google.com/aaa-net.bb.cc/be-do-have/%E3%83%9B%E3%83%BC%E3%83%A0<br/><p>Good boy!</p><br/>'
# [^<]+ matches everything after "http" up to (not including) the next "<"
print(re.sub(r'http[^<]+', '', mystring))
Output:
<p>Millions of people watch TV.</p><br/><br/><p>Good boy!</p><br/>
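A minimal sketch of the same re.sub idea, assuming the input may contain several bare URLs (the sample string below is hypothetical, not from the question). Anchoring the pattern on "http://" or "https://" instead of plain "http" avoids deleting ordinary text that merely starts with "http", and re.sub replaces every non-overlapping match in one call.

```python
import re

# Hypothetical input with two bare URLs between <br/> tags (not from the question).
sample = '<p>A</p><br/>https://example.com/x<br/><p>B</p><br/>http://example.org/y<br/>'

# Require a real scheme ("http://" or "https://") before removing anything,
# then drop everything up to the next "<".
URL_RE = re.compile(r'https?://[^<]+')

cleaned = URL_RE.sub('', sample)
print(cleaned)  # <p>A</p><br/><br/><p>B</p><br/><br/>
```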
Answer 1 (score: 0)
You can use a regular-expression find and replace:
Find: <br/>https?://[^<]*<br/>
Replace with: <br/><br/>
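The same find and replace expressed in Python, as a sketch that uses the pattern with both tags written as <br/> (the form that actually occurs in the string). Because the match consumes both surrounding tags, the replacement has to write them back so that only the URL disappears.

```python
import re

mystring = '<p>Millions of people watch TV.</p><br/>https://sites.google.com/aaa-net.bb.cc/be-do-have/%E3%83%9B%E3%83%BC%E3%83%A0<br/><p>Good boy!</p><br/>'

# Match a bare http(s) URL sitting between two <br/> tags, tags included.
pattern = r'<br/>https?://[^<]*<br/>'

# Put both tags back so only the URL is removed.
result = re.sub(pattern, '<br/><br/>', mystring)
print(result)  # <p>Millions of people watch TV.</p><br/><br/><p>Good boy!</p><br/>
```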
Answer 2 (score: 0)
mystring = '<p>Millions of people watch TV.</p><br/>https://sites.google.com/aaa-net.bb.cc/be-do-have/%E3%83%9B%E3%83%BC%E3%83%A0<br/><p>Good boy!</p><br/>'
# remove 'https://sites.google.com/aaa-net.bb.cc/be-do-have/%E3%83%9B%E3%83%BC%E3%83%A0'
resultstring = '<p>Millions of people watch TV.</p><br/><br/><p>Good boy!</p><br/>'
length = len(mystring)
startPos = -1
endPos = -1
# Assumes the URL sits between the first and second '<br/>'
for i in range(length):
    subString = mystring[i:]
    if subString.startswith('<br/>'):
        if startPos == -1:
            startPos = i
            continue  # check from the next character to get endPos
        if endPos == -1:
            endPos = i
firstSubString = mystring[:startPos + 5]  # 5 = the character length of '<br/>'
lastSubString = mystring[endPos:]
completeResult = firstSubString + lastSubString
print(completeResult, completeResult == resultstring)
print(completeResult, resultstring)
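The same "cut between the first two <br/> tags" idea can be written with str.find instead of a character-by-character loop. This is a sketch under the same assumption as the answer (the URL is always between the first and second <br/>); the function name drop_between_first_two_breaks is hypothetical. It also returns the string unchanged when fewer than two tags are present, a case the loop above does not handle.

```python
mystring = '<p>Millions of people watch TV.</p><br/>https://sites.google.com/aaa-net.bb.cc/be-do-have/%E3%83%9B%E3%83%BC%E3%83%A0<br/><p>Good boy!</p><br/>'

TAG = '<br/>'


def drop_between_first_two_breaks(s: str, tag: str = TAG) -> str:
    """Remove whatever lies between the first and second occurrence of tag (hypothetical helper)."""
    start = s.find(tag)
    if start == -1:
        return s  # no tag at all: nothing to remove
    start += len(tag)  # keep the first tag itself
    end = s.find(tag, start)
    if end == -1:
        return s  # only one tag: leave the string unchanged
    return s[:start] + s[end:]


print(drop_between_first_two_breaks(mystring))
```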
Answer 3 (score: 0)
import re
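mystring = '<p>Millions of people watch TV.</p><br/>https://sites.google.com/aaa-net.bb.cc/be-do-have/%E3%83%9B%E3%83%BC%E3%83%A0<br/><p>Good boy!</p><br/>'
# Match '<br/>https' plus everything up to (but not including) the next '<br/>',
# then write back the leading '<br/>' that the match consumed
print(re.sub(r"(?:<br/>https)([\s\S]*?)(?=<br/>)", '<br/>', mystring))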
Output:
<p>Millions of people watch TV.</p><br/><br/><p>Good boy!</p><br/>
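To see why the replacement in this answer is a single '<br/>', you can inspect what the pattern actually matches. The sketch below uses re.search on the question's string: the leading '<br/>' is part of the match, while the lookahead (?=<br/>) only checks that the next '<br/>' follows without consuming it, so that tag survives the substitution on its own.

```python
import re

mystring = '<p>Millions of people watch TV.</p><br/>https://sites.google.com/aaa-net.bb.cc/be-do-have/%E3%83%9B%E3%83%BC%E3%83%A0<br/><p>Good boy!</p><br/>'

pattern = re.compile(r"(?:<br/>https)([\s\S]*?)(?=<br/>)")
m = pattern.search(mystring)

# Whole match: begins with the consumed '<br/>' and stops just before the next '<br/>'.
print(repr(m.group(0)))
# Capture group 1: the lazily matched rest of the URL after 'https'.
print(repr(m.group(1)))
```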