Python: replace the URL content between <br/> and <br/>

Time: 2019-05-08 04:03:35

Tags: python python-3.x

I have a string like this:

<p>Millions of people watch TV.</p><br/>https://sites.google.com/aaa-net.bb.cc/be-do-have/%E3%83%9B%E3%83%BC%E3%83%A0<br/><p>Good boy!</p><br/>

I want to remove this content:

https://sites.google.com/aaa-net.bb.cc/be-do-have/%E3%83%9B%E3%83%BC%E3%83%A0

and keep only:

<p>Millions of people watch TV.</p><br/><br/><p>Good boy!</p><br/>

My code:

mystring = '<p>Millions of people watch TV.</p><br/>https://sites.google.com/aaa-net.bb.cc/be-do-have/%E3%83%9B%E3%83%BC%E3%83%A0<br/><p>Good boy!</p><br/>'

How can I do this?

4 answers:

Answer 0 (score: 1)

You can use re.sub from the regular expression module (re):

import re
mystring = '<p>Millions of people watch TV.</p><br/>https://sites.google.com/aaa-net.bb.cc/be-do-have/%E3%83%9B%E3%83%BC%E3%83%A0<br/><p>Good boy!</p><br/>'
print(re.sub(r'http[^<]+', '', mystring))

Output:

<p>Millions of people watch TV.</p><br/><br/><p>Good boy!</p><br/>
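
The re.sub call above removes everything from "http" up to the next "<". As a minimal sketch of the same idea in reusable form, the pattern can be tightened to require a scheme and to stop at whitespace as well. The names strip_urls and URL_PATTERN and the example.com URLs are hypothetical and not part of the original answer:

```python
import re

# Hypothetical helper; strip_urls and URL_PATTERN are illustrative names.
# Matches http:// or https:// followed by anything up to the next '<' or whitespace.
URL_PATTERN = re.compile(r'https?://[^<\s]+')

def strip_urls(html):
    """Remove every bare http/https URL from the string."""
    return URL_PATTERN.sub('', html)

sample = '<br/>https://example.com/a<br/>text<br/>http://example.com/b<br/>'
cleaned_sample = strip_urls(sample)
print(cleaned_sample)  # <br/><br/>text<br/><br/>
```

Note that this sketch would also strip URLs inside attribute values such as href, so it only suits strings where URLs appear as bare text, as in the question.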

Answer 1 (score: 0)

You can use a regular expression replacement:

Find: <br/>https?://[^<]*<br/>

Replace: <br/><br/>
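
A find/replace pair of this kind can be applied in Python with re.sub. The following is a minimal sketch, written with <br/> on both sides of the URL so that it matches the sample string, and assuming nothing but the URL sits between the two tags. The variable names pattern and result are illustrative only:

```python
import re

mystring = '<p>Millions of people watch TV.</p><br/>https://sites.google.com/aaa-net.bb.cc/be-do-have/%E3%83%9B%E3%83%BC%E3%83%A0<br/><p>Good boy!</p><br/>'

# Assumption: the URL is enclosed directly by two <br/> tags with no other text between them
pattern = r'<br/>https?://[^<]*<br/>'

# Replace the whole match with the two tags, which drops the URL
result = re.sub(pattern, '<br/><br/>', mystring)
print(result)
```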

Answer 2 (score: 0)

mystring = '<p>Millions of people watch TV.</p><br/>https://sites.google.com/aaa-net.bb.cc/be-do-have/%E3%83%9B%E3%83%BC%E3%83%A0<br/><p>Good boy!</p><br/>'
# remove 'https://sites.google.com/aaa-net.bb.cc/be-do-have/%E3%83%9B%E3%83%BC%E3%83%A0'
resultstring = '<p>Millions of people watch TV.</p><br/><br/><p>Good boy!</p><br/>'

length = len(mystring)
startPos = -1
endPos = -1
for i in range(length):
    # startswith with an offset avoids building a new substring on every iteration
    if mystring.startswith('<br/>', i):
        if startPos == -1:
            startPos = i
            continue  # keep scanning from the next character to find endPos

        if endPos == -1:
            endPos = i


firstSubString = mystring[:startPos + 5] # 5 = the character length of '<br/>'
lastSubString = mystring[endPos:]


completeResult = firstSubString + lastSubString
print(completeResult, completeResult == resultstring)
print(completeResult, resultstring)
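
The same slicing idea can be written more compactly with str.find, which locates the first <br/> and then the next one after it. This is a sketch rather than the answer's own code; the names tag, start, end and cleaned are illustrative, and it assumes the content to drop lies between the first two <br/> tags:

```python
mystring = '<p>Millions of people watch TV.</p><br/>https://sites.google.com/aaa-net.bb.cc/be-do-have/%E3%83%9B%E3%83%BC%E3%83%A0<br/><p>Good boy!</p><br/>'

tag = '<br/>'
# Index of the first <br/>, then of the next <br/> that follows it
start = mystring.find(tag)
end = mystring.find(tag, start + len(tag))

# str.find returns -1 when the tag is missing; leave the string unchanged in that case
if start != -1 and end != -1:
    cleaned = mystring[:start + len(tag)] + mystring[end:]
else:
    cleaned = mystring
print(cleaned)
```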

Answer 3 (score: 0)

import re

mystring = '<p>Millions of people watch TV.</p><br/>https://sites.google.com/aaa-net.bb.cc/be-do-have/%E3%83%9B%E3%83%BC%E3%83%A0<br/><p>Good boy!</p><br/>'
# Raw string so that \s and \S reach the regex engine as written
print(re.sub(r"(?:<br/>https)([\s\S]*?)(?=<br/>)", '<br/>', mystring))

Output:

<p>Millions of people watch TV.</p><br/><br/><p>Good boy!</p><br/>