Question

I have a long string that contains HTML tags and img elements, and I want to use a regular expression to remove part of the string inside each src.

I tried the following code, but I think something is wrong with how my pattern matches the src.

My code:

#!/usr/bin/env python
#encoding: utf-8
import re
url = "<p><img src ='https://xxx.cn/20190504195124718.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L2gzNTYzNjM=,size_16,color_FFFFFF,t_70'></img></p><p><img src ='https://xxxx.cn/20190504195124718.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L2gzNTYzNjM=,size_16,color_FFFFFF,t_70'></img></p>"

pattern = re.compile(r"https://img-.*(\?x-oss-process.*t_70)")

print(pattern.findall(url))

out = re.sub(pattern, '', url)

print(out)

The first print (the findall call) gives:

['?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L2gzNTYzNjM=,size_16,color_FFFFFF,t_70']

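I want to get a new string in which the src of each <img src =''></img> is cut down to just the image URL, for example only "https://xxx.cn/20190504195124718.png".

In other words, I want to remove this part: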

?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L2gzNTYzNjM=,size_16,color_FFFFFF,t_70

How can I get this result: url = "<img src ='https://xxx.cn/20190504195124718.png'></img><img src ='https://xxxx.cn/20190504195124718.png'></img>"?

Thanks a lot!

Answer 1

Edit: now also handles the second img.

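In some cases I find regex a bit complicated, and Python's built-in string methods are powerful enough. So for the case above I would use the following code: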

url = "<p><img src ='https://xxx.cn/20190504195124718.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L2gzNTYzNjM=,size_16,color_FFFFFF,t_70'></img></p><p><img src ='https://xxxx.cn/20190504195124718.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L2gzNTYzNjM=,size_16,color_FFFFFF,t_70'></img></p>"

# Keep everything before the first '?', rebuild the HTML between the two images, then keep the second src up to its '?'
new_url = url.split('?')[0] + "'></img></p><p><img" + url.split('?')[1].split('<img')[-1] + "'></img></p>"

print(new_url)

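It splits the string at "?", keeps the first part, and then appends the remaining HTML. Note that this one-liner assumes there are exactly two images. Hope it helps. Peace!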
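The one-liner above hard-codes exactly two images. Below is a minimal sketch of the same split-on-"?" idea that loops over every piece instead. It assumes each src is wrapped in single quotes and that "?" only ever appears where a src query string begins; the function name strip_src_queries is hypothetical.

```python
def strip_src_queries(html):
    # Assumption: every '?' in html starts a src query string, and each src is single-quoted.
    parts = html.split('?')
    # parts[0] ends right before the first query string; every later part starts with
    # a query string that runs up to the closing quote of its src.
    cleaned = [parts[0]]
    for part in parts[1:]:
        quote = part.index("'")       # position of the quote that closes this src
        cleaned.append(part[quote:])  # keep the closing quote and all HTML after it
    return ''.join(cleaned)


# Query string copied from the question; split out only to keep lines short.
QUERY = ("?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,"
         "text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L2gzNTYzNjM=,size_16,color_FFFFFF,t_70")
url = ("<p><img src ='https://xxx.cn/20190504195124718.png" + QUERY + "'></img></p>"
       "<p><img src ='https://xxxx.cn/20190504195124718.png" + QUERY + "'></img></p>")

print(strip_src_queries(url))
```

Because it never counts images, the same function works for one, two or twenty img tags, as long as the assumptions above hold.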

Answer 2

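Since you only need to delete part of the string, you can do it with re.sub. Note that (?#...) is not a capture group: it is a comment group whose contents are ignored, so the <img.* inside it has no effect on the match.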

output = re.sub(r"(?#<img.*)\?x-oss-process.*?t_70", '', url)

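I added ? before t_70 to make the match non-greedy, so it works with multiple img tags: each query string is matched separately, instead of one match running from the first ?x-oss-process to the last t_70.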
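A quick way to see what the extra ? changes is to run the greedy and non-greedy patterns side by side on the question's string. The shared query string is pulled into a QUERY variable only to keep the lines short.

```python
import re

# Query string copied from the question.
QUERY = ("?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,"
         "text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L2gzNTYzNjM=,size_16,color_FFFFFF,t_70")
url = ("<p><img src ='https://xxx.cn/20190504195124718.png" + QUERY + "'></img></p>"
       "<p><img src ='https://xxxx.cn/20190504195124718.png" + QUERY + "'></img></p>")

greedy = re.compile(r"\?x-oss-process.*t_70")       # .* extends to the LAST t_70
non_greedy = re.compile(r"\?x-oss-process.*?t_70")  # .*? stops at the FIRST t_70

print(len(greedy.findall(url)))      # one match spanning both images
print(len(non_greedy.findall(url)))  # two matches, one per image

print(greedy.sub('', url))      # also deletes the HTML between the two query strings
print(non_greedy.sub('', url))  # removes only the two query strings
```

The greedy version swallows everything from the first query string to the last one, so the second img tag disappears entirely; that is the same failure mode as the greedy .* in the question's pattern.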

From the documentation:

(?#...)
A comment; the contents of the parentheses are simply ignored.

See the documentation [here](https://docs.python.org/2/library/re.html).
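Since (?#...) is only a comment, the pattern above has no capture groups at all. For comparison, here is a sketch that does use a real capture group: group 1 keeps "src ='" plus the URL up to the "?", and the rest of the attribute value up to the closing quote is dropped. It assumes single-quoted src values, as in the question.

```python
import re

# Query string copied from the question.
QUERY = ("?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,"
         "text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L2gzNTYzNjM=,size_16,color_FFFFFF,t_70")
url = ("<p><img src ='https://xxx.cn/20190504195124718.png" + QUERY + "'></img></p>"
       "<p><img src ='https://xxxx.cn/20190504195124718.png" + QUERY + "'></img></p>")

# Group 1: "src", optional spaces, "=", optional spaces, "'" and the URL up to the '?'.
# After the group, \?[^']* consumes the query string up to (not including) the closing quote.
src_query = re.compile(r"(src\s*=\s*'[^'?]*)\?[^']*")

# Replace each match with group 1 only, i.e. the src without its query string.
print(src_query.sub(r"\1", url))
```

A src that has no "?" does not match at all, so it is left unchanged.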

Answer 3

You can use BeautifulSoup for this:

from bs4 import BeautifulSoup
url = "<p><img src ='https://xxx.cn/20190504195124718.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L2gzNTYzNjM=,size_16,color_FFFFFF,t_70'></img></p><p><img src ='https://xxxx.cn/20190504195124718.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L2gzNTYzNjM=,size_16,color_FFFFFF,t_70'></img></p>"

#Parse the html
soup = BeautifulSoup(url, 'html.parser')

#Get the src of every img tag
li = [tag['src'] for tag in soup.find_all('img')]

#Iterate through the srcs and replace urls
for item in li:
    original_src = item
    new_src = item.split('?')[0]
    url = url.replace(original_src, new_src)

print(url)

The output will be:

<p><img src ='https://xxx.cn/20190504195124718.png'></img></p><p><img src ='https://xxxx.cn/20190504195124718.png'></img></p>
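If installing bs4 is not an option, the same collect-then-replace idea can be sketched with the standard library alone: html.parser.HTMLParser collects the src values and urllib.parse drops the query (and any fragment). The class and function names are hypothetical. Like the BeautifulSoup version, it edits the raw string with str.replace, so it assumes the src text contains no HTML entities such as &amp;.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit, urlunsplit


class ImgSrcCollector(HTMLParser):
    """Collects the src attribute of every <img> start tag fed to it."""

    def __init__(self):
        super().__init__()
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs with the quotes already removed
        if tag == 'img':
            for name, value in attrs:
                if name == 'src' and value:
                    self.srcs.append(value)


def drop_query(src):
    # Rebuild the URL with an empty query and fragment, keeping scheme, host and path.
    return urlunsplit(urlsplit(src)._replace(query='', fragment=''))


def strip_img_queries(html):
    collector = ImgSrcCollector()
    collector.feed(html)
    collector.close()
    # Replace each original src in the raw string with its query-free version.
    for src in collector.srcs:
        html = html.replace(src, drop_query(src))
    return html


# Query string copied from the question.
QUERY = ("?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,"
         "text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L2gzNTYzNjM=,size_16,color_FFFFFF,t_70")
url = ("<p><img src ='https://xxx.cn/20190504195124718.png" + QUERY + "'></img></p>"
       "<p><img src ='https://xxxx.cn/20190504195124718.png" + QUERY + "'></img></p>")

print(strip_img_queries(url))
```

Using urlsplit instead of split('?') also removes a trailing #fragment, which a plain split on "?" would keep if the URL had no query.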

How to replace a string within a string in Python?
