I am trying to extract URLs from a list of strings. Sample list:
import re
p = ['<img class="alignnone size-full wp-image-2087" src="http://www.sample.com/test.jpg" alt="0wCR41v" width="540" height="720" srcset="http://www.sample.com/test-225x300.jpg 225w, http://www.sample.com/test.jpg 540w" sizes="(max-width: 540px) 100vw, 540px" />', '<img class="alignnone size-large wp-image-2133" src="http://www.sample.com/test2.jpg" alt="NtAboHF" width="583" height="1024" srcset="http://www.happyfridaygents.com/wp-content/uploads/2016/04/NtAboHF-768x1349.jpg 768w, http://www.sample.com/test2.jpg 583w, http://www.happyfridaygents.com/wp-content/uploads/2016/04/NtAboHF.jpg 828w" sizes="(max-width: 583px) 100vw, 583px" />']
I want to extract the http://www.sample.com/test.jpg part that follows src=".
If p were just a single string, I could use findall:
t = re.findall('src="(.+)" alt', p)
print(t)
But how do I iterate over the list and return a list of all the URLs in p?
Answer 0 (score: 0)
Does this do what you want?
import re
p = ['<img class="alignnone size-full wp-image-2087" src="http://www.sample.com/test.jpg" alt="0wCR41v" width="540" height="720" srcset="http://www.sample.com/test-225x300.jpg 225w, http://www.sample.com/test.jpg 540w" sizes="(max-width: 540px) 100vw, 540px" />', '<img class="alignnone size-large wp-image-2133" src="http://www.sample.com/test2.jpg" alt="NtAboHF" width="583" height="1024" srcset="http://www.happyfridaygents.com/wp-content/uploads/2016/04/NtAboHF-768x1349.jpg 768w, http://www.sample.com/test2.jpg 583w, http://www.happyfridaygents.com/wp-content/uploads/2016/04/NtAboHF.jpg 828w" sizes="(max-width: 583px) 100vw, 583px" />']
outList = [re.findall('src="(.+)" alt', pp)[0] for pp in p]
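A slightly more defensive variant, offered only as a sketch: the pattern below stops at the next double quote instead of relying on ` alt` following the src attribute, and strings without a src are skipped rather than raising IndexError on `[0]`. The helper name `extract_src_urls` and the shortened sample strings are made up for illustration, and the pattern assumes double-quoted attributes.

```python
import re

# Capture everything between src=" and the next double quote.
# Assumption: attributes are double-quoted, as in the sample data.
SRC_PATTERN = re.compile(r'\bsrc="([^"]+)"')


def extract_src_urls(tags):
    """Return the first src URL found in each string, skipping strings without one."""
    urls = []
    for tag in tags:
        match = SRC_PATTERN.search(tag)
        if match:  # search() returns None when nothing matches
            urls.append(match.group(1))
    return urls


sample = [
    '<img class="a" src="http://www.sample.com/test.jpg" alt="x" '
    'srcset="http://www.sample.com/test-225x300.jpg 225w" />',
    '<img alt="y" src="http://www.sample.com/test2.jpg" />',
    '<p>no image here</p>',
]
print(extract_src_urls(sample))
```

Because the pattern requires `src` to be followed directly by `="`, the `srcset` attribute is not picked up.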
Answer 1 (score: 0)
Use a list comprehension over the strings in p. This gives you a list of lists, which can then be chained together into a single flat list.
The regex answers elsewhere are more elegant.
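As a rough illustration of chaining a list of lists into one flat list, here is a minimal sketch. Building the inner lists with `re.findall` and the shortened sample strings in `tags` are assumptions for the example, not necessarily what this answer had in mind.

```python
import re
from itertools import chain

# Shortened, hypothetical sample data in the same shape as p.
tags = [
    '<img src="http://www.sample.com/test.jpg" alt="a" />',
    '<img src="http://www.sample.com/test2.jpg" alt="b" />',
]

# findall() returns one list per input string, so this is a list of lists.
nested = [re.findall(r'src="([^"]+)"', tag) for tag in tags]

# chain.from_iterable joins the inner lists into one flat sequence.
flat = list(chain.from_iterable(nested))
print(flat)
```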
Answer 2 (score: 0)
How about doing it in a loop:
>>> pe = re.compile('src="(.+)" alt')
>>> for img in p:
...     print(pe.findall(img))
...
['http://www.sample.com/test.jpg']
['http://www.sample.com/test2.jpg']
Answer 3 (score: 0)
for i in p:
    t = re.findall('src="(.+)" alt', i)
    print(t)
Update
k=[re.findall('src="(.+)" alt',i) for i in p]
[item for sublist in k for item in sublist]
['http://www.sample.com/test.jpg','http://www.sample.com/test2.jpg']
Answer 4 (score: 0)
Here is a solution using BeautifulSoup:
>>> p = ['<img class="alignnone size-full wp-image-2087" src="http://www.sample.com/test.jpg" alt="0wCR41v" width="540" height="720" srcset="http://www.sample.com/test-225x300.jpg 225w, http://www.sample.com/test.jpg 540w" sizes="(max-width: 540px) 100vw, 540px" />', '<img class="alignnone size-large wp-image-2133" src="http://www.sample.com/test2.jpg" alt="NtAboHF" width="583" height="1024" srcset="http://www.happyfridaygents.com/wp-content/uploads/2016/04/NtAboHF-768x1349.jpg 768w, http://www.sample.com/test2.jpg 583w, http://www.happyfridaygents.com/wp-content/uploads/2016/04/NtAboHF.jpg 828w" sizes="(max-width: 583px) 100vw, 583px" />']
>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup(''.join(p), 'html.parser')
>>> src_links = [img['src'] for img in soup.find_all('img')]
>>> src_links
[u'http://www.sample.com/test.jpg', u'http://www.sample.com/test2.jpg']
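If installing BeautifulSoup is not an option, the same idea can be sketched with the standard library's `html.parser` module (Python 3). The class name `ImgSrcCollector` and the shortened sample strings are hypothetical. This is a minimal sketch, not a replacement for a full HTML parser.

```python
from html.parser import HTMLParser


class ImgSrcCollector(HTMLParser):
    """Collect the src attribute of every <img> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <img ... />,
        # because the default handle_startendtag() delegates here.
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    self.sources.append(value)


# Shortened, hypothetical sample data in the same shape as p.
tags = [
    '<img class="a" src="http://www.sample.com/test.jpg" alt="x" '
    'srcset="http://www.sample.com/test-225x300.jpg 225w" />',
    '<img class="b" src="http://www.sample.com/test2.jpg" alt="y" />',
]

collector = ImgSrcCollector()
collector.feed("".join(tags))
collector.close()
print(collector.sources)
```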
If you really do want to use a regex:
>>> regex = re.compile(r'src="(.+)" alt')
>>> [regex.search(img).group(1) for img in p]
['http://www.sample.com/test.jpg', 'http://www.sample.com/test2.jpg']