I have a script that scrapes data from a link. It gives me the following result set:
<p class="flag"><img alt="Paris" src="/images/flags/FR.gif"/></p>
<p class="flag"><img alt="Austria" src="/images/flags/AT.gif"/></p>
<p class="flag"><img alt="Switzerland" src="/images/flags/CH.gif"/></p>
<p class="flag"><img alt="Malta" src="/images/flags/MT.gif"/></p>
<p class="flag"><img alt="Sydney" src="/images/flags/AU.gif"/></p>
<p class="flag"><img alt="Rotterdam" src="/images/flags/NL.gif"/></p>
<p class="flag"><img alt="London" src="/images/flags/UK.gif"/></p>
<p class="flag"><img alt="London" src="/images/flags/UK.gif"/></p>
<p class="flag"><img alt="West + Wales" src="/images/flags/UK.gif"/></p>
<p class="flag"><img alt="Melbourne" src="/images/flags/AU.gif"/></p>
<p class="flag"><img alt="London" src="/images/flags/UK.gif"/></p>
<p class="flag"><img alt="Bulgaria" src="/images/flags/BG.gif"/></p>
<p class="flag"><img alt="Amsterdam" src="/images/flags/NL.gif"/></p>
<p class="flag"><img alt="Scotland" src="/images/flags/UK.gif"/></p>
<p class="flag"><img alt="Midlands" src="/images/flags/UK.gif"/></p>
How can I keep only the following string/text:
Answer 0 (score: 0)
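Find all img elements whose src attribute contains "flags" and that sit inside a p element with class="flag", then extract the code (the file name without the .gif extension) from the src attribute value: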
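import re
from bs4 import BeautifulSoup

# Assumes `html` holds the scraped markup shown in the question.
soup = BeautifulSoup(html, "html.parser")
# Capture the file name (without the .gif extension) at the end of the src path.
pattern = re.compile(r"/(\w+)\.gif$")
for img in soup.select("p.flag img[src*=flags]"):
    match = pattern.search(img["src"])
    if match:
        print(match.group(1))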
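For a version that runs on its own, here is a minimal sketch of the same selector-plus-regex approach. The inline sample (three rows copied from the question) and the helper name extract_flags are assumptions made for illustration. It also collects the alt text, in case that is the string you want to keep:

```python
import re
from bs4 import BeautifulSoup

# Sample markup: the first three rows from the question.
html = """
<p class="flag"><img alt="Paris" src="/images/flags/FR.gif"/></p>
<p class="flag"><img alt="Austria" src="/images/flags/AT.gif"/></p>
<p class="flag"><img alt="Switzerland" src="/images/flags/CH.gif"/></p>
"""


def extract_flags(markup):
    """Return (alt text, flag code) pairs for each flag image (hypothetical helper)."""
    soup = BeautifulSoup(markup, "html.parser")
    # Capture the file name (without extension) at the end of the src path.
    pattern = re.compile(r"/(\w+)\.gif$")
    results = []
    for img in soup.select('p.flag img[src*="flags"]'):
        match = pattern.search(img["src"])
        if match:
            # img.get avoids a KeyError if an image has no alt attribute.
            results.append((img.get("alt", ""), match.group(1)))
    return results


pairs = extract_flags(html)
for alt, code in pairs:
    print(alt, code)
```

Returning pairs instead of printing inside the loop makes it easy to keep just the alt text, just the code, or both.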
Answer 1 (score: -1)
select