Regex returns the full match instead of the first occurrence in Python

Time: 2018-06-27 15:50:54

Tags: python regex string

Hi, I have a string: http://www.yifysubtitles.com/subtitles/blockers2018720pwebripx264-ytsam-arabic-128849"><span class="text-muted">subtitle</span> Blockers.2018.720p.WEBRip.x264-[YTS.AM]</a></td><td class="other-cell"></td><td class="uploader-cell"><a href="/user/SHINAWY">SHINAWY</a></td><td class="download-cell"><a href="/subtitles/blockers-arabic-yify-128849" class="subtitle-download" >download</a></td></tr><tr data-id="128835"><td class="rating-cell"><span class="label">0</span></td><td class="flag-cell"><span class="flag flag-cn"></span><span class="sub-lang">Chinese</span></td><td><a href="/subtitles/blockers2018720pblurayx264-ytsmecht-chinese-128835"><span class="text-muted">subtitle</span> Blockers.2018.720p.BluRay.x264-[YTS.ME].cht </a></td><td class="other-cell"></td><td class="uploader-cell"><a href="/user/osamawang">osamawang</a></td><td class="download-cell"><a href="/subtitles/blockers-chinese-yify-128835" class="subtitle-download" >download</a></td></tr><tr data-id="128543" class="high-rating"><td class="rating-cell"><span class="label label-success">6</span></td><td class="flag-cell"><span class="flag flag-gb"></span><span class="sub-lang">English</span></td><td><a href="/subtitles/blockers2018web-dlx264-fgt-english-128543"><span class="text-muted">subtitle</span> Blockers.2018.WEB-DL.x264-FGT</a></td><td class="other-cell"></td><td class="uploader-cell"><a href="/user/sub">sub</a></td><td class="download-cell"><a href="/subtitles/blockers-english-yify-128543" class="subtitle-download" >download</a></td></tr><tr data-id="128633"><td class="rating-cell"><span class="label">0</span></td><td class="flag-cell"><span class="flag flag-rs"></span><span class="sub-lang">Serbian</span></td><td><a href="/subtitles/blockers2018720pblurayx264ytsag-serbian-128633"><span class="text-muted">subtitle</span> Blockers.2018.720p.BluRay.x264.[YTS.AG]</a></td><td class="other-cell"></td><td class="uploader-cell"><a href="/user/TesneGace">TesneGace</a></td><td 
class="download-cell"><a href="/subtitles/blockers-serbian-yify-128633" class="subtitle-download" >download</a></td></tr><tr data-id="128702"><td class="rating-cell"><span class="label label-success">2</span></td><td class="flag-cell"><span class="flag flag-es"></span><span class="sub-lang">Spanish</span></td><td><a href="/subtitles/blockers2018720pblurayx264ytsag-spanish-128702"><span class="text-muted">subtitle</span> Blockers.2018.720p.BluRay.x264.[YTS.AG]</a></td><td class="other-cell"></td><td class="uploader-cell"><a href="/subtitles/blockers-english-yify-128543

I am trying to match the first occurrence of the English download link, /subtitles/blockers-english-yify-128543

My pattern is re.search(r'/subtitles/.+\-english\-yify-\d+',text)

but my code returns the whole string. Please help.

My regex is available here

1 answer:

Answer 0 (score: -1)

Your string is actually HTML, so you should use an HTML parser instead. I recommend the excellent lxml.html parser.
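As a rough illustration of the parser approach, here is a minimal sketch. It swaps in the standard library's html.parser for lxml.html so that it runs without third-party packages; the idea is the same with lxml. The class name HrefCollector and the shortened fragment are made up for this example, and filtering on "-english-yify-" assumes the download links keep the naming seen in the question.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collect the href attribute of every <a> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the opening tag.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.hrefs.append(value)


# Short excerpt modelled on the string in the question (hypothetical sample).
fragment = (
    '<td><a href="/subtitles/blockers2018web-dlx264-fgt-english-128543">subtitle</a></td>'
    '<td class="uploader-cell"><a href="/user/sub">sub</a></td>'
    '<td class="download-cell"><a href="/subtitles/blockers-english-yify-128543" '
    'class="subtitle-download" >download</a></td>'
)

collector = HrefCollector()
collector.feed(fragment)
collector.close()

# Keep only the links that look like English download links.
english_links = [h for h in collector.hrefs if "-english-yify-" in h]
first_english = english_links[0] if english_links else None
print(first_english)
```

Because the parser hands back each href as a separate value, no pattern can run across tag boundaries the way .+ does on the raw string.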

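To answer your question: by default, regular expression quantifiers are greedy, which means your .+ part captures as many characters as it can while still letting the rest of the pattern match. So you get the first /subtitles/, the last -english\-yify-, and everything in between.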
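The greedy behaviour can be reproduced on a shortened sample. The sketch below is not part of the original answer: the text value is a trimmed-down stand-in for the string in the question, and the [^/"]+ character class is one possible fix that assumes the wanted path segment contains neither a slash nor a double quote. Note that simply making the quantifier lazy (.+?) is not enough here, because the match still starts at the first /subtitles/ in the string.

```python
import re

# Trimmed-down stand-in for the string in the question (hypothetical sample).
text = (
    '<a href="/subtitles/blockers-arabic-yify-128849" class="subtitle-download" >download</a>'
    '<a href="/subtitles/blockers2018web-dlx264-fgt-english-128543">subtitle</a>'
    '<a href="/subtitles/blockers-english-yify-128543" class="subtitle-download" >download</a>'
    '<a href="/subtitles/blockers-english-yify-128543'
)

# The pattern from the question: .+ runs from the first /subtitles/
# to the last -english-yify-<digits> in the string.
greedy = re.search(r'/subtitles/.+\-english\-yify-\d+', text)

# Lazy quantifier: stops at the first -english-yify-<digits>,
# but still starts at the first /subtitles/ (the Arabic link).
lazy = re.search(r'/subtitles/.+?\-english\-yify-\d+', text)

# Bounded character class: the match cannot cross a quote or a slash,
# so it has to stay inside a single href value.
bounded = re.search(r'/subtitles/[^/"]+-english-yify-\d+', text)

print(len(greedy.group(0)), len(lazy.group(0)))
print(bounded.group(0))
```

With the bounded class, the first two /subtitles/ candidates fail because their segment ends at a quote before -english-yify- appears, and the search moves on to the link that was actually wanted.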