Get the link inside src

Date: 2015-10-30 11:11:03

Tags: python regex

I have the following data:

<div class="main-details mt10">
    <div class="container">
        <div class="row">
            <div class="col-lg-8 col-md-7" data-purpose="introduction">
                                    <div class="slp-jwplayer-communicator" data-fade-in="1"
                         data-playerhtml='            <iframe id="hh"
                    src="https://localhost/embed/video/E0cZc345xCVTXwT/?params%5Bvars%5D%5Bplaylist%5D%5B0%5D%5Bimage%5D=https%3A%2F%2Flocalhost.images.com%2Fckxit%2F750x422%2F469292_6c3e_5.jpg&params%5BtrackVideoPlay%5D=true"
                    width="100%"
                    height="100%"
                    frameborder="0" webkitallowfullscreen mozallowfullscreen allowfullscreen
                    style="background: black;">
            </iframe>
        '>
                        <div class="promo-asset-content stretchy-wrapper ud-courseimpressiontracker"
                             data-id="erew343423"
                             data-tracking-type="proms"
                            >
                            <div>
                                <img class="cth" src="https://lcoalhost/data/469292_6c3e_5.jpg"/>
                            </div>
                        </div>
                    </div>
                            </div>
            <div class="col-lg-4 col-md-5">
                <div class="row fxdc lf-wrap-md">
                    <div class="fxw-md -md db-xs">
                        <div class="right-top col-md-12 col-sm-6">

<div class="take-btn">
            <div class="price fxac">

                    </div>

            <a class="ct "
       data-requireLogin="true"
       data-les="button-enroll-b"
       data-padding="0"
       data-passDtCode="true"
       data-purpose="take-this"
       href="https://localhost/code=kKp5D213TWOo">
        Take </a>

I want to find jwplayer and then get everything between the quotes of the src attribute that follows it:

jwplayer-communicator" data-fade-in="1"
data-playerhtml=' <iframe id="4222780"
src="https://localhost/embed/video/E0cZc345xCVTXwT/?params%5Bvars%5D%5Bplaylist%5D%5B0%5D%5Bimage%5D=https%3A%2F%2Flocalhost.images.com%2Fckxit%2F750x422%2F469292_6c3e_5.jpg&params%5BtrackVideoPlay%5D=true"

Expected result:

https://localhost/embed/video/E0cZc345xCVTXwT/?params%5Bvars%5D%5Bplaylist%5D%5B0%5D%5Bimage%5D=https%3A%2F%2Flocalhost.images.com%2Fckxit%2F750x422%2F469292_6c3e_5.jpg&params%5BtrackVideoPlay%5D=true

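However, the code below matches everything from jwplayer onward and does not return the expected result; it captures a later src value instead.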

data = re.search(r'jwplayer.*src=\"(.*?)\"', html, re.MULTILINE | re.DOTALL).group(1)

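How can I get everything between src=" and the closing ", as long as it comes after jwplayer?

Edit

OK, I understand: an HTML parser is better suited to this kind of problem (HTML). But let's say I'm just curious how to do this with a regular expression. Can someone help me? Knowing how would be useful, since I may run into the same problem in plain text files in the future. Also, even if I use an HTML parser, I will still need to pass it some regular expression anyway.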

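1 answer:

Answer 0 (score: 0)

Just add a "?" after the ".*" to make it non-greedy, so that it stops at the first src= after jwplayer instead of the last one: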

r'jwplayer.*?src=\"(.*?)\"'
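
To see the difference, here is a minimal sketch that runs both patterns against a shortened stand-in for the page. The `html` string, its URLs, and the helper `first_src` are made up for illustration and are not from the question. `re.MULTILINE` is left out because it only changes how `^` and `$` behave, and the pattern uses neither.

```python
import re

# Shortened, hypothetical stand-in for the page in the question:
# an iframe src inside data-playerhtml, followed by an img src.
html = '''
<div class="slp-jwplayer-communicator" data-fade-in="1"
     data-playerhtml='<iframe id="hh"
        src="https://localhost/embed/video/abc/?x=1"
        width="100%"></iframe>'>
  <img class="cth" src="https://localhost/data/thumb.jpg"/>
</div>
'''

# Greedy: ".*" runs to the end of the text, then backtracks
# to the LAST 'src="' it can find.
greedy = re.compile(r'jwplayer.*src="(.*?)"', re.DOTALL)

# Lazy: ".*?" grows one character at a time and stops
# at the FIRST 'src="' after "jwplayer".
lazy = re.compile(r'jwplayer.*?src="(.*?)"', re.DOTALL)


def first_src(pattern, text):
    """Return the captured src value, or None if nothing matches."""
    match = pattern.search(text)
    return match.group(1) if match else None


greedy_result = first_src(greedy, html)
lazy_result = first_src(lazy, html)

print(greedy_result)  # the img src, which is not the one we want
print(lazy_result)    # the iframe src that follows jwplayer
```

Checking for `None` before calling `.group(1)` avoids an `AttributeError` when the page contains no match, which the one-line version in the question would raise.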