Regular expression does not work

Date: 2015-06-10 22:03:51

Tags: python regex

Hi, I'm having trouble with a regular expression.

Here is part of the source:

    <div class="resultHeader googleHeader">
                            Wyniki z Google
                    </div>

                <div class="boxResult2  ">
                                                                <div class="box ">
                <div class="result">
                    <div class="link"> <a href="http://www.google.com/glass/start/"><b>Google Glass</b></a> </div>
                    <div class="source">
                        http://www.google.com/glass/start/

                            - <a rel="nofollow" href="query.html?hl=pl&amp;qt=related:http%3A%2F%2Fwww.google.com%2Fglass%2Fstart%2F">Podobne strony</a>
                                            </div><!-- source END -->
                                            <div class="desc">Thanks for exploring with us. The journey doesn&#39;t end here. You&#39;ll start to see <br />
future versions of <b>Glass</b> when they&#39;re ready (for now, no peeking).</div>
                                    </div><!-- result End -->
            </div><!-- box End -->
                                                                <div class="box ">
                <div class="result">
                    <div class="link"> <a href="http://pl.wikipedia.org/wiki/Google_Glass"><b>Google Glass</b> – Wikipedia, wolna encyklopedia</a> </div>
                    <div class="source">
                        http://pl.wikipedia.org/wiki/Google_Glass

                            - <a rel="nofollow" href="query.html?hl=pl&amp;qt=related:http%3A%2F%2Fpl.wikipedia.org%2Fwiki%2FGoogle_Glass">Podobne strony</a>
                                            </div><!-- source END -->
                                            <div class="desc"><b>Google Glass</b> to okulary o rozszerzonej rzeczywistości stworzone przez firmę <br />
Google. Okulary te mają docelowo mieć funkcje standardowego smartfona, ale&nbsp;...</div>
                                    </div><!-- result End -->
            </div><!-- box End -->

I want the links between <a href=" and ">, like this:

http://www.google.com/glass/start/

I wrote this: '<div class="link"> <a href="([^ ]+)"' but it doesn't work. :(

1 answer:

Answer 0 (score: 3):

Since you are writing this in Python, I can suggest a solution based on Beautiful Soup.

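from bs4 import BeautifulSoup

html = 'YOUR STRING'  # the HTML source to parse
# Name the parser explicitly so the result does not depend on what is installed
soup = BeautifulSoup(html, 'html.parser')
divs = soup.find_all("div", {"class": "link"})

# Print the href of every <a> found inside each <div class="link">
for tag in divs:
    for a in tag.find_all("a"):
        if a.has_attr('href'):
            print(a['href'])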

For your sample input, this prints:

http://www.google.com/glass/start/
http://pl.wikipedia.org/wiki/Google_Glass
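
If you would rather stay with a regular expression, here is a minimal sketch using only the standard re module. The question does not show how the pattern was applied, so this is not a diagnosis of the original failure. Note, though, that re.match only matches at the very start of the string, so re.findall or re.search is needed to find links further into the page. The sample string and the names LINK_RE and extract_links are made up for illustration. Compared with the pattern in the question, \s* allows any amount of whitespace between the two tags and [^"]+ stops at the closing quote of the attribute.

```python
import re

# Hypothetical sample, shortened from the HTML in the question
html = '''
<div class="link"> <a href="http://www.google.com/glass/start/"><b>Google Glass</b></a> </div>
<div class="link"> <a href="http://pl.wikipedia.org/wiki/Google_Glass"><b>Google Glass</b></a> </div>
'''

# \s* tolerates any whitespace between the tags;
# [^"]+ captures everything up to the closing quote of href
LINK_RE = re.compile(r'<div class="link">\s*<a href="([^"]+)"')


def extract_links(source):
    """Return every href that directly follows a <div class="link"> opening tag."""
    return LINK_RE.findall(source)


links = extract_links(html)
for link in links:
    print(link)
```

This relies on the exact attribute order and quoting shown in the sample, so it breaks as soon as the markup changes. That fragility is the main reason to prefer the parser-based approach above for anything beyond a quick script.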