Python: Getting links and comments from HTML

Time: 2019-01-11 03:33:51

Tags: python html

I have a lot of HTML data similar to the sample shown further below.


The file contains thousands of these entries. Each one has a unique URL and comment.

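What I need to do is get both the URL and the comment for each entry. The data comes from args.filename. GetData() then reads a JSON file and returns the data, which is HTML (a sample is shown below).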

                    <p class="comment-author" itemprop="author" itemscope itemtype="https://schema.org/Person">
                            <img alt=\'\' src=\'https://secure.gravatar.com/avatar/7c38dca1e1d8349d28124c65afca6285?s=48&#038;d=mm&#038;r=g\' srcset=\'https://secure.gravatar.com/avatar/7c38dca1e1d8349d28124c65afca6285?s=96&amp;d=mm&amp;r=g 2x\' class=\'avatar avatar-48 photo\' height=\'48\' width=\'48\' /><span itemprop="name"><a href="https://www.facebook.com/yobonks" class="comment-author-link" rel="external nofollow" itemprop="url">Bianca Roman</a></span> <span class="says">says</span>                      </p>

                    <p class="comment-meta"><time class="comment-time" datetime="2018-01-31T10:25:04+00:00" itemprop="datePublished"><a href="https://____.com/2015/01/love-giveaway-south-hill-designs-love-necklaces/#comment-18735" class="comment-time-link" itemprop="url">January 31, 2018 at 10:25 am</a></time></p>            </header>

            <div class="comment-content" itemprop="text">

                    <p>COMMENT 1</p>
            </div>

How can I get the URL so that I can associate each comment with its URL?
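One possible direction, sketched under assumptions rather than as a confirmed solution: walk the HTML with the standard library's html.parser, remember the href of the most recent `comment-time-link` anchor, and pair it with the text of the next `comment-content` div. The names `CommentLinkParser` and `extract_pairs` are hypothetical, the example.com URL is a placeholder for the elided domain, and the sketch assumes the string returned by GetData() has already been JSON-decoded (so quotes are no longer backslash-escaped) and that each permalink appears before its comment body, as in the sample above.

```python
from html.parser import HTMLParser


class CommentLinkParser(HTMLParser):
    """Pair each comment permalink with the comment text that follows it."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.pairs = []      # collected (url, comment_text) tuples
        self._url = None     # href of the most recent comment-time-link anchor
        self._depth = 0      # > 0 while inside a comment-content <div>
        self._chunks = []    # text pieces of the comment being read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "a" and "comment-time-link" in classes:
            self._url = attrs.get("href")
        elif tag == "div":
            if self._depth:
                # Nested <div> inside the comment body: track its depth.
                self._depth += 1
            elif "comment-content" in classes:
                self._depth = 1
                self._chunks = []

    def handle_endtag(self, tag):
        if tag == "div" and self._depth:
            self._depth -= 1
            if self._depth == 0:
                # Comment body closed: normalize whitespace and store the pair.
                text = " ".join("".join(self._chunks).split())
                self.pairs.append((self._url, text))
                self._url = None

    def handle_data(self, data):
        if self._depth:
            self._chunks.append(data)


def extract_pairs(html):
    """Return a list of (permalink_url, comment_text) tuples found in html."""
    parser = CommentLinkParser()
    parser.feed(html)
    parser.close()
    return parser.pairs


# Shortened version of the sample markup, with a placeholder domain.
sample = """
<p class="comment-meta"><time class="comment-time" itemprop="datePublished"><a href="https://example.com/post/#comment-18735" class="comment-time-link" itemprop="url">January 31, 2018 at 10:25 am</a></time></p>
<div class="comment-content" itemprop="text">
    <p>COMMENT 1</p>
</div>
"""

print(extract_pairs(sample))
```

If a comment body is found with no preceding permalink, the URL in that pair is None, which makes mismatches easy to spot. A third-party parser such as Beautiful Soup could express the same lookup more compactly, at the cost of an extra dependency.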

0 answers:

No answers yet