I have a lot of HTML data similar to the following:
def time_frame(name):
    def wrapper(f):
        def wrapped(*args, **kwargs):
            start = time_millis()
            f(*args, **kwargs)
            end = time_millis()
            t = end - start
            if STACK_IS_SET:
                PROFILE_STACK.append("SOMETHING")
            # Somehow remember this value for the outer time_stack to use if needed
        return wrapped
    return wrapper
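There are thousands of these in the file. Each one has a unique URL and comment.
What I need to do is get both the URL and the comment. The data comes from args.filename. GetData() then reads a JSON file and returns the data, which is HTML like the sample below.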
<p class="comment-author" itemprop="author" itemscope itemtype="https://schema.org/Person">
<img alt=\'\' src=\'https://secure.gravatar.com/avatar/7c38dca1e1d8349d28124c65afca6285?s=48&d=mm&r=g\' srcset=\'https://secure.gravatar.com/avatar/7c38dca1e1d8349d28124c65afca6285?s=96&d=mm&r=g 2x\' class=\'avatar avatar-48 photo\' height=\'48\' width=\'48\' /><span itemprop="name"><a href="https://www.facebook.com/yobonks" class="comment-author-link" rel="external nofollow" itemprop="url">Bianca Roman</a></span> <span class="says">says</span> </p>
<p class="comment-meta"><time class="comment-time" datetime="2018-01-31T10:25:04+00:00" itemprop="datePublished"><a href="https://____.com/2015/01/love-giveaway-south-hill-designs-love-necklaces/#comment-18735" class="comment-time-link" itemprop="url">January 31, 2018 at 10:25 am</a></time></p> </header>
<div class="comment-content" itemprop="text">
<p>COMMENT 1</p>
</div>
How can I get the URL so that I can associate each comment with its URL?
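
For reference, one possible way to pair each comment with its URL is sketched below using only the standard library's html.parser. This is a rough sketch under assumptions, not a tested solution for the real file: the names CommentExtractor and extract_comments are hypothetical, it assumes the permalink is the <a class="comment-time-link"> that appears before each <div class="comment-content"> (as in the sample), and it assumes the backslash-escaped quotes have already been decoded when the JSON was loaded.

```python
from html.parser import HTMLParser


class CommentExtractor(HTMLParser):
    """Collect (url, comment) pairs from comment markup like the sample.

    Hypothetical helper: it relies on the class names seen in the sample
    ("comment-time-link" and "comment-content").
    """

    def __init__(self):
        super().__init__()
        self.pairs = []      # finished (url, comment_text) tuples
        self._url = None     # permalink seen most recently
        self._div_depth = 0  # > 0 while inside <div class="comment-content">
        self._text = []      # text fragments of the current comment

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "a" and "comment-time-link" in classes:
            # Remember the permalink until its comment body shows up.
            self._url = attrs.get("href")
        elif tag == "div":
            if self._div_depth:
                self._div_depth += 1  # nested div inside the comment body
            elif "comment-content" in classes:
                self._div_depth = 1
                self._text = []

    def handle_endtag(self, tag):
        if tag == "div" and self._div_depth:
            self._div_depth -= 1
            if self._div_depth == 0:
                # Comment body closed: pair it with the remembered URL.
                self.pairs.append((self._url, " ".join(self._text)))
                self._url = None

    def handle_data(self, data):
        if self._div_depth and data.strip():
            self._text.append(data.strip())


def extract_comments(html):
    """Return a list of (url, comment_text) tuples found in html."""
    parser = CommentExtractor()
    parser.feed(html)
    parser.close()
    return parser.pairs


# Trimmed copy of the sample markup from the question.
sample = """
<p class="comment-meta"><time class="comment-time" datetime="2018-01-31T10:25:04+00:00" itemprop="datePublished"><a href="https://____.com/2015/01/love-giveaway-south-hill-designs-love-necklaces/#comment-18735" class="comment-time-link" itemprop="url">January 31, 2018 at 10:25 am</a></time></p>
<div class="comment-content" itemprop="text">
<p>COMMENT 1</p>
</div>
"""

for url, comment in extract_comments(sample):
    print(url, "->", comment)
```

Because the URL is remembered until the matching comment-content div closes, each comment stays tied to the permalink that preceded it. If a third-party parser such as BeautifulSoup is available, the same pairing could probably be written more compactly, but that is not shown here.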