I'm trying to scrape a website. It looks like it should be simple, but neither XPath nor CSS selectors are getting me the data I need.
<div class="col-md-12">
<script type="text/javascript" src="https://calendar.fsu.edu/widget/view?schools=fsu&types=89362&days=120&num=30&tags=UP&template=fsu-card-widget"></script><!-- begin code -->
<style type="text/css"> ... </style>
<div class="localist-widget-hl">
<ol class="event-list">
<li class="event">
<article class="event-card">
<div class="event-overview">
<header>
<h2 class="event-title">Adulting 101</h2>
<time class="event-short-date" datetime="2018-07-06"> 06 <abbr title="July"> Jul </abbr> </time> <img alt="Adulting 101" class="event-img" height="225" src="https://d3e1o4bcbhmj8g.cloudfront.net/photos/678822/square_300/24625c20b3ddbd8771591e95150b15e06e6a24a2.jpg" width="225">
</header>
<div class="content">
<p>New to college? Club Downunder is here to help! Come out to Adulting 101 in the SLC 101s to learn about topics from healthy eating to stress management. There will be a DIY...</p>
</div>
</div>
<div class="event-details">
<strong class="event-detail-title">Adulting 101</strong>
<dl class="event-specs">
<dt class="event-date">
Date
<div class="clock"></div>
</dt>
<dd class="event-date"> <time datetime="2018-07-06"> Friday, July 6 </time> </dd>
<dt class="event-location">
Location
<div class="pin"></div>
</dt>
<dd class="event-location"> Askew Student Life Building (SLB) </dd>
</dl>
</div>
<a class="cover 0" href="https://calendar.fsu.edu/event/adulting101_cdu?utm_campaign=widget&utm_medium=widget&utm_source=Florida+State+University+Calendar" target="_blank" rel="nofollow">Adulting 101</a> <span class="start-time location"> 07:00 pm - Askew Student Life Building (SLB) </span>
</article>
</li>
... LOTS MORE LIST ITEMS
</ol>
</div>
This looks straightforward to me. However, no matter how many selector combinations I try, I can't find a way to get anything inside <div class="localist-widget-hl">.
response.css("div.col-md-12 script").extract_first()
returns the script as expected, and response.css("style::text").extract_first()
returns the expected styles (omitted above).
But response.css("div.localist-widget-hl").extract_first()
returns nothing at all. The same goes for any selection inside this div.
What I actually want is the link in the <a>
tag at the bottom of all this, but of course response.css("article.event-card a::attr(href)").extract_first()
returns nothing either, because the link sits inside .localist-widget-hl.
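I assume this is because the HTML is generated by the browser and is not in the actual page source, but how can I tell? I just can't figure out how to select these links.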
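
One way to test that assumption is to look at the raw response body rather than the DOM the browser shows. The following is only a minimal sketch using the Python standard library; the helper names (ScriptSrcCollector, is_in_raw_source, external_scripts) and the shortened sample markup are hypothetical and used for illustration. In a Scrapy shell, response.text would take the place of RAW_HTML.

```python
from html.parser import HTMLParser


class ScriptSrcCollector(HTMLParser):
    """Collect the src attribute of every <script> tag in a document."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the opening tag.
        if tag == "script":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def is_in_raw_source(raw_html, marker):
    """Return True if the marker text appears in the unrendered HTML."""
    return marker in raw_html


def external_scripts(raw_html):
    """Return the URLs of all external scripts referenced by the page."""
    collector = ScriptSrcCollector()
    collector.feed(raw_html)
    return collector.sources


if __name__ == "__main__":
    # Hypothetical stand-in for what the server might send before any
    # JavaScript runs; the real page and query string are longer.
    RAW_HTML = (
        '<div class="col-md-12">'
        '<script type="text/javascript" '
        'src="https://calendar.fsu.edu/widget/view?schools=fsu"></script>'
        "</div>"
    )
    print(is_in_raw_source(RAW_HTML, "localist-widget-hl"))
    print(external_scripts(RAW_HTML))
```

If the class name is absent from the raw body while an external script is present, the list is probably injected by that script after the page loads, which would explain why the selectors match nothing. In that case, requesting the script URL directly and parsing whatever it returns may be worth trying; what that endpoint actually sends back is an assumption here, not something verified.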