Scraping dynamic content with Python Scrapy

Date: 2017-11-15 03:45:25

Tags: python scrapy

I want to scrape the "calendar" content from this link: https://gomore.dk/lejebil/27035

Calendar information I want

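I would like to know whether I can scrape this content with Python Scrapy alone, without Selenium, because I cannot find any relevant information in the browser's Network tab. Thanks!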

2 answers:

Answer 0 (score: 1)

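After half a day of research, I found that I can use scrapy-splash to retrieve JS-rendered content, which gives me the full content of the page, including the calendar information. However, the calendar information does not match what I expect. For example, hour 1 of weekday 1 should be "danger", but it is not.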
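
For reference, a minimal sketch of the project settings that scrapy-splash typically needs, following its README. The Splash address is an assumption (a local instance on port 8050); adjust it to wherever Splash actually runs.

```python
# settings.py (sketch): wiring scrapy-splash into a Scrapy project.
# Assumption: a Splash instance is reachable at this address.
SPLASH_URL = "http://localhost:8050"

# Middlewares that route requests through Splash and handle its cookies.
DOWNLOADER_MIDDLEWARES = {
    "scrapy_splash.SplashCookiesMiddleware": 723,
    "scrapy_splash.SplashMiddleware": 725,
    "scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware": 810,
}

# Saves space by deduplicating repeated Splash arguments.
SPIDER_MIDDLEWARES = {
    "scrapy_splash.SplashDeduplicateArgsMiddleware": 100,
}

# Dupefilter that takes Splash arguments into account.
DUPEFILTER_CLASS = "scrapy_splash.SplashAwareDupeFilter"
```

With this in place, a spider would yield `SplashRequest(url, callback, args={"wait": 2})` (imported from `scrapy_splash`) instead of a plain `Request`; the `wait` value here is only a guess and may need tuning.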

The page uses data-hour for the 24 hours of the day and data-weekday 0-6 for Sunday, Monday, ..., Saturday. class="danger" marks a slot as blocked (shown in red).

   <tr data-hour="0">
      <td class="hour">
        <div>
          <small>00.00</small>
        </div>
      </td>
      <td data-weekday="1"></td>
      <td data-weekday="2"></td>
      <td data-weekday="3" class="danger"></td>
      <td data-weekday="4" class="danger"></td>
      <td data-weekday="5"></td>
      <td data-weekday="6" class="cal-weekend danger"></td>
      <td data-weekday="0" class="cal-weekend danger"></td>
    </tr>

    <tr data-hour="1">
      <td class="hour">
        <div>
          <small>01.00</small>
        </div>
      </td>
      <td data-weekday="1"></td>
      <td data-weekday="2"></td>
      <td data-weekday="3" class="danger"></td>
      <td data-weekday="4" class="danger"></td>
      <td data-weekday="5"></td>
      <td data-weekday="6" class="cal-weekend danger"></td>
      <td data-weekday="0" class="cal-weekend danger"></td>
    </tr>

    <tr data-hour="2">
      <td class="hour">
        <div>
          <small>02.00</small>
        </div>
      </td>
      <td data-weekday="1"></td>
      <td data-weekday="2"></td>
      <td data-weekday="3" class="danger"></td>
      <td data-weekday="4" class="danger"></td>
      <td data-weekday="5"></td>
      <td data-weekday="6" class="cal-weekend danger"></td>
      <td data-weekday="0" class="cal-weekend danger"></td>
    </tr>

    <tr data-hour="3">
      <td class="hour">
        <div>
          <small>03.00</small>
        </div>
      </td>
      <td data-weekday="1"></td>
      <td data-weekday="2"></td>
      <td data-weekday="3" class="danger"></td>
      <td data-weekday="4" class="danger"></td>
      <td data-weekday="5"></td>
      <td data-weekday="6" class="cal-weekend danger"></td>
      <td data-weekday="0" class="cal-weekend danger"></td>
    </tr>

    <tr data-hour="4">
      <td class="hour">
        <div>
          <small>04.00</small>
        </div>
      </td>
      <td data-weekday="1"></td>
      <td data-weekday="2"></td>
      <td data-weekday="3" class="danger"></td>
      <td data-weekday="4" class="danger"></td>
      <td data-weekday="5"></td>
      <td data-weekday="6" class="cal-weekend danger"></td>
      <td data-weekday="0" class="cal-weekend danger"></td>
    </tr>

    <tr data-hour="5">
      <td class="hour">
        <div>
          <small>05.00</small>
        </div>
      </td>
      <td data-weekday="1"></td>
      <td data-weekday="2"></td>
      <td data-weekday="3" class="danger"></td>
      <td data-weekday="4" class="danger"></td>
      <td data-weekday="5"></td>
      <td data-weekday="6" class="cal-weekend danger"></td>
      <td data-weekday="0" class="cal-weekend danger"></td>
    </tr>

    <tr data-hour="6">
      <td class="hour">
        <div>
          <small>06.00</small>
        </div>
      </td>
      <td data-weekday="1"></td>
      <td data-weekday="2"></td>
      <td data-weekday="3" class="danger"></td>
      <td data-weekday="4" class="danger"></td>
      <td data-weekday="5"></td>
      <td data-weekday="6" class="cal-weekend danger"></td>
      <td data-weekday="0" class="cal-weekend danger"></td>
    </tr>

    <tr data-hour="7">
      <td class="hour">
        <div>
          <small>07.00</small>
        </div>
      </td>
      <td data-weekday="1"></td>
      <td data-weekday="2"></td>
      <td data-weekday="3" class="danger"></td>
      <td data-weekday="4" class="danger"></td>
      <td data-weekday="5"></td>
      <td data-weekday="6" class="cal-weekend danger"></td>
      <td data-weekday="0" class="cal-weekend danger"></td>
    </tr>

    <tr data-hour="8">
      <td class="hour">
        <div>
          <small>08.00</small>
        </div>
      </td>
      <td data-weekday="1"></td>
      <td data-weekday="2"></td>
      <td data-weekday="3" class="danger"></td>
      <td data-weekday="4" class="danger"></td>
      <td data-weekday="5"></td>
      <td data-weekday="6" class="cal-weekend danger"></td>
      <td data-weekday="0" class="cal-weekend danger"></td>
    </tr>

    <tr data-hour="9">
      <td class="hour">
        <div>
          <small>09.00</small>
        </div>
      </td>
      <td data-weekday="1"></td>
      <td data-weekday="2"></td>
      <td data-weekday="3" class="danger"></td>
      <td data-weekday="4" class="danger"></td>
      <td data-weekday="5"></td>
      <td data-weekday="6" class="cal-weekend danger"></td>
      <td data-weekday="0" class="cal-weekend danger"></td>
    </tr>

    <tr data-hour="10">
      <td class="hour">
        <div>
          <small>10.00</small>
        </div>
      </td>
      <td data-weekday="1"></td>
      <td data-weekday="2"></td>
      <td data-weekday="3" class="danger"></td>
      <td data-weekday="4" class="danger"></td>
      <td data-weekday="5"></td>
      <td data-weekday="6" class="cal-weekend danger"></td>
      <td data-weekday="0" class="cal-weekend danger"></td>
    </tr>

    <tr data-hour="11">
      <td class="hour">
        <div>
          <small>11.00</small>
        </div>
      </td>
      <td data-weekday="1"></td>
      <td data-weekday="2"></td>
      <td data-weekday="3" class="danger"></td>
      <td data-weekday="4" class="danger"></td>
      <td data-weekday="5"></td>
      <td data-weekday="6" class="cal-weekend danger"></td>
      <td data-weekday="0" class="cal-weekend danger"></td>
    </tr>

    <tr data-hour="12">
      <td class="hour">
        <div>
          <small>12.00</small>
        </div>
      </td>
      <td data-weekday="1"></td>
      <td data-weekday="2"></td>
      <td data-weekday="3" class="danger"></td>
      <td data-weekday="4" class="danger"></td>
      <td data-weekday="5"></td>
      <td data-weekday="6" class="cal-weekend danger"></td>
      <td data-weekday="0" class="cal-weekend danger"></td>
    </tr>

    <tr data-hour="13">
      <td class="hour">
        <div>
          <small>13.00</small>
        </div>
      </td>
      <td data-weekday="1"></td>
      <td data-weekday="2" class="danger"></td>
      <td data-weekday="3" class="danger"></td>
      <td data-weekday="4"></td>
      <td data-weekday="5"></td>
      <td data-weekday="6" class="cal-weekend danger"></td>
      <td data-weekday="0" class="cal-weekend danger"></td>
    </tr>

    <tr data-hour="14">
      <td class="hour">
        <div>
          <small>14.00</small>
        </div>
      </td>
      <td data-weekday="1"></td>
      <td data-weekday="2" class="danger"></td>
      <td data-weekday="3" class="danger"></td>
      <td data-weekday="4"></td>
      <td data-weekday="5"></td>
      <td data-weekday="6" class="cal-weekend danger"></td>
      <td data-weekday="0" class="cal-weekend danger"></td>
    </tr>

    <tr data-hour="15">
      <td class="hour">
        <div>
          <small>15.00</small>
        </div>
      </td>
      <td data-weekday="1"></td>
      <td data-weekday="2" class="danger"></td>
      <td data-weekday="3" class="danger"></td>
      <td data-weekday="4"></td>
      <td data-weekday="5" class="danger"></td>
      <td data-weekday="6" class="cal-weekend danger"></td>
      <td data-weekday="0" class="cal-weekend danger"></td>
    </tr>

    <tr data-hour="16">
      <td class="hour">
        <div>
          <small>16.00</small>
        </div>
      </td>
      <td data-weekday="1"></td>
      <td data-weekday="2" class="danger"></td>
      <td data-weekday="3" class="danger"></td>
      <td data-weekday="4"></td>
      <td data-weekday="5" class="danger"></td>
      <td data-weekday="6" class="cal-weekend danger"></td>
      <td data-weekday="0" class="cal-weekend danger"></td>
    </tr>

    <tr data-hour="17">
      <td class="hour">
        <div>
          <small>17.00</small>
        </div>
      </td>
      <td data-weekday="1"></td>
      <td data-weekday="2" class="danger"></td>
      <td data-weekday="3" class="danger"></td>
      <td data-weekday="4"></td>
      <td data-weekday="5" class="danger"></td>
      <td data-weekday="6" class="cal-weekend danger"></td>
      <td data-weekday="0" class="cal-weekend danger"></td>
    </tr>

    <tr data-hour="18">
      <td class="hour">
        <div>
          <small>18.00</small>
        </div>
      </td>
      <td data-weekday="1"></td>
      <td data-weekday="2" class="danger"></td>
      <td data-weekday="3" class="danger"></td>
      <td data-weekday="4"></td>
      <td data-weekday="5" class="danger"></td>
      <td data-weekday="6" class="cal-weekend danger"></td>
      <td data-weekday="0" class="cal-weekend"></td>
    </tr>

    <tr data-hour="19">
      <td class="hour">
        <div>
          <small>19.00</small>
        </div>
      </td>
      <td data-weekday="1"></td>
      <td data-weekday="2" class="danger"></td>
      <td data-weekday="3" class="danger"></td>
      <td data-weekday="4"></td>
      <td data-weekday="5" class="danger"></td>
      <td data-weekday="6" class="cal-weekend danger"></td>
      <td data-weekday="0" class="cal-weekend"></td>
    </tr>

    <tr data-hour="20">
      <td class="hour">
        <div>
          <small>20.00</small>
        </div>
      </td>
      <td data-weekday="1"></td>
      <td data-weekday="2" class="danger"></td>
      <td data-weekday="3" class="danger"></td>
      <td data-weekday="4"></td>
      <td data-weekday="5" class="danger"></td>
      <td data-weekday="6" class="cal-weekend danger"></td>
      <td data-weekday="0" class="cal-weekend"></td>
    </tr>

    <tr data-hour="21">
      <td class="hour">
        <div>
          <small>21.00</small>
        </div>
      </td>
      <td data-weekday="1"></td>
      <td data-weekday="2" class="danger"></td>
      <td data-weekday="3" class="danger"></td>
      <td data-weekday="4"></td>
      <td data-weekday="5" class="danger"></td>
      <td data-weekday="6" class="cal-weekend danger"></td>
      <td data-weekday="0" class="cal-weekend"></td>
    </tr>

    <tr data-hour="22">
      <td class="hour">
        <div>
          <small>22.00</small>
        </div>
      </td>
      <td data-weekday="1"></td>
      <td data-weekday="2" class="danger"></td>
      <td data-weekday="3" class="danger"></td>
      <td data-weekday="4"></td>
      <td data-weekday="5" class="danger"></td>
      <td data-weekday="6" class="cal-weekend danger"></td>
      <td data-weekday="0" class="cal-weekend"></td>
    </tr>

Is it possible that the HTML rendered by scrapy-splash is wrong? Everything else seems to be correct except this calendar table.
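
To compare the rendered table against what the browser shows, it may help to reduce the markup to a list of blocked (weekday, hour) slots. The following is a rough sketch using only the standard library so that it runs on its own; the names `CalendarParser`, `blocked_slots`, and `SAMPLE` are made up for illustration, and inside a Scrapy callback the same idea could be expressed with `response.css` selectors instead.

```python
from html.parser import HTMLParser


class CalendarParser(HTMLParser):
    """Collects (weekday, hour) pairs for cells whose class list contains 'danger'."""

    def __init__(self):
        super().__init__()
        self.current_hour = None  # data-hour of the row being parsed
        self.blocked = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "tr" and "data-hour" in attrs:
            self.current_hour = int(attrs["data-hour"])
        elif tag == "td" and "data-weekday" in attrs and self.current_hour is not None:
            # A cell may carry several classes, e.g. "cal-weekend danger".
            classes = (attrs.get("class") or "").split()
            if "danger" in classes:
                self.blocked.append((int(attrs["data-weekday"]), self.current_hour))

    def handle_endtag(self, tag):
        if tag == "tr":
            self.current_hour = None


def blocked_slots(html):
    """Return the blocked (weekday, hour) pairs found in the given HTML."""
    parser = CalendarParser()
    parser.feed(html)
    parser.close()
    return parser.blocked


# Shortened row in the same shape as the table above.
SAMPLE = """
<tr data-hour="13">
  <td class="hour"><div><small>13.00</small></div></td>
  <td data-weekday="1"></td>
  <td data-weekday="2" class="danger"></td>
  <td data-weekday="0" class="cal-weekend danger"></td>
</tr>
"""

print(blocked_slots(SAMPLE))  # [(2, 13), (0, 13)]
```

Running this on both the Splash output and the HTML saved from a real browser would show exactly which slots differ.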

Answer 1 (score: 0)

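https://dgaqgnnkkz5ef.cloudfront.net/assets/application-840c6707422c9d0ee7fb9005972e7c7201803d9c24bbcd23253e6ec7beedd6a1.js is the JS file they get the data from. I have not had time to check it, but you can dig further into how they do it by searching for js-occupancy-calendar and {{ 1}}; that should give you some ideas.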
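
A small helper along these lines could make that search easier once the JS file has been downloaded: it prints the text around every occurrence of a marker. This is only a sketch; `find_contexts` is a hypothetical helper, and `sample_js` is an invented stand-in, not the content of the real bundle.

```python
import re


def find_contexts(text, needle, width=60):
    """Return snippets of `text` surrounding each occurrence of `needle`."""
    snippets = []
    for match in re.finditer(re.escape(needle), text):
        start = max(match.start() - width, 0)
        end = min(match.end() + width, len(text))
        snippets.append(text[start:end])
    return snippets


# Invented stand-in for the downloaded bundle (assumption, for illustration only).
sample_js = 'var a=1;$(".js-occupancy-calendar").each(function(){load(this)});var b=2;'

for snippet in find_contexts(sample_js, "js-occupancy-calendar", width=20):
    print(snippet)
```

Reading the code around each hit should show which URL or data attribute the calendar is populated from.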