Scraping a script tag of type JSON

Time: 2019-02-08 21:11:27

Tags: python json

Using Python, I have this code:

    import requests
    from bs4 import BeautifulSoup
    import json

    links = [
        'https://www.ncaa.com/scoreboard/volleyball-women/d1/2018/09/17/all-conf',
        'https://www.ncaa.com/scoreboard/volleyball-women/d1/2018/12/15/all-conf',
    ]

    data = []
    for link in links:
        req_data = requests.get(link)

        soup = BeautifulSoup(req_data.text, 'html.parser')

        for a in soup.find_all('a'):
            values = [span.text for span in a.find_all('span', {'class': 'gamePod-game-team-name'})]
            if len(values) > 0:
                data.append(values)

    print(*data, sep="\n")
    with open('test.json', 'w') as f:
        json.dump(data, f)

Which gives me this result:

  ['James Madison', 'VCU']
  ['Nebraska', 'Stanford']

I would also like to scrape currentDate from the same page. It lives on the page inside this tag:

<script type="application/json" data-drupal-selector="drupal-settings-json">

I am stuck on scraping currentDate correctly. This is what I have so far:

import requests
from bs4 import BeautifulSoup
import re
import json

res = requests.get('https://www.ncaa.com/scoreboard/volleyball-women/d1/2018/09/17')
soup = BeautifulSoup(res.text, 'html.parser')


script = soup.find_all("script", type="application/json", path="currentDate")
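
A note on the attempt above: keyword arguments passed to `find_all` are matched against HTML attributes, so `path="currentDate"` looks for a `path` attribute on the tag rather than a key inside the JSON. Below is a minimal sketch of one possible alternative. It assumes the script tag holds valid JSON and that `currentDate` appears as a key somewhere inside it; the exact nesting is not shown in the question, so the helper searches recursively. The names `find_key` and `extract_current_date` are made up for illustration.

```python
import json

from bs4 import BeautifulSoup


def find_key(obj, key):
    """Return the first value stored under `key` in nested dicts/lists, or None."""
    if isinstance(obj, dict):
        if key in obj:
            return obj[key]
        children = obj.values()
    elif isinstance(obj, list):
        children = obj
    else:
        return None
    for child in children:
        found = find_key(child, key)
        if found is not None:
            return found
    return None


def extract_current_date(html):
    """Parse the Drupal settings script tag and look up currentDate (hypothetical helper)."""
    soup = BeautifulSoup(html, 'html.parser')
    # Select the tag by the two attributes quoted in the question.
    script = soup.find('script', attrs={
        'type': 'application/json',
        'data-drupal-selector': 'drupal-settings-json',
    })
    if script is None or not script.string:
        return None
    # The tag body is plain JSON text, so json.loads can decode it.
    settings = json.loads(script.string)
    # Assumption: the key is literally named 'currentDate' somewhere in the structure.
    return find_key(settings, 'currentDate')
```

Calling `extract_current_date(res.text)` on the response from the `requests.get(...)` call above would return the stored value, or `None` if the tag or the key is not present.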

Ideally, I would like to place that currentDate result alongside my data results. Any suggestions are appreciated.
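
One way to keep the date next to each result is to store every game as a small dictionary instead of a bare list. The sketch below assumes the date has already been extracted as a string; `attach_date` is a hypothetical helper name, and the date value and its format are placeholders rather than values taken from the page.

```python
import json


def attach_date(games, current_date):
    """Pair each list of team names with the date of the page it came from."""
    return [{'currentDate': current_date, 'teams': teams} for teams in games]


# Team names are taken from the output shown earlier; the date string is a placeholder.
games = [['James Madison', 'VCU']]
records = attach_date(games, '2018-09-17')

# The resulting list is JSON-serializable, like the original list of lists.
print(json.dumps(records, indent=2))
```

The `records` list can be passed to `json.dump(records, f)` in place of `data`, so the file-writing step stays the same.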

0 answers:

No answers