Converting a .py file to an .html file

Time: 2021-06-09 06:08:56

Tags: python jupyter-notebook nbconvert

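I downloaded a .py file from Jupyter Notebook, and my goal is to set up Task Scheduler to run the scrape daily. The purpose of this file (scrape.py) is to scrape data from a website and save it as HTML (output_scraped.html).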

The code is as follows:

from bs4 import BeautifulSoup
import requests

# assign destination
url = "..."  # placeholder: the actual URL was omitted from the question

# Grab content of that url
req = requests.get(url)
soup = BeautifulSoup(req.text, 'html.parser')

titles = []
levels = soup.find_all('article', {'class' : '1234'})
for level in levels:
    divs = level.find_all('a', {'class' : '5678'})
    for div in divs:
        titles.append(div.text)

hirer = []
for level in levels:
    hirer_divs = level.find_all('span', {'class' : '9873'})
    for hirer_div in hirer_divs:
        hirer.append(hirer_div.text)

mylist = []
ids_final = soup.find_all(attrs={"data-id": '5tw287'})
for ifn in ids_final:
    mylist.append(ifn["data-id"])

# # Putting it all together
# zip() stops at the shortest of the three lists
for one, two, three in zip(titles, hirer, mylist):
    print(one, two, three)

# In[16]:
# converting to html file
from nbconvert import HTMLExporter
import codecs
import nbformat

notebook_name = 'scrape.py'
output_file_name = 'output_scraped.html'

exporter = HTMLExporter()
output_notebook = nbformat.read(notebook_name, as_version=4)

output, resources = exporter.from_notebook_node(output_notebook)
with codecs.open(output_file_name, 'w', encoding='utf-8') as f:
    f.write(output)

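The above seems to run without any problems in Jupyter Notebook. However, when I run the .py file, it produces output up to the #Putting it all together section and then gives me this rather intimidating error: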

Traceback (most recent call last):
  File "C:\Python38\lib\site-packages\nbformat\reader.py", line 14, in parse_json
    nb_dict = json.loads(s, **kwargs)
  File "C:\Python38\lib\json\__init__.py", line 357, in loads
    return _default_decoder.decode(s)
  File "C:\Python38\lib\json\decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "C:\Python38\lib\json\decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Python38\a-SEEK_scrape.py", line 166, in <module>
    output_notebook = nbformat.read(notebook_name, as_version=4)
  File "C:\Python38\lib\site-packages\nbformat\__init__.py", line 141, in read
    return reads(f.read(), as_version, **kwargs)
  File "C:\Python38\lib\site-packages\nbformat\__init__.py", line 73, in reads
    nb = reader.reads(s, **kwargs)
  File "C:\Python38\lib\site-packages\nbformat\reader.py", line 58, in reads
    nb_dict = parse_json(s, **kwargs)
  File "C:\Python38\lib\site-packages\nbformat\reader.py", line 17, in parse_json
    raise NotJSONError(("Notebook does not appear to be JSON: %r" % s)[:77] + "...") from e
nbformat.reader.NotJSONError: Notebook does not appear to be JSON: '#!/usr/bin/env python\n# coding: utf-8\...
>>> 
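The first exception in the traceback is raised by json.loads, called from nbformat's parse_json, which suggests nbformat.read tries to parse the file as JSON. A minimal standard-library sketch reproduces that step: a stand-in for an .ipynb file (an empty notebook using the usual top-level notebook keys, assumed here purely for illustration) parses fine, while the header of the exported .py file fails with the same "Expecting value: line 1 column 1 (char 0)" message. The is_json helper is hypothetical.

```python
import json

# Stand-in for an .ipynb file: notebooks are JSON documents with top-level
# keys such as "cells", "metadata", "nbformat" and "nbformat_minor".
notebook_text = json.dumps({
    "cells": [],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 5,
})

# The .py download is plain Python source; this is the header shown in the traceback.
script_text = "#!/usr/bin/env python\n# coding: utf-8\n"


def is_json(text):
    """Return True if text parses as JSON (hypothetical helper)."""
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        print(f"Not JSON: {exc}")
        return False
    return True


print(is_json(notebook_text))  # True
print(is_json(script_text))    # prints "Not JSON: Expecting value: line 1 column 1 (char 0)", then False
```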

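Is this because it isn't a JSON file? Why does this happen? If I find a way to convert it to JSON, will it achieve what I originally set out to do? Any help/pointers would be greatly appreciated. Thanks!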
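Note that HTMLExporter renders the cells and outputs already saved in a notebook and, by default, does not execute them, so even a valid .ipynb would not pick up a new day's scrape on its own. If only the HTML file from the scheduled run is needed, one option, sketched here under the assumption that the three scraped lists are what should end up in the page, is to skip nbformat/nbconvert and write the HTML directly with the standard library. rows_to_html, write_html, the column headings and the sample values are all hypothetical; in scrape.py the lists would be titles, hirer and mylist.

```python
import html


def rows_to_html(titles, hirers, ids, page_title="Scraped results"):
    """Render three parallel lists as a minimal HTML table (hypothetical helper)."""
    # zip() stops at the shortest list, like the original "Putting it all together" loop.
    body_rows = "\n".join(
        "<tr><td>{}</td><td>{}</td><td>{}</td></tr>".format(
            html.escape(str(t)), html.escape(str(h)), html.escape(str(i))
        )
        for t, h, i in zip(titles, hirers, ids)
    )
    return (
        "<!DOCTYPE html>\n"
        '<html>\n<head><meta charset="utf-8">'
        f"<title>{html.escape(page_title)}</title></head>\n"
        "<body>\n<table>\n"
        "<tr><th>Title</th><th>Hirer</th><th>ID</th></tr>\n"
        f"{body_rows}\n"
        "</table>\n</body>\n</html>\n"
    )


def write_html(path, titles, hirers, ids):
    """Write the table to path as UTF-8, overwriting the previous run's file."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(rows_to_html(titles, hirers, ids))


if __name__ == "__main__":
    # Hypothetical sample values; in scrape.py pass titles, hirer, mylist instead.
    write_html("output_scraped.html", ["Job A"], ["Hirer A & Co"], ["5tw287"])
```

With this approach the script writes its own results every time Task Scheduler runs it, so no notebook file is involved; html.escape keeps characters such as & or < in scraped text from breaking the page.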

0 answers