Python web scraping (edX)

Time: 2018-07-15 12:15:03

Tags: python edx

I am trying to download a course from edX, following this README (https://github.com/coursera-dl/edx-dl/blob/master/README.md). I installed Anaconda for Windows (I am running Windows 10 in Parallels Desktop).

In the last step, I entered the following command: C:\edx-dl-master\edx-dl-master> edx-dl -u user@user.com COURSE_URL. After I entered my edX password, I got the following output:

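Building initial headers for future requests.
Getting initial CSRF token.
Found CSRF token.
Logging into Open edX site: https://courses.edx.org/login_ajax
Extracting course information from dashboard.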

Traceback (most recent call last):
  File "c:\programdata\anaconda3\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "c:\programdata\anaconda3\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\ProgramData\Anaconda3\Scripts\edx-dl.exe\__main__.py", line 9, in <module>
  File "c:\programdata\anaconda3\lib\site-packages\edx_dl\edx_dl.py", line 1011, in main
    for selected_course in selected_courses}
  File "c:\programdata\anaconda3\lib\site-packages\edx_dl\edx_dl.py", line 1011, in <dictcomp>
    for selected_course in selected_courses}
  File "c:\programdata\anaconda3\lib\site-packages\edx_dl\edx_dl.py", line 186, in get_available_sections
    sections = page_extractor.extract_sections_from_html(page, BASE_URL)
  File "c:\programdata\anaconda3\lib\site-packages\edx_dl\parsing.py", line 403, in extract_sections_from_html
    for i, section_soup in enumerate(sections_soup, 1)]
  File "c:\programdata\anaconda3\lib\site-packages\edx_dl\parsing.py", line 403, in <listcomp>
    for i, section_soup in enumerate(sections_soup, 1)]
  File "c:\programdata\anaconda3\lib\site-packages\edx_dl\parsing.py", line 392, in _make_subsections
    for i, s in enumerate(subsections_soup, 1)]
  File "c:\programdata\anaconda3\lib\site-packages\edx_dl\parsing.py", line 392, in <listcomp>
    for i, s in enumerate(subsections_soup, 1)]
AttributeError: 'NoneType' object has no attribute 'string'

I am new to Python and am not sure what I can do to fix this.

1 answer:

Answer 0: (score: 2)

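Try cloning the repository itself to get the latest code. Note that the README link is a web page, not a clonable repository URL, so use the repository address:

git clone https://github.com/coursera-dl/edx-dl.git

Install git first.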
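
As background on the error itself: `AttributeError: 'NoneType' object has no attribute 'string'` usually means that a lookup for an HTML element returned `None` and the code then read an attribute from that result. This tends to happen when a site changes its page markup after a scraper was released, which is why a newer copy of the tool may help. The sketch below reproduces that failure mode using the standard library's `xml.etree.ElementTree` as a stand-in; it is not edx-dl's actual parser, and the markup, `PAGE`, `unguarded_names`, and `subsection_names` are all hypothetical names made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the second item lacks the <a> tag the parser expects.
PAGE = """<ul>
  <li><a>Week 1</a></li>
  <li><span>Week 2</span></li>
</ul>"""


def unguarded_names(markup):
    """Read the link text of every item without checking for a missing link."""
    items = ET.fromstring(markup).findall("li")
    # item.find("a") returns None for the second item, so .text raises
    # AttributeError: 'NoneType' object has no attribute 'text'.
    return [item.find("a").text for item in items]


def subsection_names(markup):
    """Same lookup, but with a guard for items whose link is missing."""
    names = []
    for i, item in enumerate(ET.fromstring(markup).findall("li"), 1):
        link = item.find("a")  # None when the item has no <a> child
        if link is None:
            names.append(f"subsection {i} (name not found)")
        else:
            names.append(link.text)
    return names


if __name__ == "__main__":
    print(subsection_names(PAGE))
    try:
        unguarded_names(PAGE)
    except AttributeError as exc:
        print("unguarded version failed:", exc)
```

The guarded version keeps going when an element is missing, while the unguarded one fails in the same way as the traceback in the question. Whether a particular release of edx-dl guards this lookup is something to check in its source or issue tracker.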