如何从寻求阿尔法下载所有成绩单

时间:2016-09-17 16:50:43

标签: python web download web-crawler

有没有办法从SA网站自动下载所有成绩单?

http://seekingalpha.com/earnings/earnings-call-transcripts

我尝试使用http://newspaper.readthedocs.io/en/latest/ python代码但出现以下错误:

earnings_call_transcripts_2 = newspaper.build('http://seekingalpha.com/earnings/earnings-call-transcripts', memoize_articles=False)

Traceback (most recent call last):
  File "/Users/name/anaconda/lib/python3.5/site-packages/newspaper/parsers.py", line 67, in fromstring
    cls.doc = lxml.html.fromstring(html)
  File "/Users/name/anaconda/lib/python3.5/site-packages/lxml/html/__init__.py", line 867, in fromstring
    doc = document_fromstring(html, parser=parser, base_url=base_url, **kw)
  File "/Users/name/anaconda/lib/python3.5/site-packages/lxml/html/__init__.py", line 752, in document_fromstring
    value = etree.fromstring(html, parser, **kw)
  File "src/lxml/lxml.etree.pyx", line 3213, in lxml.etree.fromstring (src/lxml/lxml.etree.c:77697)
  File "src/lxml/parser.pxi", line 1819, in lxml.etree._parseMemoryDocument (src/lxml/lxml.etree.c:116494)
  File "src/lxml/parser.pxi", line 1700, in lxml.etree._parseDoc (src/lxml/lxml.etree.c:115040)
  File "src/lxml/parser.pxi", line 1040, in lxml.etree._BaseParser._parseUnicodeDoc (src/lxml/lxml.etree.c:109165)
  File "src/lxml/parser.pxi", line 573, in lxml.etree._ParserContext._handleParseResultDoc (src/lxml/lxml.etree.c:103404)
  File "src/lxml/parser.pxi", line 683, in lxml.etree._handleParseResult (src/lxml/lxml.etree.c:105058)
  File "src/lxml/parser.pxi", line 622, in lxml.etree._raiseParseError (src/lxml/lxml.etree.c:104143)
  File "<string>", line None
lxml.etree.XMLSyntaxError: line 295: b"htmlParseEntityRef: expecting ';'"
[Source parse ERR] http://seekingalpha.com/earnings/earnings-call-transcripts

0 个答案:

没有答案