Error - OSError: Error reading file 'http://website': failed to load external entity "http://website/alpha.html"

Date: 2019-09-07 19:46:58

Tags: python html parsing lxml

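I am trying to visualize volcano data. The data is downloaded and parsed with lxml. The "Volcano World" source page lists the volcano data in several HTML tables; each table is read into a separate Pandas DataFrame, which is appended to a list of DataFrames.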

I keep getting this error:

OSError: Error reading file 'http://volcano.oregonstate.edu/oldroot/volcanoes/alpha.html': failed to load external entity "http://volcano.oregonstate.edu/oldroot/volcanoes/alpha.html"

Can you help me get this code working?

import json
from lxml import html
from mpl_toolkits.basemap import Basemap
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


url ='http://volcano.oregonstate.edu/oldroot/volcanoes/alpha.html'
xpath = '//table'
tree = html.parse(url)
tables = tree.xpath(xpath)

table_dfs = []
for idx in range(4, len(tables)):
    df = pd.read_html(html.tostring(tables[idx]), header=0)[0]
    table_dfs.append(df)
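One common workaround is to download the page with a Python HTTP client and hand lxml only the markup. `html.parse(url)` relies on libxml2's own minimal network loader, which, as far as I know, handles plain HTTP but not HTTPS and reports any failed fetch only as "failed to load external entity". The sketch below assumes the URL from the question still serves the page (not verified here), uses the standard library's `urllib.request` for the download (`requests` would work equally well), and introduces hypothetical helpers `fetch_html` and `tables_to_frames`, whose `skip=4` default mirrors the `range(4, len(tables))` loop above.

```python
from io import StringIO
from urllib.request import Request, urlopen

import pandas as pd
from lxml import html


def fetch_html(url, timeout=30):
    """Download a page with Python's HTTP client instead of libxml2's loader."""
    # Browser-like User-Agent (an assumption: some servers reject urllib's default).
    req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def tables_to_frames(page_html, skip=4):
    """Parse markup with lxml and turn every <table> after the first `skip` into a DataFrame."""
    tree = html.fromstring(page_html)  # parse a string of HTML, not a URL
    frames = []
    for table in tree.xpath("//table")[skip:]:
        # Serialize to str and wrap in StringIO; newer pandas deprecates literal HTML input.
        markup = html.tostring(table, encoding="unicode")
        frames.append(pd.read_html(StringIO(markup), header=0)[0])
    return frames
```

Usage, with the question's URL: `table_dfs = tables_to_frames(fetch_html(url))`. Keeping the download separate from the parsing also makes it easy to save the HTML to disk once and re-parse it offline while working on the plots.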

This is the error I get:

Traceback (most recent call last):

  File "C:\Apps\Anaconda\envs\my_maps_env\lib\site-packages\IPython\core\interactiveshell.py", line 3326, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)

  File "<ipython-input-10-d2be1cde3918>", line 3, in <module>
    tree = etree.fromstring(url)

  File "src/lxml/etree.pyx", line 3234, in lxml.etree.fromstring

  File "src/lxml/parser.pxi", line 1876, in lxml.etree._parseMemoryDocument

  File "src/lxml/parser.pxi", line 1757, in lxml.etree._parseDoc

  File "src/lxml/parser.pxi", line 1068, in lxml.etree._BaseParser._parseUnicodeDoc

  File "src/lxml/parser.pxi", line 601, in lxml.etree._ParserContext._handleParseResultDoc

  File "src/lxml/parser.pxi", line 711, in lxml.etree._handleParseResult

  File "src/lxml/parser.pxi", line 640, in lxml.etree._raiseParseError

  File "<string>", line 1
XMLSyntaxError: Start tag expected, '<' not found, line 1, column 1
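The traceback above comes from a line `tree = etree.fromstring(url)` rather than the `html.parse(url)` call in the posted code, and it fails for a different reason: `fromstring()` treats its argument as the document itself, so the literal text `http://...` is parsed as XML and rejected at line 1, column 1 because it does not begin with `<`. The `parse()` functions accept a filename, file object or URL, while `fromstring()` expects markup. A small sketch of the difference, using a made-up HTML snippet:

```python
from lxml import etree, html

url = "http://volcano.oregonstate.edu/oldroot/volcanoes/alpha.html"

# fromstring() parses the string it is given; a URL is not XML markup.
try:
    etree.fromstring(url)
except etree.XMLSyntaxError as exc:
    print("XMLSyntaxError:", exc)

# Passing actual markup works; html.fromstring also tolerates HTML that is not valid XML.
doc = html.fromstring("<html><body><table><tr><td>ok</td></tr></table></body></html>")
print(len(doc.xpath("//table")))  # → 1
```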
