
pyspark [Errno 2] No such file or directory: 'hdfs://...'

Time: 2017-10-09 19:26:17

Tags: python html beautifulsoup pyspark hdfs

I have a question about using pyspark with BeautifulSoup to parse HTML files (read from HDFS) into CSV files, and then saving those CSV files back to HDFS.

Here is the code and the error:

from bs4 import BeautifulSoup
html_path = "hdfs://.../user/root/input/.../index.html"
soup = BeautifulSoup(open(html_path))

Then I get the following error:

IOError Traceback (most recent call last)
<ipython-input-3-875680018b76> in <module>()
----> 1 soup = BeautifulSoup(open(html_path))

IOError: [Errno 2] No such file or directory: 'hdfs://.../user/root/input/.../index.html'

How can I solve this?
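
A possible direction, offered as a sketch rather than a confirmed fix: Python's built-in open() only reads local filesystem paths, so it treats the whole hdfs:// string as a local file name, which would explain the [Errno 2]. One way around it is to let Spark read the file and pass the resulting text to BeautifulSoup. The helper names read_hdfs_text and soup_from_hdfs below are hypothetical, and sc is assumed to be an existing SparkContext such as the one the pyspark shell provides.

```python
def read_hdfs_text(sc, path):
    """Return the text of the first file matching `path`, read through Spark.

    `sc` is assumed to be an existing SparkContext (hypothetical setup:
    the pyspark shell normally exposes one under this name).
    """
    # wholeTextFiles returns an RDD of (file_path, file_content) pairs and
    # resolves hdfs:// URIs, which the built-in open() cannot do.
    pairs = sc.wholeTextFiles(path).collect()
    if not pairs:
        raise IOError("no file matched: %s" % path)
    return pairs[0][1]


def soup_from_hdfs(sc, path):
    """Parse one HTML file stored in HDFS with BeautifulSoup."""
    # Imported lazily so this module still loads where bs4 is not installed.
    from bs4 import BeautifulSoup
    return BeautifulSoup(read_hdfs_text(sc, path), "html.parser")
```

With these in place, soup = soup_from_hdfs(sc, html_path) would replace the failing BeautifulSoup(open(html_path)) call. Note that collect() pulls the file contents to the driver, which is reasonable for a single page but not for a large directory. Writing the CSV back to HDFS would likewise need to go through Spark (for example saveAsTextFile on an RDD of CSV lines) rather than a local open().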

0 answers:

No answers yet