Question

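I am trying to read .html files with pd.read_html(). However, each .html file lives in a different directory, so I walked every directory and put path/name + html_file_name into a list called html_paths. I want to iterate over this list and read each .html file in html_paths with pd.read_html().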

I tried iterating over html_paths like this:

for i in range(len(html_paths)):
    html_files = pd.read_html(html_paths[i])

I also tried this variant, iterating directly over the glob used to build html_paths:

for i in path.glob('**/*.html'):
    html_files = pd.read_html(i)

Whichever way I try to iterate over the pathlib list, I get an error like TypeError: Cannot read object type 'WindowsPath'.

So far I have written:

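from pathlib import Path

# initialize path (raw string, so backslashes such as \t are not treated as escapes)
p = Path(r'C:\path\to\mother\directory')

# iterate over all directories within the mother directory
# and glob all html files
html_paths = [file for file in p.glob('**/*.html')]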

Now I want to iterate over every file in html_paths and read each one with pd.read_html().

Answer 1

Your html_paths list contains Path objects, not the strings that read_html expects. Try converting them to strings:

for i in range(len(html_paths)):
    html_files = pd.read_html(str(html_paths[i]))
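Note that the loop above overwrites html_files on every pass, so only the last file's tables survive. A minimal sketch of one way to keep all of them follows, assuming you want a dict keyed by file path. The helper name read_all_tables and its reader parameter are hypothetical, introduced here only for illustration; reader defaults to pd.read_html and exists so the helper can be tried with a stand-in function.

```python
from pathlib import Path


def read_all_tables(root, reader=None):
    """Read every .html file under `root`; return {path string: reader result}."""
    if reader is None:
        # Imported lazily so the helper can be exercised without pandas installed.
        import pandas as pd
        reader = pd.read_html

    tables = {}
    for path in sorted(Path(root).glob('**/*.html')):
        # glob() yields Path objects; hand the reader a plain string instead.
        tables[str(path)] = reader(str(path))
    return tables
```

With pandas installed, read_all_tables(r'C:\path\to\mother\directory') would map each file path to the list of DataFrames that pd.read_html returns for that file. Iterating directly over the paths also avoids the range(len(...)) indexing.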

How do I handle pandas being unable to read the 'WindowsPath' type from pathlib?
