Question

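I have written a Python 3 program that crawls a Wikipedia category down to a given depth, downloads its pages, and stores them in a directory.

The problem I am facing: if, while the code is running, the algorithm encounters a Wikipedia page whose title contains a special character (for example *, #, $), it fails with the error trace shown below.

An example of a wiki page with a special character in its title: https://en.wikipedia.org/wiki/Eden*

The error trace is as follows: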

Traceback (most recent call last):
  File "F:\Pen Drive 8 GB\PDF\Code\wiki.py", line 103, in <module>
    d.search_and_store("Biomedical_engineering", subcategory_depth=2, path=PATH)
  File "F:\Pen Drive 8 GB\PDF\Code\wiki.py", line 98, in search_and_store
    self.search_and_store(subcat_result['title'], subcategory_depth-1, path)
  File "F:\Pen Drive 8 GB\PDF\Code\wiki.py", line 98, in search_and_store
    self.search_and_store(subcat_result['title'], subcategory_depth-1, path)
  File "F:\Pen Drive 8 GB\PDF\Code\wiki.py", line 76, in search_and_store
    if self.write_page_text(path, page_result):
  File "F:\Pen Drive 8 GB\PDF\Code\wiki.py", line 44, in write_page_text
    txt_file = open(file_path, 'w')
OSError: [Errno 22] Invalid argument: 'F:\\Code\\Wikipedia\\DATASETS\\Biomedical Engineering/Eden*.txt'

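As you can clearly see, the algorithm scrapes and saves pages whose titles contain no special characters, but raises the error above as soon as it reaches one that does.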
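For context, the path in the error message ends in `Eden*.txt`, and Windows does not allow `*` (nor `< > : " / \ | ?` or control characters) in file names, so `open()` rejects it. Below is a minimal sketch of one possible workaround, assuming the file name is derived directly from the page title. The helpers `safe_filename` and `build_file_path` are hypothetical names introduced for illustration; they are not part of the original `wiki.py`.

```python
import os
import re

# Characters Windows rejects in file names: < > : " / \ | ? * and control characters.
_INVALID_CHARS = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def safe_filename(title, replacement="_"):
    """Hypothetical helper: turn a page title into a name usable as a Windows file name."""
    name = _INVALID_CHARS.sub(replacement, title)
    # Windows also disallows names that end in a space or a dot.
    name = name.rstrip(" .")
    # Fall back to a placeholder if nothing usable is left.
    return name or "untitled"


def build_file_path(directory, title):
    """Hypothetical helper: build the .txt path for a page title."""
    return os.path.join(directory, safe_filename(title) + ".txt")
```

With something like this, `write_page_text` could call `build_file_path(path, title)` before `open(...)`. Note that two different titles can map to the same sanitized name (for example `Eden*` and `Eden?`), so a real crawler may want to add a suffix or keep a title-to-file mapping.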

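The MWE is quite large. If anyone asks for it, I can share it.

Please suggest something; I have been trying to get this working for a long time and I am frustrated. I don't even know what I am doing wrong. Please help.

Any small help is deeply appreciated.

Thanks.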

Unable to download files (web crawling) - OSError [Errno 22] - Invalid argument

0 answers: