Question

Context

I'm trying to automate downloading data from Blackboard (a website where homework is frequently posted).

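The site contains folders and items. I'm having trouble replicating the folder structure locally, mainly because I can't work out how to do it recursively. For example, suppose the following are made-up folder names and their structure: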

- Lecture 1
-- Reading
--- Interesting stuffs
---- Very, very interesting stuffs
--- Less interesting stuffs

- Lecture 2
-- Reading

- Lecture 3
-- Homework
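
As a rough illustration of the target, the structure above can be written as a nested dict and mirrored on disk with a short recursive function. This is only a sketch: `TREE` and `make_tree` are hypothetical names, and the dict is typed in by hand rather than scraped from Blackboard.

```python
import os

# Hand-written stand-in for the example structure above (hypothetical data).
TREE = {
    "Lecture 1": {
        "Reading": {
            "Interesting stuffs": {"Very, very interesting stuffs": {}},
            "Less interesting stuffs": {},
        }
    },
    "Lecture 2": {"Reading": {}},
    "Lecture 3": {"Homework": {}},
}


def make_tree(root, tree):
    """Create one directory per key, recursing into each sub-dict.

    Returns the list of directory paths that were visited, parents first.
    """
    created = []
    for name, children in tree.items():
        path = os.path.join(root, name)
        os.makedirs(path, exist_ok=True)  # no error if it already exists
        created.append(path)
        # The recursion stops by itself when `children` is an empty dict.
        created.extend(make_tree(path, children))
    return created
```

Calling `make_tree("Module Files", TREE)` would create the nine directories of the example, however deep the nesting goes.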

Problem

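Given that the depth of the folders is not known in advance, can anyone guide me in writing a recursive function that replicates the folder structure?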

Other information

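I use Selenium to navigate the pages and BeautifulSoup to parse the information. I'm able to scrape the first page and create the corresponding folders from the captured folder names. I just need a recursive way to (1) enter an entry if it is a folder, (2) create the folder locally, and (3) enter any new folder found inside it.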
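
The three steps above could be sketched roughly as follows. Page fetching is deliberately left out: `list_entries` is a hypothetical callback that takes a URL and returns `(name, href, is_folder)` tuples, and here it would wrap the Selenium/BeautifulSoup parsing. `mirror_folders` and `safe_name` are likewise made-up names, and the `visited` set is an assumption, added to guard against pages that link back to each other.

```python
import os
import re


def safe_name(name):
    """Drop characters that are awkward in directory names."""
    cleaned = re.sub(r"[^\w\s-]", "", name.replace("\xa0", " ")).strip()
    return cleaned or "untitled"


def mirror_folders(url, local_dir, list_entries, visited=None):
    """Recreate the folder tree rooted at `url` under `local_dir`."""
    if visited is None:
        visited = set()
    if url in visited:  # already handled; avoids infinite loops
        return
    visited.add(url)
    os.makedirs(local_dir, exist_ok=True)
    for name, href, is_folder in list_entries(url):
        if is_folder:
            # Descend into the sub-folder; the recursive call creates it.
            mirror_folders(href, os.path.join(local_dir, safe_name(name)),
                           list_entries, visited)
```

Because the function never needs to know the depth, it simply stops when a page lists no further folders. Non-folder entries are skipped here; downloading them would go in an `else` branch.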

For reference:

import os
import re

from bs4 import BeautifulSoup

# `driver` is an existing Selenium WebDriver and `site` is the page URL.
# Note: find_element_by_xpath was removed in Selenium 4.3; newer versions
# use driver.find_element(By.XPATH, "//*") instead.
driver.get(site)
elem = driver.find_element_by_xpath("//*")
html = elem.get_attribute("outerHTML")
soup = BeautifulSoup(html, 'html.parser')
folders = soup.find('div',{'class':'container clearfix'}).find("ul", {"id":"content_listContainer"}).findAll("li", {"class":"clearfix liItem read"})

# Split the list entries into content folders and plain items.
contentFolderText, contentFolderLink, itemText, itemLink, misc = [[] for i in range(5)]
for folder in folders:
    if folder.find("img").get("alt") == "Content Folder":
        contentFolderText.append(folder.find("a").text.replace(u'\xa0', u''))
        contentFolderLink.append(folder.find("a").get('href'))
    elif folder.find("img").get("alt") == "Item":
        itemText.append(folder.find("a").text.replace(u'\xa0', u''))
        itemLink.append(folder.find("a").get('href'))

# Create one local directory per content folder found on this page.
directory = 'Module Files'
# Sanitise each name before joining, so the path separator is not stripped.
sourceName = [os.path.join(directory, re.sub(r'[^\w\s-]', '', i)) for i in contentFolderText]
for folder in sourceName:
    if not os.path.exists(folder):
        os.makedirs(folder)

Thanks in advance for your help!

Recursively creating directories based on elements (web scraping with Selenium / BS4)
