Question

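I made a mistake and renamed some images on my web server. This broke a bunch of image sources in my HTML (300 files or so...). Unfortunately there is no backup, so this is something I need to fix programmatically! :)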

My previous folder structure looked like this:

Root Folder
   >directory
     >subdirectory
        >img
          image1.gif
     >subdirectory2
        >img
          image1.gif
   >directory2
     >img
        image1.gif
    ...

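I have now extracted all of the images into a single folder and prefixed each image name with the names of all its parent folders up to the root folder, so we are left with: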

directory_subdirectory_image1.gif
directory_subdirectory2_image1.gif
directory2_image1.gif

All in one folder.

I want to remove the "img/" prefix from my image srcs and prepend the names of all parent folders up to the root folder.
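The renaming rule described above can be sketched as a small pure function. This is only an illustration: the name `flattened_name` and its parameters are hypothetical, and it assumes each HTML file sits next to the `img` folder it refers to, so the folders leading to the HTML file are the same ones that were prefixed onto the image name.

```python
import posixpath


def flattened_name(html_dir_parts, old_src):
    """Return the flattened image name for an <img> src found in an HTML file.

    html_dir_parts: folder names from the root folder down to the HTML file,
                    e.g. ["directory", "subdirectory"] (assumed layout).
    old_src:        the original src value, e.g. "img/image1.gif".
    """
    # HTML srcs use forward slashes, so posixpath strips the "img/" prefix
    image = posixpath.basename(old_src)
    # Join the folder names and the file name with underscores
    return "_".join(list(html_dir_parts) + [image])


print(flattened_name(["directory", "subdirectory"], "img/image1.gif"))
# directory_subdirectory_image1.gif
print(flattened_name(["directory2"], "img/image1.gif"))
# directory2_image1.gif
```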

I tried using BeautifulSoup to do this. I can get all of the images, but I could not work out how to prepend the parent folders up to the root folder:

import os
from bs4 import BeautifulSoup

do = dir_with_original_files = 'C:\\Users\\ADMIN\\Desktop\\RootFolder'
dm = dir_with_modified_files = 'C:\\Users\\ADMIN\\Desktop\\RootFolderNewImgSrc'

for root, dirs, files in os.walk(do):
    for f in files:
        if f.endswith('~'): #you don't want to process backups
            continue
        original_file = os.path.join(root, f)
        modified_file = os.path.join(dm, mf)
        with open(original_file, 'r') as orig_f, \
            open(modified_file, 'w') as modi_f:
            soup = BeautifulSoup(orig_f.read())
            for t in soup.find_all('img'):
              #not sure what to do here - how do I edit the image source to prepend all parent directories?
            # This is where you create your new modified file.
            modi_f.write(soup.prettify().encode(soup.original_encoding))

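I am hoping someone can help me edit this so that it (a) runs, (b) only runs on HTML files, and (c) updates the image srcs in my HTML so that the parent folders of the current HTML file, up to the root folder, are prepended.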

I think what I have above should be pretty close; I am just missing some Python knowledge.

This will take a fair amount of effort, so I will put a bounty on this to reward the best answer. Thanks :)

Answer 1

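Here is how I would do it. The key is to update the soup object and then write it out. I have added comments where I made changes. The first part is the same.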

import os
from bs4 import BeautifulSoup

do = dir_with_original_files = 'C:\\Users\\ADMIN\\Desktop\\RootFolder'
dm = dir_with_modified_files = 'C:\\Users\\ADMIN\\Desktop\\RootFolderNewImgSrc'

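First, if I understand correctly, you only want to work with HTML files, so I changed the condition in the first for loop to reflect that. Second, I don't know all the details of Python paths on Windows (assuming you are on a Windows machine), so I have provided code variants in places.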

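I also had another idea: write the old HTML files to the modified directory as a backup, then overwrite the existing HTML files. These parts are marked "Alternative idea".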
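The backup idea could also be done with a plain file copy before anything is modified. The following is a minimal sketch, not part of the original answer: `backup_html` is a hypothetical helper, and it assumes you want the backup to keep the same folder layout as the source tree so that files with the same name in different folders do not overwrite each other.

```python
import os
import shutil


def backup_html(original_file, src_root, backup_root):
    """Copy original_file into backup_root, keeping its path relative to src_root."""
    rel = os.path.relpath(original_file, src_root)      # e.g. directory/subdirectory/page.html
    target = os.path.join(backup_root, rel)
    os.makedirs(os.path.dirname(target), exist_ok=True)  # create the mirrored folders if needed
    shutil.copy2(original_file, target)                  # byte-for-byte copy, keeps timestamps
    return target
```

Called once per HTML file inside the `os.walk` loop, with `do` as `src_root` and `dm` as `backup_root`, this leaves an untouched copy of every file before the originals are overwritten.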

os.makedirs(dm, exist_ok=True)                            # Make sure the output directory exists

for root, dirs, files in os.walk(do):
    relpath = os.path.relpath(root, do)                   # Get relative path from do to root ('.' for do itself)
#   Variant:
#   relpath = root[len(do):]
    # Split on the folder separator, dropping empty parts and the '.' placeholder
    folders = [p for p in relpath.split(os.sep) if p not in ('', os.curdir)]
    for f in files:
        if not f.endswith('.html'): # only work with .html files
            continue
        original_file = os.path.join(root, f)
        modified_file = os.path.join(dm, f)               # Note: same-named files from different folders share this path
        with open(original_file, 'rb') as orig_f:         # Read bytes so BeautifulSoup can detect the encoding
            soup = BeautifulSoup(orig_f, 'html.parser')
        encoding = soup.original_encoding or 'utf-8'      # Fall back to UTF-8 if detection fails

#       Alternative idea: write old files to dm
#       Make a backup copy in modified files dir
#       with open(modified_file, 'wb') as modi_f:
#           modi_f.write(soup.prettify().encode(encoding))

        for t in soup.find_all('img'):                    # Note: soup exists outside of with
            old_src = t.get('src')                        # Access src attribute (None if missing)
            if old_src is None:                           # Do nothing if tag does not have src attribute
                continue
            image = os.path.split(old_src)[1]             # Get file name
#           Variant:
#           image = old_src.replace('img/','')
            new_src = '_'.join(folders + [image])         # Join folders and image by underscore
            t['src'] = new_src                            # Modify src attribute
        with open(modified_file, 'wb') as modi_f:         # Write bytes in the detected encoding
            modi_f.write(soup.prettify().encode(encoding))

#       Alternative idea: overwrite original html files
#       with open(original_file, 'wb') as orig_f:
#           orig_f.write(soup.prettify().encode(encoding))
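One detail worth checking when turning the relative path into folder names: `os.path.relpath` returns `.` when the two paths are the same, so HTML files directly in the root folder need special handling, and the separator is `\` only on Windows. A minimal sketch of a helper that covers both cases (the name `folder_parts` is hypothetical):

```python
import os


def folder_parts(root, base):
    """Return the folder names that lead from base down to root as a list."""
    rel = os.path.relpath(root, base)
    if rel == os.curdir:          # root is base itself: no folders to prepend
        return []
    return rel.split(os.sep)      # os.sep is '\\' on Windows and '/' elsewhere


# A list can be extended and joined in one expression with +.
# list.append() returns None, so it cannot be passed to '_'.join().
parts = folder_parts(os.path.join("RootFolder", "directory", "subdirectory"), "RootFolder")
print("_".join(parts + ["image1.gif"]))
# directory_subdirectory_image1.gif
```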

Programmatically updating image src text
