Question

如何避免在Python中生成重复的图像文件？

我有一个项目，使用Pydocx（用于docx转换的Python模块）将MS Word文档转换为基本HTML。除了将图像文件写入磁盘的部分外，代码大部分都按预期工作。

我正在使用随机密钥和image_name函数以及urlretrieve的组合。我的要求是编写/生成唯一的自定义文件名。

这是我的代码：

def random_key(length):
    key = ''
    for i in range(length):
        key += random.choice(string.digits)
    return key

# Function to generate random image names

def image_name():
    return '{}'.format(os.path.join(IMAGE_LOCATION, random_key(4)))

def get_image_tag(self, image, width=None, height=None, rotate=None, 
    alt=None, caption=None):

        image_src = self.get_image_source(image)

# get base64 file extension from bytes
# https://matthewdaly.co.uk/blog/2015/07/04/handling-images-as-base64-
     strings-with-django-rest-framework/

format, imag = image_src.split(';base64,') 
# guess file extension
    ext = format.split('/')[-1]

# Capture the generated filename with the proper extension to use in img 
    source attribute

image_src_new = 'doc_img_' + image_name() + '.' + ext

# Code that is generating duplicate images from the same base64 source 
string
# Function to convert base64 string to image using urlretireve


urlretrieve(image_src, './source/output/' + image_src_new)

# Set the image source to the newly created filename

attrs = {
        'src': image_src_new
    }
if rotate:
    attrs['style'] = 'transform: rotate(%sdeg);' % rotate
if alt:
    attrs['alt'] = alt

return HtmlTag('img', allow_self_closing=True, allow_whitespace=True, 
**attrs)

# List files with glob using filter

source_files = glob.glob('./source/mydocument.docx')

for file in source_files:

html = PyDocXHTMLExporterImageOut(file).export()

# Get the full filename
base_filename = os.path.splitext(file)[0]
# Split the full filename to get the actual filename excluding parent 
directory

file_name = file.split('/')[3]
# Get the filename without the extension
no_ext_file_name = file_name.split('.')[0]

# Use codecs to write clean html content to utf

with codecs.open(('./source/output/' + no_ext_file_name).lower() + '.html', 
'w', 'utf-16') as output:
    output.write(html)

print('Done converting source word files to html')

由于

Answer 1

猜测你的意思是偶尔的随机名称重复 - 如果抛出异常，我会检查试图在try-except机箱内读取它的文件，而不是生成的名称是唯一的。

如何避免重复生成图像文件？

1 个答案: