How can I avoid generating duplicate image files in Python?
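I have a project that uses Pydocx (a Python module for docx conversion) to convert MS Word documents to basic HTML. The code mostly works as expected, except for the part that writes the image files to disk.
I'm using a combination of a random key, an image_name function and urlretrieve. My requirement is to write/generate unique custom filenames.
Here is my code: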
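A 4-digit random key allows only 10,000 distinct names, and by the birthday bound a document with 100 images already has roughly a 39% chance that two of them get the same name. As a minimal sketch, assuming a hypothetical helper `unique_image_name` (not part of the question's code), names built from `uuid.uuid4()` make such collisions practically impossible:

```python
import os
import uuid


def unique_image_name(directory, ext, prefix='doc_img_'):
    """Return a path like directory/doc_img_<32 hex chars>.<ext>.

    uuid4() draws 122 random bits, so two generated names colliding is
    vanishingly unlikely, unlike the 10,000 possible 4-digit keys.
    """
    filename = '{}{}.{}'.format(prefix, uuid.uuid4().hex, ext)
    return os.path.join(directory, filename)
```

Note that this only keeps different images from sharing a name; it does not stop the same image from being written to disk more than once.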
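import codecs
import glob
import os
import random
import string
from urllib.request import urlretrieve

def random_key(length):
    # Build a string of `length` random decimal digits
    return ''.join(random.choice(string.digits) for _ in range(length))

# Function to generate random image names
# (IMAGE_LOCATION is defined elsewhere in the project; not shown)
def image_name():
    # Only 10**4 possible keys, so collisions become likely with many images
    return os.path.join(IMAGE_LOCATION, random_key(4))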
# Method of the custom PyDocXHTMLExporterImageOut exporter class (class
# definition not shown). HtmlTag and self.get_image_source come from
# Pydocx's HTML exporter (imports not shown in the question).
def get_image_tag(self, image, width=None, height=None, rotate=None,
                  alt=None, caption=None):
    image_src = self.get_image_source(image)
    # Split the base64 data URI into its header and payload, see
    # https://matthewdaly.co.uk/blog/2015/07/04/handling-images-as-base64-strings-with-django-rest-framework/
    header, payload = image_src.split(';base64,')
    # Guess the file extension from the MIME type, e.g. 'data:image/png' -> 'png'
    ext = header.split('/')[-1]
    # Build the generated filename with the proper extension, for use in the
    # img src attribute
    image_src_new = 'doc_img_' + image_name() + '.' + ext
    # This is the code that generates duplicate images from the same base64
    # source string: urlretrieve decodes the data URI and writes it to disk
    urlretrieve(image_src, './source/output/' + image_src_new)
    # Set the image source to the newly created filename
    attrs = {
        'src': image_src_new
    }
    if rotate:
        attrs['style'] = 'transform: rotate(%sdeg);' % rotate
    if alt:
        attrs['alt'] = alt
    return HtmlTag('img', allow_self_closing=True, allow_whitespace=True,
                   **attrs)
# List the source files with a glob pattern
source_files = glob.glob('./source/mydocument.docx')
for file in source_files:
    html = PyDocXHTMLExporterImageOut(file).export()
    # Get the filename without its parent directories, e.g. 'mydocument.docx'
    file_name = os.path.basename(file)
    # Get the filename without the extension
    no_ext_file_name = os.path.splitext(file_name)[0]
    # Use codecs to write the clean HTML content as UTF-16
    with codecs.open(('./source/output/' + no_ext_file_name).lower() + '.html',
                     'w', 'utf-16') as output:
        output.write(html)
print('Done converting source word files to html')
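If the duplicates are the same image being saved more than once (each call to get_image_tag draws a fresh random name), one option is to derive the filename from the image content itself, so identical base64 data always maps to a single file. A minimal sketch, assuming a hypothetical helper `save_base64_image` that decodes with the standard `base64` module instead of using `urlretrieve`:

```python
import base64
import hashlib
import os


def save_base64_image(data_uri, output_dir, prefix='doc_img_'):
    """Write a base64 data URI to output_dir, named after a hash of its bytes.

    Identical image data always maps to the same filename, so saving the
    same image twice reuses the existing file instead of creating a
    duplicate. Returns the filename relative to output_dir.
    """
    header, payload = data_uri.split(';base64,', 1)
    ext = header.split('/')[-1]  # 'data:image/png' -> 'png'
    image_bytes = base64.b64decode(payload)
    # The first 16 hex characters (64 bits) of the SHA-256 digest act as a
    # content-derived key; this truncation is an arbitrary choice.
    digest = hashlib.sha256(image_bytes).hexdigest()[:16]
    filename = '{}{}.{}'.format(prefix, digest, ext)
    path = os.path.join(output_dir, filename)
    if not os.path.exists(path):
        os.makedirs(output_dir, exist_ok=True)
        with open(path, 'wb') as f:
            f.write(image_bytes)
    return filename
```

Inside get_image_tag, `image_src_new = save_base64_image(image_src, './source/output/')` could then replace both the image_name() call and the urlretrieve call.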
Thanks
Answer (score: 0)
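I'm guessing you mean that the random names occasionally collide. I would check for an existing file by trying to read it inside a try-except block: if an exception is thrown, the file does not exist and the generated name is unique; otherwise, generate a new name.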
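The answer's try/except idea can be sketched as follows. This variant swaps "try to read the file" for "try to create it exclusively": `open()` with mode `'x'` raises `FileExistsError` when the name is already taken, so the existence check and the creation happen in one step and nothing can claim the name in between. The helper `create_unique_file` and its parameters are hypothetical:

```python
import os
import random
import string


def random_key(length):
    """Return a string of `length` random decimal digits."""
    return ''.join(random.choice(string.digits) for _ in range(length))


def create_unique_file(directory, prefix='doc_img_', ext='png', length=8,
                       max_tries=100):
    """Create and open a new file whose random name is not yet taken.

    Mode 'xb' makes open() raise FileExistsError if the name exists, so
    checking and creating are a single step. Returns (filename, file).
    """
    for _ in range(max_tries):
        filename = '{}{}.{}'.format(prefix, random_key(length), ext)
        try:
            f = open(os.path.join(directory, filename), 'xb')
        except FileExistsError:
            continue  # name collision: draw another key
        return filename, f
    raise RuntimeError('no free filename after {} tries'.format(max_tries))
```

For example, `name, f = create_unique_file('./source/output')` followed by `with f: f.write(image_bytes)` writes the image under a name that was guaranteed free at creation time.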