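epub3: How to add mimetype first in the archive

Time: 2015-01-06 13:28:29

Tags: python zip epub epub3

I am writing a script to create an epub from HTML files, but when I validate my epub I get the following error: Mimetype entry missing or not the first in archive

The mimetype file exists, but it is not the first file in the epub. Does anyone know how to make sure it is always placed first, using Python?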

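2 answers:

Answer 0 (score: 0)

Sorry, I don't have time to give a detailed explanation right now, but here is a (relatively) simple epub processing program I wrote a while ago that shows how to do it.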

epubpad.py

#! /usr/bin/env python

''' Pad the ends of paragraph lines in an epub file with a single space char

    Written by PM 2Ring 2013.05.12
'''

import sys, re, zipfile

def bold(s): return "\x1b[1m%s\x1b[0m" % s

def report(attr, val):
    print "%s '%s'" % (bold(attr + ':'), val)

def fixepub(oldname, newname):
    oldz = zipfile.ZipFile(oldname, 'r')
    nlist = oldz.namelist()
    #print '\n'.join(nlist) + '\n'

    if nlist[0] != 'mimetype':
        print bold('Warning!!!'), "First file is '%s', not 'mimetype'" % nlist[0]

    #get the name of the contents file from the container
    container = 'META-INF/container.xml'
    # container should be in nlist
    s = oldz.read(container)
    p = re.compile(r'full-path="(.*?)"')
    a = p.search(s)
    contents = a.group(1)
    #report("Contents file", contents)

    i = contents.find('/')
    if i>=0:
        dirname = contents[:i+1]
    else:
        #No directory separator in contents name!
        dirname = ''

    report("dirname", dirname)

    s = oldz.read(contents)
    #print s

    p = re.compile(r'<dc:creator.*>(.*)</dc:creator>')
    a = p.search(s)
    creator = a.group(1)
    report("Creator", creator)

    p = re.compile(r'<dc:title>(.*)</dc:title>')
    a = p.search(s)
    title = a.group(1)
    report("Title", title)

    #Find the names of all xhtml & html text files
    p = re.compile(r'\.[x]?htm[l]?')
    htmnames = [i for i in nlist if p.search(i) and i.find('wrap')==-1]

    #Pattern for end of lines that don't need padding
    eolp = re.compile(r'[>}]$')

    newz = zipfile.ZipFile(newname, 'w', zipfile.ZIP_DEFLATED)
    for fname in nlist:
        print fname,

        s = oldz.read(fname)

        if fname == 'mimetype':
            #Store the mimetype entry uncompressed, without a temporary file on disk
            newz.writestr(fname, s, zipfile.ZIP_STORED)
            print ' * stored'
            continue

        if fname in htmnames:
            print ' * text',
            #Pad lines that are (hopefully) inside paragraphs...
            newlines = []
            for line in s.splitlines():
                if len(line)==0 or eolp.search(line):
                    newlines.append(line)
                else:
                    newlines.append(line + ' ')

            s = '\n'.join(newlines)

        newz.writestr(fname, s)
        print

    newz.close()
    oldz.close()

def main():
    oldname = len(sys.argv) > 1 and sys.argv[1]
    if not oldname:
        print 'No filename given!'
        raise SystemExit

    newname = len(sys.argv) > 2 and sys.argv[2]
    if not newname:
        if oldname.rfind('.') == -1:
            newname = oldname + '_P'
        else:
            newname = oldname.replace('.epub', '_P.epub')
        newname = newname.replace(' ', '_')

    print "Processing '%s' to '%s' ..." % (oldname, newname)

    fixepub(oldname, newname)

if __name__ == '__main__':
    main()

FWIW, I wrote this program to process files for my simple e-reader, which annoyingly joins paragraphs together if their lines don't end with a space.
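
Note that the listing above uses Python 2 syntax and copies the entries in their original order, so mimetype ends up first only if it was already first in the input archive. A minimal Python 3 sketch of a repack step that forces the order might look like the following. The name repack_with_mimetype_first is hypothetical and not part of epubpad.py, and the sketch assumes every entry fits in memory and that original timestamps do not need to be preserved.

```python
import zipfile

# The fixed content of the mimetype entry for every epub.
MIMETYPE = b"application/epub+zip"


def repack_with_mimetype_first(src_path, dst_path):
    """Copy src_path to dst_path with 'mimetype' as the first entry.

    Hypothetical helper for illustration: entries are copied by name,
    so timestamps and per-entry attributes are not preserved.
    """
    with zipfile.ZipFile(src_path, "r") as src, \
            zipfile.ZipFile(dst_path, "w", zipfile.ZIP_DEFLATED) as dst:
        # Write mimetype before anything else, stored (uncompressed).
        dst.writestr("mimetype", MIMETYPE, compress_type=zipfile.ZIP_STORED)

        # Copy every other entry in its original order, compressed.
        for name in src.namelist():
            if name == "mimetype":
                continue  # already written above
            dst.writestr(name, src.read(name))
```

Calling repack_with_mimetype_first("book.epub", "book_fixed.epub") (both file names are placeholders) would write a new archive and leave the input untouched.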

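Answer 1 (score: 0)

The solution I found:

  • Delete the previous mimetype file

  • When creating the new archive, create a new mimetype file before adding anything else: zipFile.writestr("mimetype", "application/epub+zip")

Why it works: the mimetype is the same for every epub, "application/epub+zip", so there is no need to reuse the original file.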
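
As a rough sketch of this approach in Python 3: the helpers write_epub and mimetype_is_first below are hypothetical names introduced only for illustration, and files is assumed to be a dict mapping archive names to bytes. One caveat this answer does not mention is that the EPUB container format also expects the mimetype entry to be stored uncompressed, so the sketch passes ZIP_STORED for that entry explicitly instead of relying on the archive's default compression.

```python
import zipfile


def write_epub(path, files):
    """Create an archive at path with 'mimetype' written first.

    files: dict mapping archive names to bytes (an assumed input shape).
    """
    with zipfile.ZipFile(path, "w", zipfile.ZIP_DEFLATED) as zf:
        # Always generate a fresh mimetype entry, first and uncompressed.
        zf.writestr("mimetype", b"application/epub+zip",
                    compress_type=zipfile.ZIP_STORED)
        for name, data in files.items():
            if name != "mimetype":  # ignore any stale copy in the input
                zf.writestr(name, data)


def mimetype_is_first(path):
    """Return True if the first entry is an uncompressed 'mimetype'."""
    with zipfile.ZipFile(path) as zf:
        infos = zf.infolist()
        return (bool(infos)
                and infos[0].filename == "mimetype"
                and infos[0].compress_type == zipfile.ZIP_STORED)
```

Running mimetype_is_first on the output before handing it to a validator gives a quick local check of the ordering; it does not replace a full epub validation.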