How can I create new XML files from a template XML file by replacing its contents with values extracted from PDF file names?

Time: 2018-06-14 21:01:43

Tags: python beautifulsoup

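I wrote a Python script that reads PDF file names and extracts fields from each name, using '#' as the delimiter. After extracting the fields, the script reads a template XML file and replaces the placeholder values in the template to create a new XML file. Everything works, but I feel the code is not Pythonic and needs to be made DRY. Please advise.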

template xml file:
<FaxInfo xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <Sender>
        <UserName>Administrator</UserName>
        <FaxNumber>23456789</FaxNumber>
        <TelephoneNumber>12345678</TelephoneNumber>
        <Company>RINCON</Company>
    </Sender>
    <RecipientList>
        <Recipient>
            <FaxNumber>6194</FaxNumber>
        </Recipient>
    </RecipientList>
    <DocumentList>
        <Document>pdfsample1.pdf</Document> 
    </DocumentList>
    <Options>
        <SendOptions>
            <Subject>SUBJECT</Subject> 
        </SendOptions>
        <OtherOptions>
            <Retry>3</Retry>
            <Interval>3</Interval>
            <BillingCode>EMAIL</BillingCode>
            <CustomCode>
                <CustomCode1></CustomCode1>
                <CustomCode2>STAMPTIME</CustomCode2> 
            </CustomCode>
        </OtherOptions>
    </Options>
</FaxInfo>

main script:
import os
import datetime
import glob
from xml.dom.minidom import parse, parseString
import bs4


path = r'C:\Users\sachin\Desktop\xmlcreater'


for file in glob.glob(os.path.join(path, '*.pdf')):
    email = file.split("#")[0]
    stamptime = file.split("#")[1]
    subject = file.split("#")[2].split('.')[0]
    f_date = datetime.datetime.strptime(stamptime, "%m%d%Y%H%M%S").strftime("%m/%d/%Y %H:%M:%S")
    f_email = email.split('\\')[-1]

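    # The with statement closes the file on exit, so no explicit close() is needed.
    with open(r'C:\Users\sachin\Desktop\xmlcreater\sample\sample.xml', 'r') as infile:
        contents = infile.read()
        soup = bs4.BeautifulSoup(contents, 'html.parser')

    # Write the copy into `path`, where the loops below look for *.xml files,
    # rather than into whatever the current working directory happens to be.
    with open(os.path.join(path, '{}.xml'.format(stamptime)), 'w') as x:
        x.write(contents)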

    for xml in glob.glob(os.path.join(path, '*.xml')):
        with open(xml, 'r') as f:
            data  = f.read()
            data1 = data.replace(soup.billingcode.string, f_email)

        with open(xml, 'w+') as k:
            k.write(data1)


    for xml in glob.glob(os.path.join(path, '*.xml')):
        with open(xml, 'r') as f:
            data  = f.read()
            data1 = data.replace(soup.customcode2.string, f_date)

        with open(xml, 'w+') as k:
            k.write(data1)


    for xml in glob.glob(os.path.join(path, '*.xml')):
        with open(xml, 'r') as f:
            data  = f.read()
            data1 = data.replace(soup.subject.string, subject)

        with open(xml, 'w+') as k:
            k.write(data1)
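
The three near-identical loops above differ only in which placeholder they replace and with what value, so one way to make the script DRY is to drive a single pass from a mapping. The following is a minimal sketch, not a definitive rewrite: the function name `fill_template` and the mapping layout are hypothetical, and it assumes the placeholders (`EMAIL`, `STAMPTIME`, `SUBJECT`) each appear as the complete text of an element, as they do in the template shown earlier.

```python
from xml.sax.saxutils import escape


def fill_template(template_text, replacements):
    """Return template_text with each placeholder swapped for its value.

    `replacements` maps a placeholder (assumed to be the whole text of an
    element, e.g. EMAIL in <BillingCode>EMAIL</BillingCode>) to the new text.
    """
    for placeholder, value in replacements.items():
        # Match the placeholder only between a closing '>' and an opening '<',
        # and XML-escape the value so '&', '<' and '>' stay well-formed.
        template_text = template_text.replace(
            ">{}<".format(placeholder), ">{}<".format(escape(value))
        )
    return template_text


if __name__ == "__main__":
    sample = "<Options><Subject>SUBJECT</Subject></Options>"
    print(fill_template(sample, {"SUBJECT": "testing"}))
    # prints <Options><Subject>testing</Subject></Options>
```

With this shape, the template only has to be read once before the loop over the PDFs, and each output file is written once instead of being reread and rewritten for every field. It also avoids touching XML files that earlier iterations already produced.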


pdf file name sample:
example@example.co.in#06142018123721#testing.pdf
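
For a name of this shape, the three fields can be pulled out with a single split instead of calling `file.split("#")` repeatedly. This is a hedged sketch: `parse_pdf_name` is a hypothetical helper, and it assumes every name has exactly the form `email#MMDDYYYYHHMMSS#subject.pdf`. Unlike the original `split('.')[0]`, it strips only the final extension, so a subject containing dots is kept whole.

```python
import datetime
import os


def parse_pdf_name(pdf_path):
    """Split 'email#MMDDYYYYHHMMSS#subject.pdf' into its three fields.

    Returns (email, formatted_timestamp, subject). Raises ValueError if the
    name has fewer than three '#'-separated fields or the timestamp is invalid.
    """
    # Drop the directory part and the extension before splitting.
    stem = os.path.splitext(os.path.basename(pdf_path))[0]
    # maxsplit=2 keeps any further '#' characters inside the subject.
    email, stamp, subject = stem.split("#", 2)
    when = datetime.datetime.strptime(stamp, "%m%d%Y%H%M%S")
    return email, when.strftime("%m/%d/%Y %H:%M:%S"), subject


if __name__ == "__main__":
    print(parse_pdf_name("example@example.co.in#06142018123721#testing.pdf"))
    # prints ('example@example.co.in', '06/14/2018 12:37:21', 'testing')
```

Using `os.path.basename` also removes the need for the separate `email.split('\\')[-1]` step, since the directory is stripped before the name is split.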

0 answers:

No answers