XML to Python: spaces inserted on import

Time: 2018-08-08 10:42:42

Tags: python xml beautifulsoup

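If you are going to downvote, please at least explain why. I am just trying to learn.

Has anyone run into this? I haven't been able to figure it out. I have a large XML file that I'm trying to parse, and when I import it into Python with BeautifulSoup (or anything else), the output has a space between every character. It's the strangest thing.

Example output: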

H e a d e r > 

     P I E S V e r s i o n > 6 . 5 / P I E S V e r s i o n > 

     S u b m i s s i o n T y p e > F U L L / S u b m i s s i o n T y p e > 

     P a r e n t D U N S N u m b e r > 0 5 5 4 3 3 1 9 7 / P a r e n t D U N S N u m b e r > 

     C u r r e n c y C o d e > U S D / C u r r e n c y C o d e > 

     L a n g u a g e C o d e > E N / L a n g u a g e C o d e > 

     T e c h n i c a l C o n t a c t > B r e n t   B u s h n e l l / T e c h n i c a l C o n t a c t > 

     C o n t a c t E m a i l > B r e n t . B u s h n e l l @ g a t e s . c o m / C o n t a c t E m a i l > 

 / H e a d e r > 

What the file actually contains:

<Header>
<PIESVersion>6.5</PIESVersion>
<SubmissionType>FULL</SubmissionType>
<ParentDUNSNumber>055433197</ParentDUNSNumber>
<CurrencyCode>USD</CurrencyCode>
<LanguageCode>EN</LanguageCode>
<TechnicalContact>Brent Bushnell</TechnicalContact>
<ContactEmail>Brent.Bushnell@gates.com</ContactEmail>
</Header>
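
One possible explanation, which nothing in the post confirms, is that the file is stored as UTF-16 and is being decoded as a single-byte encoding. The sketch below reproduces that kind of output under that assumption; the sample string and the choice of `latin-1` are illustrative, not taken from the real file.

```python
# Assumption: the file is UTF-16 encoded; the post does not confirm this.
xml_text = "<Header><PIESVersion>6.5</PIESVersion></Header>"

# Encode as UTF-16 little-endian (no BOM), roughly how such a file is stored.
raw = xml_text.encode("utf-16-le")

# A single-byte codec keeps every zero high byte as a NUL character,
# so each real character ends up followed by "\x00".
misread = raw.decode("latin-1")

print(repr(misread[:16]))

# Swapping NULs for spaces mimics how they are often displayed.
print(misread.replace("\x00", " ")[:32])
```

If the real output contains NUL characters rather than true spaces, `repr()` on a short slice of `contents` would show them as `\x00`.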

The part of the code that should be working:

from bs4 import BeautifulSoup
import csv

infile = open('GATES.xml','r')
contents = infile.read()
soup = BeautifulSoup(contents,'xml')

bullets = {}
attributes = {}
attr_ids = {}
attr_test = {}

for part in soup.find_all('Item'):

    # --Pulls bullet point data from PIES File-------------------------------------------------------
    bullets[part.find('PartNumber').get_text()] = [x.get_text() for x in part.find_all('Description')]

    # --Pulls attribute data from PIES File----------------------------------------------------------
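    for x in part.find_all('PartNumber'):

        partNum = x.get_text()

        # Start a fresh dict for each part; reusing one shared dict would make
        # every entry in `attributes` point at the same accumulated data.
        attr_ids = {}

        for v in part.find_all('ProductAttribute'):
            attr_value = v.get_text()
            attr_id = v.get('AttributeID')

            attr_ids[attr_id] = attr_value

        attributes[partNum] = [attr_ids]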
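
If the spacing does come from an encoding mismatch (an assumption, not something the post establishes), one thing to try is reading the file as bytes and decoding it according to its byte-order mark. `read_xml_text` is a hypothetical helper written for this sketch, and it only handles UTF-16 and UTF-8 input.

```python
import codecs


def read_xml_text(path):
    """Hypothetical helper: decode an XML file based on its byte-order mark."""
    with open(path, "rb") as f:
        raw = f.read()

    # The "utf-16" codec consumes the BOM and takes the byte order from it.
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return raw.decode("utf-16")

    # "utf-8-sig" strips a UTF-8 BOM if present and is harmless otherwise.
    return raw.decode("utf-8-sig")
```

With this helper, `contents = read_xml_text('GATES.xml')` would replace the `open`/`read` pair above. Alternatively, opening the file with `'rb'` and passing the raw bytes to `BeautifulSoup` lets the library attempt its own encoding detection.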

0 answers:

No answers yet.