How do I parse this XML response in Python?

Time: 2015-05-11 08:49:57

Tags: python xml parsing xpath lxml

This is my XML file:

<?xml version="1.0" ?>
<Items>
    <Item>
        <ASIN>3570102769</ASIN>
        <DetailPageURL>http://www.amazon.de/Inside-IS-Tage-Islamischen-Staat/dp/3570102769%3FSubscriptionId%3DAKIAI554OLCUMRCYB7ZA%26tag%3DjPp08vuSO4osfgfbCbEdF7TNqnWOm7YtprtqRPB9%26linkCode%3Dxm2%26camp%3D2025%26creative%3D165953%26creativeASIN%3D3570102769</DetailPageURL>
        <ItemLinks>
            <ItemLink>
                <Description>Add To Wishlist</Description>
                <URL>http://www.amazon.de/gp/registry/wishlist/add-item.html%3Fasin.0%3D3570102769%26SubscriptionId%3DAKIAI554OLCUMRCYB7ZA%26tag%3DjPp08vuSO4osfgfbCbEdF7TNqnWOm7YtprtqRPB9%26linkCode%3Dxm2%26camp%3D2025%26creative%3D12738%26creativeASIN%3D3570102769</URL>
            </ItemLink>
            <ItemLink>
                <Description>Tell A Friend</Description>
                <URL>http://www.amazon.de/gp/pdp/taf/3570102769%3FSubscriptionId%3DAKIAI554OLCUMRCYB7ZA%26tag%3DjPp08vuSO4osfgfbCbEdF7TNqnWOm7YtprtqRPB9%26linkCode%3Dxm2%26camp%3D2025%26creative%3D12738%26creativeASIN%3D3570102769</URL>
            </ItemLink>
            <ItemLink>
                <Description>All Customer Reviews</Description>
                <URL>http://www.amazon.de/review/product/3570102769%3FSubscriptionId%3DAKIAI554OLCUMRCYB7ZA%26tag%3DjPp08vuSO4osfgfbCbEdF7TNqnWOm7YtprtqRPB9%26linkCode%3Dxm2%26camp%3D2025%26creative%3D12738%26creativeASIN%3D3570102769</URL>
            </ItemLink>
            <ItemLink>
                <Description>All Offers</Description>
                <URL>http://www.amazon.de/gp/offer-listing/3570102769%3FSubscriptionId%3DAKIAI554OLCUMRCYB7ZA%26tag%3DjPp08vuSO4osfgfbCbEdF7TNqnWOm7YtprtqRPB9%26linkCode%3Dxm2%26camp%3D2025%26creative%3D12738%26creativeASIN%3D3570102769</URL>
            </ItemLink>
        </ItemLinks>
        <ItemAttributes>
            <Author>Jürgen Todenhöfer</Author>
            <Binding>Gebundene Ausgabe</Binding>
            <EAN>9783570102763</EAN>
            <EANList>
                <EANListElement>9783570102763</EANListElement>
            </EANList>
            <ISBN>3570102769</ISBN>
            <IsEligibleForTradeIn>1</IsEligibleForTradeIn>
            <ItemDimensions>
                <Height Units="hundredths-inches">874</Height>
                <Length Units="hundredths-inches">575</Length>
                <Width Units="hundredths-inches">126</Width>
            </ItemDimensions>
            <Label>C. Bertelsmann Verlag</Label>
            <Languages>
                <Language>
                    <Name>Deutsch</Name>
                    <Type>Published</Type>
                </Language>
                <Language>
                    <Name>Deutsch</Name>
                    <Type>Original</Type>
                </Language>
                <Language>
                    <Name>Deutsch</Name>
                    <Type>Unbekannt</Type>
                </Language>
            </Languages>
            <ListPrice>
                <Amount>1799</Amount>
                <CurrencyCode>EUR</CurrencyCode>
                <FormattedPrice>EUR 17,99</FormattedPrice>
            </ListPrice>
            <Manufacturer>C. Bertelsmann Verlag</Manufacturer>
            <ManufacturerMinimumAge Units="months">192</ManufacturerMinimumAge>
            <NumberOfPages>288</NumberOfPages>
            <PackageDimensions>
                <Height Units="hundredths-inches">118</Height>
                <Length Units="hundredths-inches">567</Length>
                <Weight Units="hundredths-pounds">93</Weight>
                <Width Units="hundredths-inches">252</Width>
            </PackageDimensions>
            <PackageQuantity>1</PackageQuantity>
            <ProductGroup>Book</ProductGroup>
            <ProductTypeName>ABIS_BOOK</ProductTypeName>
            <PublicationDate>2015-04-27</PublicationDate>
            <Publisher>C. Bertelsmann Verlag</Publisher>
            <Studio>C. Bertelsmann Verlag</Studio>
            <Title>Inside IS - 10 Tage im 'Islamischen Staat'</Title>
            <TradeInValue>
                <Amount>930</Amount>
                <CurrencyCode>EUR</CurrencyCode>
                <FormattedPrice>EUR 9,30</FormattedPrice>
            </TradeInValue>
        </ItemAttributes>
        <OfferSummary>
            <LowestNewPrice>
                <Amount>1799</Amount>
                <CurrencyCode>EUR</CurrencyCode>
                <FormattedPrice>EUR 17,99</FormattedPrice>
            </LowestNewPrice>
            <LowestUsedPrice>
                <Amount>1390</Amount>
                <CurrencyCode>EUR</CurrencyCode>
                <FormattedPrice>EUR 13,90</FormattedPrice>
            </LowestUsedPrice>
            <LowestCollectiblePrice>
                <Amount>4999</Amount>
                <CurrencyCode>EUR</CurrencyCode>
                <FormattedPrice>EUR 49,99</FormattedPrice>
            </LowestCollectiblePrice>
            <TotalNew>56</TotalNew>
            <TotalUsed>8</TotalUsed>
            <TotalCollectible>1</TotalCollectible>
            <TotalRefurbished>0</TotalRefurbished>
        </OfferSummary>
        <Offers>
            <TotalOffers>1</TotalOffers>
            <TotalOfferPages>1</TotalOfferPages>
            <MoreOffersUrl>http://www.amazon.de/gp/offer-listing/3570102769%3FSubscriptionId%3DAKIAI554OLCUMRCYB7ZA%26tag%3DjPp08vuSO4osfgfbCbEdF7TNqnWOm7YtprtqRPB9%26linkCode%3Dxm2%26camp%3D2025%26creative%3D12738%26creativeASIN%3D3570102769</MoreOffersUrl>
            <Offer>
                <OfferAttributes>
                    <Condition>New</Condition>
                </OfferAttributes>
                <OfferListing>
                    <OfferListingId>9KHCZj9qtL6ucVBPASfXaryQjU8tWbc0n%2F3F4F7GraOKW6Csji2OxpD93%2FkoHwgIGQctlnrtx4RWIeJULAcvvsFhiopFi08JdsZ%2FeO3u6g0%3D</OfferListingId>
                    <Price>
                        <Amount>1799</Amount>
                        <CurrencyCode>EUR</CurrencyCode>
                        <FormattedPrice>EUR 17,99</FormattedPrice>
                    </Price>
                    <Availability>Gewöhnlich versandfertig in 24 Stunden</Availability>
                    <AvailabilityAttributes>
                        <AvailabilityType>now</AvailabilityType>
                        <MinimumHours>0</MinimumHours>
                        <MaximumHours>0</MaximumHours>
                    </AvailabilityAttributes>
                    <IsEligibleForSuperSaverShipping>1</IsEligibleForSuperSaverShipping>
                </OfferListing>
            </Offer>
        </Offers>
    </Item>
    <Item>
        <ASIN>3813506479</ASIN>
        <DetailPageURL>http://www.amazon.de/Altes-Land-Roman-D%C3%B6rte-Hansen/dp/3813506479%3FSubscriptionId%3DAKIAI554OLCUMRCYB7ZA%26tag%3DjPp08vuSO4osfgfbCbEdF7TNqnWOm7YtprtqRPB9%26linkCode%3Dxm2%26camp%3D2025%26creative%3D165953%26creativeASIN%3D3813506479</DetailPageURL>
        <ItemLinks>
            <ItemLink>
                <Description>Add To Wishlist</Description>
                <URL>http://www.amazon.de/gp/registry/wishlist/add-item.html%3Fasin.0%3D3813506479%26SubscriptionId%3DAKIAI554OLCUMRCYB7ZA%26tag%3DjPp08vuSO4osfgfbCbEdF7TNqnWOm7YtprtqRPB9%26linkCode%3Dxm2%26camp%3D2025%26creative%3D12738%26creativeASIN%3D3813506479</URL>
            </ItemLink>
            <ItemLink>
                <Description>Tell A Friend</Description>
                <URL>http://www.amazon.de/gp/pdp/taf/3813506479%3FSubscriptionId%3DAKIAI554OLCUMRCYB7ZA%26tag%3DjPp08vuSO4osfgfbCbEdF7TNqnWOm7YtprtqRPB9%26linkCode%3Dxm2%26camp%3D2025%26creative%3D12738%26creativeASIN%3D3813506479</URL>
            </ItemLink>
            <ItemLink>
                <Description>All Customer Reviews</Description>
                <URL>http://www.amazon.de/review/product/3813506479%3FSubscriptionId%3DAKIAI554OLCUMRCYB7ZA%26tag%3DjPp08vuSO4osfgfbCbEdF7TNqnWOm7YtprtqRPB9%26linkCode%3Dxm2%26camp%3D2025%26creative%3D12738%26creativeASIN%3D3813506479</URL>
            </ItemLink>
            <ItemLink>
                <Description>All Offers</Description>
                <URL>http://www.amazon.de/gp/offer-listing/3813506479%3FSubscriptionId%3DAKIAI554OLCUMRCYB7ZA%26tag%3DjPp08vuSO4osfgfbCbEdF7TNqnWOm7YtprtqRPB9%26linkCode%3Dxm2%26camp%3D2025%26creative%3D12738%26creativeASIN%3D3813506479</URL>
            </ItemLink>
        </ItemLinks>
        <ItemAttributes>
            <Author>Dörte Hansen</Author>
            <Binding>Gebundene Ausgabe</Binding>
            <EAN>9783813506471</EAN>
            <EANList>
                <EANListElement>9783813506471</EANListElement>
            </EANList>
            <ISBN>3813506479</ISBN>
            <IsEligibleForTradeIn>1</IsEligibleForTradeIn>
            <ItemDimensions>
                <Height Units="hundredths-inches">870</Height>
                <Length Units="hundredths-inches">567</Length>
                <Width Units="hundredths-inches">114</Width>
            </ItemDimensions>
            <Label>Albrecht Knaus Verlag</Label>
            <Languages>
                <Language>
                    <Name>Deutsch</Name>
                    <Type>Published</Type>
                </Language>
                <Language>
                    <Name>Deutsch</Name>
                    <Type>Original</Type>
                </Language>
            </Languages>
            <ListPrice>
                <Amount>1999</Amount>
                <CurrencyCode>EUR</CurrencyCode>
                <FormattedPrice>EUR 19,99</FormattedPrice>
            </ListPrice>
            <Manufacturer>Albrecht Knaus Verlag</Manufacturer>
            <NumberOfPages>288</NumberOfPages>
            <PackageDimensions>
                <Height Units="hundredths-inches">118</Height>
                <Length Units="hundredths-inches">858</Length>
                <Weight Units="hundredths-pounds">101</Weight>
                <Width Units="hundredths-inches">559</Width>
            </PackageDimensions>
            <ProductGroup>Book</ProductGroup>
            <ProductTypeName>ABIS_BOOK</ProductTypeName>
            <PublicationDate>2015-02-16</PublicationDate>
            <Publisher>Albrecht Knaus Verlag</Publisher>
            <Studio>Albrecht Knaus Verlag</Studio>
            <Title>Altes Land: Roman</Title>
            <TradeInValue>
                <Amount>965</Amount>
                <CurrencyCode>EUR</CurrencyCode>
                <FormattedPrice>EUR 9,65</FormattedPrice>
            </TradeInValue>
        </ItemAttributes>
        <OfferSummary>
            <LowestNewPrice>
                <Amount>1999</Amount>
                <CurrencyCode>EUR</CurrencyCode>
                <FormattedPrice>EUR 19,99</FormattedPrice>
            </LowestNewPrice>
            <LowestUsedPrice>
                <Amount>1599</Amount>
                <CurrencyCode>EUR</CurrencyCode>
                <FormattedPrice>EUR 15,99</FormattedPrice>
            </LowestUsedPrice>
            <TotalNew>72</TotalNew>
            <TotalUsed>8</TotalUsed>
            <TotalCollectible>0</TotalCollectible>
            <TotalRefurbished>0</TotalRefurbished>
        </OfferSummary>
        <Offers>
            <TotalOffers>1</TotalOffers>
            <TotalOfferPages>1</TotalOfferPages>
            <MoreOffersUrl>http://www.amazon.de/gp/offer-listing/3813506479%3FSubscriptionId%3DAKIAI554OLCUMRCYB7ZA%26tag%3DjPp08vuSO4osfgfbCbEdF7TNqnWOm7YtprtqRPB9%26linkCode%3Dxm2%26camp%3D2025%26creative%3D12738%26creativeASIN%3D3813506479</MoreOffersUrl>
            <Offer>
                <OfferAttributes>
                    <Condition>New</Condition>
                </OfferAttributes>
                <OfferListing>
                    <OfferListingId>aeRv5KPt26T8S0hLrgV8Bv9UPYABYOMijGRxffbNJXUZSN4XfeeOZZpCZ28EURzmgMLlcYEBSRlMXS%2F8Z0pN1JbYerndME%2B2VK3RosfdQJA%3D</OfferListingId>
                    <Price>
                        <Amount>1999</Amount>
                        <CurrencyCode>EUR</CurrencyCode>
                        <FormattedPrice>EUR 19,99</FormattedPrice>
                    </Price>
                    <Availability>Gewöhnlich versandfertig in 24 Stunden</Availability>
                    <AvailabilityAttributes>
                        <AvailabilityType>now</AvailabilityType>
                        <MinimumHours>0</MinimumHours>
                        <MaximumHours>0</MaximumHours>
                    </AvailabilityAttributes>
                    <IsEligibleForSuperSaverShipping>1</IsEligibleForSuperSaverShipping>
                </OfferListing>
            </Offer>
        </Offers>
    </Item>
</Items>

I want to get the ASIN element of each Item. So I tried this:

from lxml import etree
doc = etree.fromstring(xmlstring)
items = doc.xpath('//Items/Item')
for a in items:
    asin = a.xpath('//ASIN/text()')
    print(asin)

What I get is:

['3570102769', '3813506479']
['3570102769', '3813506479']

But I want this:

['3570102769']
['3813506479']

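I don't understand what the problem is here. I thought I was iterating over the Item elements, and that each Item contains exactly one ASIN. Why does every iteration return both ASINs?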

1 answer:

Answer 0: (score: 2)

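When you search for a.xpath('//ASIN/text()'), you are searching the complete document tree again. Quoting the XML Path Language specification:

"//para selects all the para descendants of the document root and thus selects all para elements in the same document as the context node"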

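So what you are doing is iterating over the matched Item nodes and saying "please give me all ASIN nodes in this document". The context node (the Item) is ignored, because an expression that starts with // is evaluated from the document root.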

What you should do instead is select the ASIN child nodes directly. Keeping your original implementation, it could look like this:

from lxml import etree

doc = etree.fromstring(xmlstring)
items = doc.xpath('//Items/Item')
for a in items:
    # Relative path: matches only ASIN children of the current Item
    asin = a.xpath('ASIN/text()')
    print(asin)

This gives the output you want:

['3570102769']
['3813506479']

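Alternatively, if you are not sure where the ASIN nodes appear inside the Item node, you can use .//ASIN/text(), which searches all descendants of the current Item rather than only its direct children.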
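
To illustrate the difference between the two relative forms, here is a minimal sketch. It assumes the standard library's xml.etree.ElementTree instead of lxml (so it runs without extra packages) and a trimmed-down, hypothetical sample in which the second Item nests its ASIN inside a made-up Variation element. ElementTree supports only a subset of XPath and has no text() step, so the text is read from each matched element.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: "Variation" is an invented wrapper element,
# used only to place the second ASIN one level deeper.
SAMPLE = """
<Items>
    <Item>
        <ASIN>3570102769</ASIN>
    </Item>
    <Item>
        <Variation>
            <ASIN>3813506479</ASIN>
        </Variation>
    </Item>
</Items>
"""

root = ET.fromstring(SAMPLE)
items = root.findall('Item')

# 'ASIN' is relative to each Item and matches direct children only.
children = [[e.text for e in item.findall('ASIN')] for item in items]

# './/ASIN' is also relative to each Item, but matches at any depth below it.
descendants = [[e.text for e in item.findall('.//ASIN')] for item in items]

print(children)     # [['3570102769'], []]
print(descendants)  # [['3570102769'], ['3813506479']]
```

In both cases the search stays inside the current Item. Only the leading // form from the question goes back to the document root.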