Extracting multiple values from an XML file using R

Time: 2017-02-07 14:19:23

Tags: r xml

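Below is a small example of the XML file I am working with. I want to count the auctions whose seller rating is greater than 150. Does anyone know how I can do this?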

<root>
        <listing>
            <seller_info>
                <seller_name>seller12</seller_name>
                <seller_rating>100</seller_rating>
            </seller_info>
            <payment_types>
                Visa
            </payment_types>
            <shipping_info>
                Buyer pays shipping charges.
            </shipping_info>
            <buyer_protection_info></buyer_protection_info>
            <auction_info>
                <current_bid>$820.00</current_bid>
                <time_left>4 days, 18 hours +</time_left>
                <high_bidder>
                    <bidder_name>gosha555@example.com</bidder_name>
                    <bidder_rating>-2</bidder_rating>
                </high_bidder>
                <num_items>1</num_items>
                <num_bids>12</num_bids>
                <started_at>$1.00</started_at>
                <bid_increment></bid_increment>
                <notes></notes>
            </auction_info>
        </listing>
        <listing>
            <seller_info>
                <seller_name>seller50</seller_name>
                <seller_rating>200</seller_rating>
            </seller_info>
            <payment_types>
                Visa
            </payment_types>
            <shipping_info>
                Buyer pays shipping charges.
            </shipping_info>
            <buyer_protection_info></buyer_protection_info>
            <auction_info>
                <current_bid>$920.00</current_bid>
                <time_left>4 days, 17 hours +</time_left>
                <high_bidder>
                    <bidder_name>seller50@example.com</bidder_name>
                    <bidder_rating>-2</bidder_rating>
                </high_bidder>
                <num_items>1</num_items>
                <num_bids>5</num_bids>
                <started_at>$1.00</started_at>
                <bid_increment></bid_increment>
                <notes></notes>
            </auction_info>
        </listing>  
</root>

So far I have parsed the data with xmlTreeParse and queried it with xpathSApply:

library(XML)
doc <- xmlTreeParse("ebay.xml", useInternalNodes = TRUE)
log <- xpathSApply(doc, '//*/seller_rating')  # returns the nodes themselves, not their text

1 answer:

Answer 0 (score: 2)

Your code returns the whole nodes, tags included. If you use this instead:

SellerRatings = xmlSApply(doc["//listing//seller_info//seller_rating"], xmlValue)

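you get the values, which you can then count. Note that xmlValue returns character strings, so convert them to numbers before comparing; otherwise R compares them as text and a rating such as "90" would count as greater than 150.

sum(as.numeric(SellerRatings) > 150)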

Or, more concisely:

sum(as.numeric(xmlSApply(doc["//*//seller_rating"], xmlValue)) > 150)
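For anyone who wants to try this without the original file, here is a minimal self-contained sketch. It assumes the XML package is installed, and it uses a trimmed-down inline string (`xml_text`, a stand-in made up for illustration) in place of ebay.xml. It shows two ways to get the count: extracting the values and comparing in R, and letting an XPath predicate do the filtering. The variable names are illustrative only.

```r
library(XML)

# Trimmed-down stand-in for ebay.xml: only the fields needed for this count.
xml_text <- '<root>
  <listing>
    <seller_info>
      <seller_name>seller12</seller_name>
      <seller_rating>100</seller_rating>
    </seller_info>
  </listing>
  <listing>
    <seller_info>
      <seller_name>seller50</seller_name>
      <seller_rating>200</seller_rating>
    </seller_info>
  </listing>
</root>'

doc <- xmlParse(xml_text, asText = TRUE)

# Approach 1: pull the text of every seller_rating, convert to numeric,
# then count in R.
ratings <- as.numeric(
  xpathSApply(doc, "//listing/seller_info/seller_rating", xmlValue)
)
n_values <- sum(ratings > 150)

# Approach 2: let XPath filter the listings, then count the matching nodes.
matches <- getNodeSet(doc, "//listing[seller_info/seller_rating > 150]")
n_xpath <- length(matches)

print(c(n_values, n_xpath))
```

Both counts come out as 1 for this two-listing sample. The predicate version is handy when you also need the matching listings themselves (for example to read their current_bid), since `matches` holds the full listing nodes rather than just the ratings.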