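Generating a sales report from XML data using R? Convert to a data frame?

Asked: 2017-06-13 11:48:32

Tags: r xml xpath

I have some online order data as XML. I want to build a report with totals for orders, sales, returns, and so on.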

<ArrayOfItem>
<Item>
<total>333.3</total>
<terminalid>1</terminalid>
<subtotal>330</subtotal>
<storeid>1000</storeid>
<itemlist>
<TransactionLine><LineNumber>1</LineNumber><Name>Moto G Turbo Edition Black</Name><ItemUPC>5479892348535</ItemUPC><Quantity>1</Quantity><SalePrice>330</SalePrice><IndividualPrice>330</IndividualPrice><CreatedDate>2017-06-13T09:42:52.1411148Z</CreatedDate><Status>0</Status><ShippingCost>0</ShippingCost><TotalTax>3.3</TotalTax><AppliedTaxes><LineTax><TaxId>0</TaxId><Amount>0</Amount><CreatedDate>0001-01-01T00:00:00</CreatedDate></LineTax></AppliedTaxes><AppliedDiscounts /><ItemCondition>SellableAsNew</ItemCondition><ReturnReason>PoorQuality</ReturnReason></TransactionLine>
</itemlist>
<transactiontenders>1</transactiontenders>
<transactiontenders>2</transactiontenders>
<transactiontenders>4</transactiontenders>
<transactiontype>1</transactiontype>
<transdate>2017-06-13T09:52:54Z</transdate>
<transtime>09:52</transtime>
</Item>
<Item>
<total>343.59</total>
<terminalid>1</terminalid>
<subtotal>340.29</subtotal>
<storeid>1000</storeid>
<itemlist>
<TransactionLine><LineNumber>1</LineNumber><Name>Moto G Turbo Edition Black</Name><ItemUPC>5479892348535</ItemUPC><Quantity>1</Quantity><SalePrice>330</SalePrice><IndividualPrice>330</IndividualPrice><CreatedDate>2017-06-13T09:53:00.8548823Z</CreatedDate><Status>0</Status><ShippingCost>0</ShippingCost><TotalTax>3.3</TotalTax><AppliedTaxes><LineTax><TaxId>0</TaxId><Amount>0</Amount><CreatedDate>0001-01-01T00:00:00</CreatedDate></LineTax></AppliedTaxes><AppliedDiscounts /><ItemCondition>SellableAsNew</ItemCondition><ReturnReason>PoorQuality</ReturnReason></TransactionLine>
<TransactionLine><LineNumber>2</LineNumber><Name>This Was A Man</Name><ItemUPC>777221028297</ItemUPC><Quantity>1</Quantity><SalePrice>4.99</SalePrice><IndividualPrice>4.99</IndividualPrice><CreatedDate>2017-06-13T09:53:07.8263895Z</CreatedDate><Status>0</Status><ShippingCost>0</ShippingCost><TotalTax>0</TotalTax><AppliedTaxes /><AppliedDiscounts /><ItemCondition>SellableAsNew</ItemCondition><ReturnReason>PoorQuality</ReturnReason></TransactionLine>
<TransactionLine><LineNumber>3</LineNumber><Name>A Prisoner of Birth</Name><ItemUPC>4000111222302</ItemUPC><Quantity>1</Quantity><SalePrice>5.3</SalePrice><IndividualPrice>5.3</IndividualPrice><CreatedDate>2017-06-13T09:53:11.124866Z</CreatedDate><Status>0</Status><ShippingCost>0</ShippingCost><TotalTax>0</TotalTax><AppliedTaxes /><AppliedDiscounts /><ItemCondition>SellableAsNew</ItemCondition><ReturnReason>PoorQuality</ReturnReason></TransactionLine>
</itemlist>
<transactiontenders>1</transactiontenders><transactiontenders>2</transactiontenders>
<transactiontype>1</transactiontype>
<transdate>2017-06-13T09:53:29Z</transdate>
<transtime>09:53</transtime>
</Item>
</ArrayOfItem>

This is what I have done so far:

library(XML)
y <- xmlToDataFrame('C:\\App\\06122017.XML')
nrow(y) # total number of orders (one row per Item)
doc = xmlInternalTreeParse('C:\\App\\06122017.XML')
transactionlineItems <- xpathSApply(doc, '//TransactionLine') # list
transactionlineItems

I tried this to get the sum of the totals, but it did not work.

colSums(y[,c("total")]) # not working

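transactionlineItems is a list of XML elements. I want to derive a data frame from it, apply some logic (check whether a particular line item is a sale or a return), and compute separate totals for sales and for returns. I also want the quantity per product, to see which product sells more. Right now I do this browser-side by applying the logic to the same data in JSON format. I want to move it server-side and have chosen R for that.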

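1 Answer:

Answer 0 (score: 0)

If you really have your heart set on a data frame conversion:

You're on the right track. This answer combines your xmlToDataFrame and xpathSApply hints. You should take care that numeric values don't get treated as characters, or even as factors.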
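As a side note, here is a minimal sketch of why the colSums call in the question probably fails. It uses a small hand-built data frame (the name y.demo is made up for illustration) in place of the real xmlToDataFrame result, assuming the total column arrives as character data:

```r
# Hypothetical stand-in for the data frame returned by xmlToDataFrame:
# the values come back as text, not numbers.
y.demo <- data.frame(total = c("333.3", "343.59"),
                     stringsAsFactors = FALSE)

# Selecting a single column with y.demo[, "total"] drops the result to a
# plain vector, and colSums() needs at least two dimensions, so this errors.
res <- try(colSums(y.demo[, c("total")]), silent = TRUE)
failed <- inherits(res, "try-error")

# Converting to numeric and using sum() on the vector works.
# (For a factor column, use as.numeric(as.character(...)) instead.)
fixed.total <- sum(as.numeric(y.demo$total))
print(fixed.total)
```

So the fix has two parts: convert the column to numeric, and use sum() for a single column (or keep drop = FALSE if you want to stay with colSums).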

library(XML)

order.xml.string <- '<?xml version="1.0" encoding="UTF-8"?>
<ArrayOfItem>
<Item>
<total>333.3</total>
<terminalid>1</terminalid>
<subtotal>330</subtotal>
<storeid>1000</storeid>
<itemlist>
<TransactionLine>
<LineNumber>1</LineNumber>
<Name>Moto G Turbo Edition Black</Name>
<ItemUPC>5479892348535</ItemUPC>
<Quantity>1</Quantity>
<SalePrice>330</SalePrice>
<IndividualPrice>330</IndividualPrice>
<CreatedDate>2017-06-13T09:42:52.1411148Z</CreatedDate>
<Status>0</Status>
<ShippingCost>0</ShippingCost>
<TotalTax>3.3</TotalTax>
<AppliedTaxes>
<LineTax>
<TaxId>0</TaxId>
<Amount>0</Amount>
<CreatedDate>0001-01-01T00:00:00</CreatedDate>
</LineTax>
</AppliedTaxes>
<AppliedDiscounts/>
<ItemCondition>SellableAsNew</ItemCondition>
<ReturnReason>PoorQuality</ReturnReason>
</TransactionLine>
</itemlist>
<transactiontenders>1</transactiontenders>
<transactiontenders>2</transactiontenders>
<transactiontenders>4</transactiontenders>
<transactiontype>1</transactiontype>
<transdate>2017-06-13T09:52:54Z</transdate>
<transtime>09:52</transtime>
</Item>
<Item>
<total>343.59</total>
<terminalid>1</terminalid>
<subtotal>340.29</subtotal>
<storeid>1000</storeid>
<itemlist>
<TransactionLine>
<LineNumber>1</LineNumber>
<Name>Moto G Turbo Edition Black</Name>
<ItemUPC>5479892348535</ItemUPC>
<Quantity>1</Quantity>
<SalePrice>330</SalePrice>
<IndividualPrice>330</IndividualPrice>
<CreatedDate>2017-06-13T09:53:00.8548823Z</CreatedDate>
<Status>0</Status>
<ShippingCost>0</ShippingCost>
<TotalTax>3.3</TotalTax>
<AppliedTaxes>
<LineTax>
<TaxId>0</TaxId>
<Amount>0</Amount>
<CreatedDate>0001-01-01T00:00:00</CreatedDate>
</LineTax>
</AppliedTaxes>
<AppliedDiscounts/>
<ItemCondition>SellableAsNew</ItemCondition>
<ReturnReason>PoorQuality</ReturnReason>
</TransactionLine>
<TransactionLine>
<LineNumber>2</LineNumber>
<Name>This Was A Man</Name>
<ItemUPC>777221028297</ItemUPC>
<Quantity>1</Quantity>
<SalePrice>4.99</SalePrice>
<IndividualPrice>4.99</IndividualPrice>
<CreatedDate>2017-06-13T09:53:07.8263895Z</CreatedDate>
<Status>0</Status>
<ShippingCost>0</ShippingCost>
<TotalTax>0</TotalTax>
<AppliedTaxes/>
<AppliedDiscounts/>
<ItemCondition>SellableAsNew</ItemCondition>
<ReturnReason>PoorQuality</ReturnReason>
</TransactionLine>
<TransactionLine>
<LineNumber>3</LineNumber>
<Name>A Prisoner of Birth</Name>
<ItemUPC>4000111222302</ItemUPC>
<Quantity>1</Quantity>
<SalePrice>5.3</SalePrice>
<IndividualPrice>5.3</IndividualPrice>
<CreatedDate>2017-06-13T09:53:11.124866Z</CreatedDate>
<Status>0</Status>
<ShippingCost>0</ShippingCost>
<TotalTax>0</TotalTax>
<AppliedTaxes/>
<AppliedDiscounts/>
<ItemCondition>SellableAsNew</ItemCondition>
<ReturnReason>PoorQuality</ReturnReason>
</TransactionLine>
</itemlist>
<transactiontenders>1</transactiontenders>
<transactiontenders>2</transactiontenders>
<transactiontype>1</transactiontype>
<transdate>2017-06-13T09:53:29Z</transdate>
<transtime>09:53</transtime>
</Item>
</ArrayOfItem>'

Then

doc  <-  xmlParse(order.xml.string, asText = TRUE)
y <-
  xmlToDataFrame(nodes = getNodeSet(doc, "//TransactionLine"),
                 stringsAsFactors = FALSE)
nrow(y) # number of transaction lines (4 here), not the number of orders

numeric.cols <- c("Quantity",
                  "SalePrice",
                  "IndividualPrice",
                  "ShippingCost",
                  "TotalTax")

y[, numeric.cols] <-
  lapply(y[, numeric.cols], as.numeric)

colSums(y[(y$ItemCondition == "SellableAsNew" &
             y$ReturnReason == "PoorQuality"), numeric.cols])

       Quantity       SalePrice IndividualPrice    ShippingCost        TotalTax 
           4.00          670.29          670.29            0.00            6.60 
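The question also asks for quantities per product. One possible way to get that from the same kind of data frame is base R's aggregate. The sketch below uses a trimmed-down copy of the sample XML with only the fields it needs; the names sample.xml, lines.df and per.product are made up for illustration:

```r
library(XML)

# Trimmed-down sample: only Name, Quantity and SalePrice are kept.
sample.xml <- '<ArrayOfItem>
<Item><itemlist>
<TransactionLine><Name>Moto G Turbo Edition Black</Name><Quantity>1</Quantity><SalePrice>330</SalePrice></TransactionLine>
</itemlist></Item>
<Item><itemlist>
<TransactionLine><Name>Moto G Turbo Edition Black</Name><Quantity>1</Quantity><SalePrice>330</SalePrice></TransactionLine>
<TransactionLine><Name>This Was A Man</Name><Quantity>1</Quantity><SalePrice>4.99</SalePrice></TransactionLine>
<TransactionLine><Name>A Prisoner of Birth</Name><Quantity>1</Quantity><SalePrice>5.3</SalePrice></TransactionLine>
</itemlist></Item>
</ArrayOfItem>'

doc <- xmlParse(sample.xml, asText = TRUE)
lines.df <- xmlToDataFrame(nodes = getNodeSet(doc, "//TransactionLine"),
                           stringsAsFactors = FALSE)

# Convert the text columns before doing any arithmetic.
lines.df$Quantity <- as.numeric(lines.df$Quantity)
lines.df$SalePrice <- as.numeric(lines.df$SalePrice)

# Sum quantity and sale price per product name, best seller first.
per.product <- aggregate(cbind(Quantity, SalePrice) ~ Name,
                         data = lines.df, FUN = sum)
per.product <- per.product[order(-per.product$Quantity), ]
print(per.product)
```

A sales versus returns split could be done the same way by adding a grouping column to the formula, but which field marks a return is not clear from the sample, so that part is left out here.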

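The xmlToList approach:

I like data frames as much as anyone, but I generally don't find xmlToDataFrame to be a great solution. I don't think this XML content really has a strictly rectangular shape as it stands. For example, even within the TransactionLine path, the tax and discount paths look nested (not flat). Even if the current format is suitable for a data frame conversion, it may change in the future, and then you would have to parse data structures out of data frame cells.

Maybe consider xmlToList? Or even leave the data as XML and apply all of your logic as XPath expressions inside the xmlApply family of functions.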

order.xml <-
  xmlTreeParse(order.xml.string,
               asText = TRUE,
               useInternalNodes = TRUE)
orders <- xmlRoot(order.xml)
y <- xmlToList(orders)

my.totals <- sapply(y, function(one.item) {
  return(as.numeric(one.item$total))
})

total.total <- sum(my.totals)
print(total.total)

[1] 676.89
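For the other option mentioned above, leaving the data as XML and querying it with XPath, a rough sketch could look like the following. It uses xpathSApply and getNodeSet from the XML package on a cut-down copy of the sample; the variable names and the storeid filter are only illustrative assumptions:

```r
library(XML)

# Cut-down sample: two orders with only total and storeid.
sample.xml <- '<ArrayOfItem>
<Item><total>333.3</total><storeid>1000</storeid></Item>
<Item><total>343.59</total><storeid>1000</storeid></Item>
</ArrayOfItem>'

doc <- xmlParse(sample.xml, asText = TRUE)

# Text of every <total> directly under an <Item>, converted to numeric.
totals <- as.numeric(xpathSApply(doc, "//Item/total", xmlValue))
grand.total <- sum(totals)

# Number of orders is the number of <Item> nodes.
order.count <- length(getNodeSet(doc, "//Item"))

# An XPath predicate can carry the filtering logic, here by store.
store.totals <- as.numeric(
  xpathSApply(doc, "//Item[storeid='1000']/total", xmlValue))

print(grand.total)
print(order.count)
```

The advantage is that nested parts such as AppliedTaxes never have to be flattened; you only pull out the nodes a given report needs.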