Using xpathSApply in R to calculate a mean

Time: 2015-06-26 08:59:27

Tags: xml r

I am having trouble using xpathSApply to calculate the mean temperature. The XML is available at http://www.yr.no/place/Malaysia/Kuala_Lumpur/Kuala_Lumpur/forecast_hour_by_hour.xml

My R code:

library(XML)
fileURL<-"http://www.yr.no/place/Malaysia/Kuala_Lumpur/Kuala_Lumpur/forecast_hour_by_hour.xml"
doc <- xmlTreeParse(fileURL, useInternal=TRUE)
rootNode <- xmlRoot(doc)
xmlName(rootNode)
mean(xpathSApply(rootNode, "//temperature", xmlValue))

The XML looks like this:

<weatherdata>
<location>
<name>Kuala Lumpur</name>
<type>Capital</type>
<country>Malaysia</country>
<timezone id="Asia/Kuala_Lumpur" utcoffsetMinutes="480"/>
<location altitude="56" latitude="3.1412" longitude="101.68653" geobase="geonames" geobaseid="1735161"/>
</location>
<credit>
<!--
In order to use the free weather data from yr no, you HAVE to display 
the following text clearly visible on your web page. The text should be a 
link to the specified URL.
-->
<!--
Please read more about our conditions and guidelines at http://om.yr.no/verdata/  English explanation at http://om.yr.no/verdata/free-weather-data/
-->
<link text="Weather forecast from yr.no, delivered by the Norwegian Meteorological Institute and the NRK" url="http://www.yr.no/place/Malaysia/Kuala_Lumpur/Kuala_Lumpur/"/>
</credit>
<links>
<link id="xmlSource" url="http://www.yr.no/place/Malaysia/Kuala_Lumpur/Kuala_Lumpur/forecast.xml"/>
<link id="xmlSourceHourByHour" url="http://www.yr.no/place/Malaysia/Kuala_Lumpur/Kuala_Lumpur/forecast_hour_by_hour.xml"/>
<link id="overview" url="http://www.yr.no/place/Malaysia/Kuala_Lumpur/Kuala_Lumpur/"/>
<link id="hourByHour" url="http://www.yr.no/place/Malaysia/Kuala_Lumpur/Kuala_Lumpur/hour_by_hour"/>
<link id="longTermForecast" url="http://www.yr.no/place/Malaysia/Kuala_Lumpur/Kuala_Lumpur/long"/>
</links>
<meta>
<lastupdate>2015-06-26T15:40:08</lastupdate>
<nextupdate>2015-06-27T04:00:00</nextupdate>
</meta>
<sun rise="2015-06-26T07:06:55" set="2015-06-26T19:25:04"/>
<forecast>
<tabular>
<time from="2015-06-26T17:00:00" to="2015-06-26T20:00:00">
<!--
Valid from 2015-06-26T17:00:00 to 2015-06-26T20:00:00 
-->
<symbol number="1" numberEx="1" name="Clear sky" var="01d"/>
<precipitation value="0"/>
<!--  Valid at 2015-06-26T17:00:00  -->
<windDirection deg="163.0" code="SSE" name="South-southeast"/>
<windSpeed mps="2.9" name="Light breeze"/>
<temperature unit="celsius" value="31"/>
<pressure unit="hPa" value="1008.1"/>
</time>
<time from="2015-06-26T20:00:00" to="2015-06-26T23:00:00">
<!--
Valid from 2015-06-26T20:00:00 to 2015-06-26T23:00:00 
-->
<symbol number="1" numberEx="1" name="Clear sky" var="mf/01n.31"/>
<precipitation value="0"/>
<!--  Valid at 2015-06-26T20:00:00  -->
<windDirection deg="143.3" code="SE" name="Southeast"/>
<windSpeed mps="1.2" name="Light air"/>
<temperature unit="celsius" value="29"/>
<pressure unit="hPa" value="1009.4"/>
</time>
</tabular>
</forecast>
</weatherdata>

Am I doing this right, or have I got it wrong? Apologies if this is a duplicate question.

2 answers:

Answer 0: (score: 1)

You have two or three issues:

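  1. xpathSApply is normally given the parsed XML document as its first argument, so prefer xpathSApply(doc, ...) over xpathSApply(rootNode, ...). A node from an internally parsed document is also accepted, so this alone does not explain the failure.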

  2. The temperature is stored in the value attribute of the element, not in its text content, so xmlValue has nothing numeric to return. You can select the attribute with an XPath expression (element/@attribute):

    temp <- xpathSApply(doc, "//temperature/@value", as.numeric)
    

    or using xmlGetAttr function:

    temp <- as.numeric(xpathSApply(doc, "//temperature", xmlGetAttr, "value"))
    
  3. Note the as.numeric call in both alternatives. Attribute values come back as character strings, and the mean function needs a numeric vector.
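
The points above can be checked without fetching the live feed. Here is a minimal sketch, assuming a trimmed copy of the two time entries from the question pasted in as a string. The names xml_text, raw and temp are chosen here for illustration only:

```r
library(XML)

# Assumption: a cut-down copy of the sample XML from the question,
# keeping only the two <temperature> elements.
xml_text <- '<weatherdata><forecast><tabular>
  <time from="2015-06-26T17:00:00" to="2015-06-26T20:00:00">
    <temperature unit="celsius" value="31"/>
  </time>
  <time from="2015-06-26T20:00:00" to="2015-06-26T23:00:00">
    <temperature unit="celsius" value="29"/>
  </time>
</tabular></forecast></weatherdata>'

# asText = TRUE tells xmlParse that the input is XML content, not a file name
doc <- xmlParse(xml_text, asText = TRUE)

# xmlValue() reads an element's text content; <temperature .../> has none,
# so no numeric temperature comes back from this call.
raw <- xpathSApply(doc, "//temperature", xmlValue)

# Read the "value" attribute instead and convert the strings to numbers.
temp <- as.numeric(xpathSApply(doc, "//temperature", xmlGetAttr, "value"))
mean(temp)
```

With the two sample values 31 and 29, mean(temp) is 30.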

Answer 1: (score: 1)

It works this way:

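library(XML)
fileURL <- "http://www.yr.no/place/Malaysia/Kuala_Lumpur/Kuala_Lumpur/forecast_hour_by_hour.xml"
# Parse into an internal document so that XPath queries can be run on it
doc <- xmlTreeParse(fileURL, useInternalNodes = TRUE)
rootNode <- xmlRoot(doc)
xmlName(rootNode)
# Select each temperature's value attribute, convert to numeric, then average
mean(xpathSApply(doc, "//temperature/@value", as.numeric))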

The result is:

[1] "weatherdata"
[1] 27.6875
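
The same pattern should carry over to other numeric attributes in the feed, such as pressure. Here is a rough sketch, assuming a trimmed inline copy of the sample XML from the question. mean_attr is a hypothetical helper written for this example, not a function from the XML package:

```r
library(XML)

# Hypothetical helper: mean of one numeric attribute across every
# element with the given name, anywhere in the document.
mean_attr <- function(doc, element, attribute) {
  values <- xpathSApply(doc, paste0("//", element), xmlGetAttr, attribute)
  mean(as.numeric(values))
}

# Assumption: a cut-down copy of the two <time> entries from the question.
xml_text <- '<weatherdata><forecast><tabular>
  <time from="2015-06-26T17:00:00" to="2015-06-26T20:00:00">
    <temperature unit="celsius" value="31"/>
    <pressure unit="hPa" value="1008.1"/>
  </time>
  <time from="2015-06-26T20:00:00" to="2015-06-26T23:00:00">
    <temperature unit="celsius" value="29"/>
    <pressure unit="hPa" value="1009.4"/>
  </time>
</tabular></forecast></weatherdata>'

doc <- xmlParse(xml_text, asText = TRUE)

mean_attr(doc, "temperature", "value")
mean_attr(doc, "pressure", "value")
```

The sketch assumes every matched element carries the attribute. If some elements lacked it, xmlGetAttr would return NULL for them and the numeric conversion would need extra handling.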