How to read an XML file in R (with UTF-8 encoding)?

Date: 2014-11-12 05:29:41

Tags: xml r encoding

I want to read a UTF-8-encoded XML file (it contains Hebrew text) in R.

I know about the XML package, but I couldn't find any encoding option in xmlToDataFrame.

I tried:

library(XML)
data <- xmlToDataFrame("G:/G_RBT/Alexey/DB/kupot.xml")

but the Hebrew text comes out unreadable. I also tried:

data <- xmlParse("G:/G_RBT/Alexey/DB/kupot.xml",encoding="UTF-8")

and the encoding is still wrong.

Answer (score: 1):

Sometimes you need a bit of manual elbow grease:

library(XML)
library(httr)

# an XML file I found that contains Hebrew text
tmp <- GET("https://tiktickets.googlecode.com/svn-history/r102/trunk/war/ShowHalls.xml")
doc <- content(tmp, as="text", encoding="UTF-8")
doc <- sub("^\ufeff", "", doc) # drop the byte-order mark (BOM) at the start, if present

doc_x <- xmlParse(doc, asText=TRUE, encoding="UTF-8") # parse the string itself, not a file name

# do data frame conversion by hand

data.frame(name=xpathSApply(doc_x, "//ShowHall/name", xmlValue, encoding="UTF-8"),
           address=xpathSApply(doc_x, "//ShowHall/address", xmlValue, encoding="UTF-8"),
           phone1=xpathSApply(doc_x, "//ShowHall/phone1", xmlValue, encoding="UTF-8"),
           longitude=xpathSApply(doc_x, "//ShowHall/longitude", xmlValue, encoding="UTF-8"),
           latitude=xpathSApply(doc_x, "//ShowHall/latitude", xmlValue, encoding="UTF-8"))
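The same idea can be applied to a local file like the one in the question, without httr: read the file as UTF-8 text, drop any BOM, parse the string with xmlParse(asText = TRUE), then hand the parsed document to xmlToDataFrame. This is only a minimal sketch under assumptions: the temporary file and its rows/row/name/city structure are hypothetical stand-ins for the real kupot.xml, and whether the Hebrew then prints correctly can still depend on the console locale (especially on Windows).

```r
library(XML)

# Hypothetical stand-in for kupot.xml: a tiny UTF-8 file with Hebrew text.
# The <rows>/<row>/<name>/<city> structure is made up for illustration.
xml_lines <- c('<?xml version="1.0" encoding="UTF-8"?>',
               '<rows><row><name>\u05E9\u05DC\u05D5\u05DD</name><city>\u05D7\u05D9\u05E4\u05D4</city></row></rows>')
xml_file <- tempfile(fileext = ".xml")
# useBytes = TRUE writes the UTF-8 bytes unchanged, whatever the session locale
writeLines(enc2utf8(xml_lines), xml_file, useBytes = TRUE)

# Read the file as text; encoding = "UTF-8" only marks the strings, it does not re-encode them
txt <- paste(readLines(xml_file, encoding = "UTF-8", warn = FALSE), collapse = "\n")
# Drop a leading byte-order mark (BOM), if the file has one
txt <- sub("^\ufeff", "", txt)

# Parse the string itself (asText = TRUE) and convert the parsed document to a data frame
doc <- xmlParse(txt, asText = TRUE, encoding = "UTF-8")
kupot <- xmlToDataFrame(doc, stringsAsFactors = FALSE)
print(kupot)
```

Reading the file as text first gives you a chance to strip a BOM and control how the strings are marked before libxml2 sees them; for a file without a BOM, passing the path straight to xmlParse may work just as well.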