I want to read an XML file declared with encoding="utf-8" into R (it contains Hebrew text).
I know about the XML package, but I can't find any encoding option for xmlToDataFrame.
I tried:
library(XML)
data <- xmlToDataFrame("G:/G_RBT/Alexey/DB/kupot.xml")
But the Hebrew text comes out garbled and I can't read it. I also tried:
data <- xmlParse("G:/G_RBT/Alexey/DB/kupot.xml",encoding="UTF-8")
and the encoding is still wrong.
Answer 0 (score: 1)
Sometimes you need a bit of manual elbow grease:
library(XML)
library(httr)
# an XML file I found that contains Hebrew text
tmp <- GET("https://tiktickets.googlecode.com/svn-history/r102/trunk/war/ShowHalls.xml")
doc <- content(tmp, as="text", encoding="UTF-8")
doc <- substr(doc, 2, nchar(doc)) # drop the first character: the encoding bits at the beginning (presumably a byte-order mark)
doc_x <- xmlParse(doc, encoding="UTF-8")
# do data frame conversion by hand
data.frame(name=xpathSApply(doc_x, "//ShowHall/name", xmlValue, encoding="UTF-8"),
           address=xpathSApply(doc_x, "//ShowHall/address", xmlValue, encoding="UTF-8"),
           phone1=xpathSApply(doc_x, "//ShowHall/phone1", xmlValue, encoding="UTF-8"),
           longitude=xpathSApply(doc_x, "//ShowHall/longitude", xmlValue, encoding="UTF-8"),
           latitude=xpathSApply(doc_x, "//ShowHall/latitude", xmlValue, encoding="UTF-8"))
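The same approach can probably be adapted to a local file like the one in the question. The following is only a rough sketch: the sample file, its element names (borrowed from the example above), and the assumption that the file starts with a UTF-8 byte-order mark are all mine, not taken from the asker's kupot.xml. It reads the raw bytes, drops the mark only if it is actually there, and then builds the data frame by hand.

```r
library(XML)

# Hypothetical sample file standing in for the real XML file:
# a UTF-8 byte-order mark followed by a small document with a Hebrew name.
path <- tempfile(fileext = ".xml")
xml_text <- paste0(
  '<?xml version="1.0" encoding="UTF-8"?>',
  '<halls><ShowHall>',
  '<name>\u05d4\u05d9\u05db\u05dc</name>',
  '<phone1>03-1234567</phone1>',
  '</ShowHall></halls>'
)
con <- file(path, open = "wb")
writeBin(c(as.raw(c(0xEF, 0xBB, 0xBF)), charToRaw(enc2utf8(xml_text))), con)
close(con)

# Read the file as raw bytes so no locale conversion happens on the way in.
bytes <- readBin(path, what = "raw", n = file.size(path))

# Drop the 3-byte UTF-8 byte-order mark only if the file really starts with one.
bom <- as.raw(c(0xEF, 0xBB, 0xBF))
if (length(bytes) >= 3 && identical(bytes[1:3], bom)) {
  bytes <- bytes[-(1:3)]
}

# Turn the bytes into a string and mark it as UTF-8.
txt <- rawToChar(bytes)
Encoding(txt) <- "UTF-8"

# Parse from text and do the data frame conversion by hand.
doc_x <- xmlParse(txt, asText = TRUE, encoding = "UTF-8")
df <- data.frame(
  name   = xpathSApply(doc_x, "//ShowHall/name", xmlValue, encoding = "UTF-8"),
  phone1 = xpathSApply(doc_x, "//ShowHall/phone1", xmlValue, encoding = "UTF-8"),
  stringsAsFactors = FALSE
)
print(df)
```

To try it on a real file, point path at that file, skip the part that writes the sample, and change the XPath expressions to match the file's own element names.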