R - Convert fractions in text to numeric

Time: 2015-02-22 21:13:13

Tags: r string

I am trying to convert "9¼" to "9.25", but I cannot seem to get the fraction read correctly.

Here is the data I am working with:

library(XML)

url <- paste("http://mockdraftable.com/players/2014/", sep = "")  
combine <- readHTMLTable(url,which=1, header=FALSE, stringsAsFactors=F)

names(combine) <- c("Name", "Pos", "Hght", "Wght", "Arms", "Hands",
                    "Dash40yd", "Dash20yd", "Dash10yd", "Bench", "Vert", "Broad", 
                    "Cone3", "ShortShuttle20")

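For example, the Hands column in the first row is '9¼"'. How would I turn combine$Hands into 9.25? The same goes for all the other fractions, 1/8 through 7/8.

Any help would be appreciated.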

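2 answers:

Answer 0: (score: 7)

You can try transliterating the Unicode characters directly to ASCII while reading the XML, by passing a custom element function: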

library(stringi)
readHTMLTable(url,which=1, header=FALSE, stringsAsFactors=F,elFun=function(node) {
        val = xmlValue(node); stri_trans_general(val,"latin-ascii")})

You can then use @Metrics' suggestion to convert it to numeric.
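Before parsing the whole table, it may help to check what the transliteration does to a couple of values on your own machine. The following is a minimal sketch: the sample strings are made up for illustration, written with Unicode escapes so the script itself stays ASCII, and the exact text produced depends on the ICU rules bundled with your stringi build.

```r
library(stringi)

# Hypothetical sample values: "9" followed by the one-quarter glyph,
# and "31" followed by the seven-eighths glyph
raw <- c("9\u00bc", "31\u215e")

# Same transliteration as in the elFun above
ascii <- stri_trans_general(raw, "latin-ascii")
print(ascii)

# If the transliteration worked, only ASCII characters should remain
all_ascii <- all(stri_enc_isascii(ascii))
print(all_ascii)
```

If `all_ascii` comes back FALSE, the encoding of the input is the first thing to look at.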

For example, you could use @G. Grothendieck's function from this post to clean the Arms data:

library(XML)
library(stringi)
library(gsubfn)
#the calc function is by @G. Grothendieck
calc <- function(s) {
        x <- c(if (length(s) == 2) 0, as.numeric(s), 0:1)
        x[1] + x[2] / x[3]
}

url <- paste("http://mockdraftable.com/players/2014/", sep = "")  

combine<-readHTMLTable(url,which=1, header=FALSE, stringsAsFactors=F,elFun=function(node) {
        val = xmlValue(node); stri_trans_general(val,"latin-ascii")})

names(combine) <- c("Name", "Pos", "Hght", "Wght", "Arms", "Hands",
                    "Dash40yd", "Dash20yd", "Dash10yd", "Bench", "Vert", "Broad", 
                    "Cone3", "ShortShuttle20")

sapply(strapplyc(gsub('\"',"",combine$Arms), "\\d+"), calc)

#[1] 30.000 31.500 30.000 31.750 31.875 29.875 31.000 31.000 30.250 33.000 32.500 31.625 32.875

Depending on your machine, there may be some encoding issues (see the comments).
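To see why calc works, note that it receives the runs of digits found in one value: one number for a whole value, two for a bare fraction, three for a whole number plus a fraction. The sketch below repeats calc and feeds it made-up ASCII strings, using base R's regmatches in place of strapplyc so that it runs without gsubfn. The sample strings are assumptions about what the transliterated data looks like.

```r
# calc as in the answer above (by @G. Grothendieck)
calc <- function(s) {
  # Two numbers mean a bare fraction, so prepend a whole part of 0.
  # The trailing 0:1 supplies "+ 0/1" when only a whole number is present.
  x <- c(if (length(s) == 2) 0, as.numeric(s), 0:1)
  x[1] + x[2] / x[3]
}

# Hypothetical values: whole, whole plus fraction (twice), bare fraction
samples <- c("30", "31 1/2", "31 7/8", "3/4")

# Extract every run of digits from each string
digits <- regmatches(samples, gregexpr("[0-9]+", samples))

result <- sapply(digits, calc)
print(result)
# [1] 30.000 31.500 31.875  0.750
```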

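Answer 1: (score: 1)

I don't think this is clever or efficient compared with the alternatives, but it uses gsub to remove the " symbol and replace each fraction with its decimal form, then converts the result to numeric: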

#data (I've not downloaded XML for this, so maybe the encoding will make a difference?)
combine = data.frame(Hands = c('1"','1⅛"','1¼"','1⅜"','1½"','1⅝"','1¾"','1⅞"'))

#remove the "
combine$Hands = gsub('"', '', combine$Hands)

#replace each fraction with its decimal form
combine$Hands = gsub("⅛", ".125", combine$Hands)
combine$Hands = gsub("¼", ".25", combine$Hands)
combine$Hands = gsub("⅜", ".375", combine$Hands)
combine$Hands = gsub("½", ".5", combine$Hands)
combine$Hands = gsub("⅝", ".625", combine$Hands)
combine$Hands = gsub("¾", ".75", combine$Hands)
combine$Hands = gsub("⅞", ".875", combine$Hands)


combine$Hands <- as.numeric(combine$Hands)
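The same replacements can be written more compactly by looping over a lookup table. This is only a sketch of that idea, not a different method: the names glyphs, decimals and fraction_to_numeric are made up for illustration, and the glyphs are written as Unicode escapes so that the script does not depend on the file's encoding.

```r
# Vulgar-fraction glyphs for 1/8, 1/4, 3/8, 1/2, 5/8, 3/4, 7/8 and their decimal forms
glyphs   <- c("\u215b", "\u00bc", "\u215c", "\u00bd", "\u215d", "\u00be", "\u215e")
decimals <- c(".125", ".25", ".375", ".5", ".625", ".75", ".875")

fraction_to_numeric <- function(x) {
  # Drop the inch mark
  x <- gsub('"', "", x, fixed = TRUE)
  # Replace each glyph with its decimal form
  for (i in seq_along(glyphs)) {
    x <- gsub(glyphs[i], decimals[i], x, fixed = TRUE)
  }
  as.numeric(x)
}

# Hypothetical values in the same shape as the Hands column
hands <- c('1"', '1\u215b"', '9\u00bc"', '1\u215e"')
result <- fraction_to_numeric(hands)
print(result)
# [1] 1.000 1.125 9.250 1.875
```

With the data frame above, this would be applied as combine$Hands <- fraction_to_numeric(combine$Hands).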