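In R 2.13.1 on Mac OS X, I am trying to import a data file that uses a dot as the thousands separator, a comma as the decimal point, and a trailing minus for negative values.
Basically, I am trying to convert: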
"A|324,80|1.324,80|35,80-"
to
V1 V2 V3 V4
1 A 324.80 1324.8 -35.80
Interactively, each of the following works:
gsub("\\.","","1.324,80")
[1] "1324,80"
gsub("(.+)-$","-\\1", "35,80-")
[1] "-35,80"
and so does combining them:
gsub("\\.", "", gsub("(.+)-$","-\\1","1.324,80-"))
[1] "-1324,80"
However, I cannot get the thousands separator removed when reading through read.table:
setClass("num.with.commas")
setAs("character", "num.with.commas", function(from) as.numeric(gsub("\\.", "", sub("(.+)-$","-\\1",from))) )
mydata <- "A|324,80|1.324,80|35,80-"
mytable <- read.table(textConnection(mydata), header=FALSE, quote="", comment.char="", sep="|", dec=",", skip=0, fill=FALSE,strip.white=TRUE, colClasses=c("character","num.with.commas", "num.with.commas", "num.with.commas"))
Warning messages:
1: In asMethod(object) : NAs introduced by coercion
2: In asMethod(object) : NAs introduced by coercion
3: In asMethod(object) : NAs introduced by coercion
mytable
V1 V2 V3 V4
1 A NA NA NA
Note that if I change "\\." to "," in the function, things look a bit different:
setAs("character", "num.with.commas", function(from) as.numeric(gsub(",", "", sub("(.+)-$","-\\1",from))) )
mytable <- read.table(textConnection(mydata), header=FALSE, quote="", comment.char="", sep="|", dec=",", skip=0, fill=FALSE,strip.white=TRUE, colClasses=c("character","num.with.commas", "num.with.commas", "num.with.commas"))
mytable
V1 V2 V3 V4
1 A 32480 1.3248 -3580
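I think the problem is that read.table with dec="," converts the incoming "," to "." before calling as(from, "num.with.commas"), so the input string can be, for example, "1.324.80".
I want as("1.123,80-", "num.with.commas") to return -1123.80 and as("1.100.123,80", "num.with.commas") to return 1100123.80.
How can I make my num.with.commas replace all but the last decimal point in the input string?
Update: First, I added a negative lookahead and got as() working in the console: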
setAs("character", "num.with.commas", function(from) as.numeric(gsub("(?!\\.\\d\\d$)\\.", "", gsub("(.+)-$","-\\1",from), perl=TRUE)) )
as("1.210.123.80-","num.with.commas")
[1] -1210124
as("10.123.80-","num.with.commas")
[1] -10123.8
as("10.123.80","num.with.commas")
[1] 10123.8
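The lookahead above only protects a dot that is followed by exactly two digits at the end of the string. As a rough sketch of an alternative, assuming the last dot is always the decimal point, a positive lookahead can remove every dot that still has another dot somewhere after it. The helper name `strip_all_but_last_dot` is made up for illustration. The last line also shows that the `-1210124` printed above is only the console's default of 7 significant digits, not a loss of precision.

```r
# Hypothetical helper: drop every "." that is followed by another "." later
# in the string, so only the last one (assumed to be the decimal point) survives.
strip_all_but_last_dot <- function(x) {
  gsub("\\.(?=.*\\.)", "", x, perl = TRUE)
}

x <- strip_all_but_last_dot(c("1.210.123.80", "10.123.8", "324.80"))
x
as.numeric(x)

# The full value is stored; print() just rounds to 7 significant digits by default.
print(as.numeric("1210123.80"), digits = 10)
```

One caveat with this approach: a value with a thousands separator but no decimal part, such as "1.324", would be read as 1.324 rather than 1324.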
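However, read.table still had the same problem. Adding some print() calls to my function showed that num.with.commas actually receives the comma, not the dot.
So my current solution is to also replace "," with "." inside num.with.commas.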
setAs("character", "num.with.commas", function(from) as.numeric(gsub(",","\\.",gsub("(?!\\.\\d\\d$)\\.", "", gsub("(.+)-$","-\\1",from), perl=TRUE))) )
mytable <- read.table(textConnection(mydata), header=FALSE, quote="", comment.char="", sep="|", dec=",", skip=0, fill=FALSE,strip.white=TRUE, colClasses=c("character","num.with.commas", "num.with.commas", "num.with.commas"))
mytable
V1 V2 V3 V4
1 A 324.8 1101325 -35.8
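Answer 0 (score: 4)
You should first remove all the periods, then change the commas to decimal points before coercing with as.numeric(). You can control later how the decimal point is printed with options(OutDec=","). I don't think R uses commas as the decimal separator internally, even in locales where that is the convention.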
> tst <- c("A","324,80","1.324,80","35,80-")
>
> as.numeric( sub("\\,", ".", sub("(.+)-$","-\\1", gsub("\\.", "", tst)) ) )
[1] NA 324.8 1324.8 -35.8
Warning message:
NAs introduced by coercion
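The same order of operations can be plugged into the setAs() approach from the question. This is a minimal sketch, assuming (as the print() calls in the question suggest) that read.table hands the raw character field to the converter, so dec="," is left out. The helper name `parse_eu_number` is hypothetical.

```r
library(methods)

# Hypothetical helper: "1.324,80-" -> -1324.8
parse_eu_number <- function(x) {
  x <- gsub(".", "", x, fixed = TRUE)   # drop thousands separators
  x <- sub(",", ".", x, fixed = TRUE)   # decimal comma -> decimal point
  x <- sub("^(.+)-$", "-\\1", x)        # trailing minus -> leading minus
  as.numeric(x)
}

# Register the converter under the class name used in the question.
setClass("num.with.commas")
setAs("character", "num.with.commas", function(from) parse_eu_number(from))

mydata <- "A|324,80|1.324,80|35,80-"
con <- textConnection(mydata)
mytable <- read.table(con, header = FALSE, quote = "", comment.char = "",
                      sep = "|", strip.white = TRUE,
                      colClasses = c("character", rep("num.with.commas", 3)))
close(con)
mytable
```

Because the periods are removed before the comma is converted, this does not depend on the number of decimal places, unlike the lookahead pattern in the question.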
Answer 1 (score: 1)
Here is a solution using regular expressions and substitution:
mydata <- "A|324,80|1.324,80|35,80-"
# Split data
mydata2 <- strsplit(mydata,"|",fixed=TRUE)[[1]]
# Remove thousands separators (periods), then turn the decimal comma into a point
mydata3 <- sub(",",".",gsub(".","",mydata2,fixed=TRUE),fixed=TRUE)
# Move negatives to front of string
mydata4 <- gsub("^(.+)-$","-\\1",mydata3)
# Convert to numeric (note: c() coerces the result back to character because of the label in mydata4[1])
mydata.cleaned <- c(mydata4[1],as.numeric(mydata4[2:4]))