I have a df in which columns 2 and above are dollar amounts, such as $1004.23, ($1482.40), $2423.94, etc., similar to the following example:
> df
id desc price
1 0 apple $1.00
2 1 banana ($2.25)
3 2 grapes $1.97
I want to first convert the numbers in parentheses to negative numbers, and then remove the dollar signs from the numbers.
for(i in 2:ncol(df)){
df[[i]] <- as.character(sub(")", "", sub("(", "-", df[[i]], fixed=TRUE), fixed=TRUE))
df[[i]] <- as.numeric(gsub('[$,]', '', as.character(df[[i]])))
}
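To see what each call in the loop body does, here is a sketch that runs the same steps on the sample prices one at a time. The intermediate names `step1` to `step3` are just for illustration:

```r
x <- c("$1.00", "($2.25)", "$1.97")

step1 <- sub("(", "-", x, fixed = TRUE)     # "$1.00" "-$2.25)" "$1.97"
step2 <- sub(")", "", step1, fixed = TRUE)  # "$1.00" "-$2.25"  "$1.97"
step3 <- gsub("[$,]", "", step2)            # "1.00"  "-2.25"   "1.97" ($ and commas removed)
result <- as.numeric(step3)                 # numeric vector, decimals kept
```

Using `fixed = TRUE` makes `(` and `)` literal characters rather than regex group delimiters.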
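My current code does almost exactly what I want. The one thing I don't want (or need) is rounding: whenever I run the code, it also rounds the numbers, so the df above becomes: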
> df
id desc price
1 0 apple 1
2 1 banana -2
3 2 grapes 2
Any suggestions on how to do this without rounding the numbers? The rounding messes up a lot of my later calculations.
Answer 0 (score: 2)
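The question does not show what causes the apparent rounding. One possibility worth ruling out (an assumption, not something stated above) is that only the printed display is rounded, for example because the session's `digits` option was set low. A minimal base-R sketch that separates the display from the stored values:

```r
x <- c(1.00, -2.25, 1.97)   # the values the conversion is expected to produce

print(x, digits = 1)        # low display precision: decimals are hidden, not removed
print(x, digits = 7)        # R's default display precision
x[2] == -2.25               # the stored value still compares equal to -2.25
getOption("digits")         # the current session setting (7 unless changed)
```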
Another possible solution, which builds on your own attempt and takes into account that you need to convert more columns than in the example:
d[,-c(1:2)] <- lapply(d[,-c(1:2)],
function(x) as.numeric(gsub('[$,]', '', sub(")", "", sub("(", "-", x, fixed=TRUE), fixed=TRUE))))
This gives:
> d
id desc price price2
1 0 apple 1.00 -5.90
2 1 banana -2.25 2.39
3 2 grapes 1.97 -0.95
Or with a for loop:
for(i in 3:ncol(d)){
d[[i]] <- as.numeric(gsub('[$,]', '', sub(")", "", sub("(", "-", d[[i]], fixed=TRUE), fixed=TRUE)))
}
Or with the data.table package:
library(data.table)
cols <- names(d)[-c(1:2)]
setDT(d)[, (cols) := lapply(.SD, function(x) as.numeric(gsub('[$,]', '', sub(")", "", sub("(", "-", x, fixed=TRUE), fixed=TRUE)))),
.SDcols = cols]
Or with the dplyr package:
library(dplyr)
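# across() needs dplyr >= 1.0.0; it replaces the superseded mutate_all()/funs() pair,
# and the column selection -c(1:2) goes inside across() so id and desc are left alone
d %>%
mutate(across(-c(1:2), ~ as.numeric(gsub('[$,]', '', sub(")", "", sub("(", "-", .x, fixed=TRUE), fixed=TRUE)))))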
These all give the same result.
Data used:
d <- structure(list(id = 0:2, desc = c("apple", "banana", "grapes"),
price = c("$1.00", "($2.25)", "$1.97"),
price2 = c("($5.9)", "$2.39", "($0.95)")),
.Names = c("id", "desc", "price", "price2"), class = "data.frame", row.names = c("1", "2", "3"))
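As a quick base-R check that the `lapply` and for-loop variants agree, here is a sketch on the same data. The helper name `to_num` is hypothetical and simply wraps the conversion expression used in this answer:

```r
d <- data.frame(id = 0:2, desc = c("apple", "banana", "grapes"),
                price = c("$1.00", "($2.25)", "$1.97"),
                price2 = c("($5.9)", "$2.39", "($0.95)"),
                stringsAsFactors = FALSE)

# Hypothetical helper: same sub/sub/gsub chain as in the answer
to_num <- function(x) {
  as.numeric(gsub('[$,]', '', sub(")", "", sub("(", "-", x, fixed = TRUE), fixed = TRUE)))
}

d_lapply <- d
d_lapply[, -c(1:2)] <- lapply(d_lapply[, -c(1:2)], to_num)

d_loop <- d
for (i in 3:ncol(d_loop)) d_loop[[i]] <- to_num(d_loop[[i]])

all.equal(d_lapply, d_loop)   # both approaches produce the same data frame
```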
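Answer 1 (score: 1)
df[[3]] <- as.character(df[[3]])  # work on character, even if the column was read in as a factor
for(i in 1:nrow(df)){
df[i,3] <- sub(")", "", sub("(", "-", df[i,3], fixed=TRUE), fixed=TRUE)
df[i,3] <- gsub('[$,]', '', df[i,3])
}
df[[3]] <- as.numeric(df[[3]])  # convert the whole column once, after the loop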
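Cell-by-cell assignment has a pitfall worth knowing about: putting a number into a single cell of a character column does not change the column's type; the number is coerced to a string instead. A small base-R sketch (the data here is made up for illustration):

```r
df <- data.frame(price = c("1.00", "-2.25"), stringsAsFactors = FALSE)

df[1, "price"] <- 1.5              # assign a number into a single cell
class(df$price)                    # still "character"
df$price[1]                        # stored as the string "1.5"

df$price <- as.numeric(df$price)   # replacing the whole column changes its type
class(df$price)                    # now "numeric"
```

This is why the numeric conversion is best done on the whole column rather than inside a per-row loop.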
Answer 2 (score: 0)
I would probably do something closer to this:
dat <- read.table(text = "id desc price
1 0 apple $1.00
2 1 banana ($2.25)
3 2 grapes $1.97",sep = "",header = TRUE,stringsAsFactors = FALSE)
dat$neg <- ifelse(grepl("^\\(.+\\)$",dat$price),-1,1)
dat$price1 <- with(dat,as.numeric(gsub("[^0-9.]","",price)) * neg)
> dat
id desc price neg price1
1 0 apple $1.00 1 1.00
2 1 banana ($2.25) -1 -2.25
3 2 grapes $1.97 1 1.97
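...If you're doing this for several columns, you probably wouldn't store the +/- information in the data frame each time, but you get the basic idea.
Answer 3 (score: 0)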
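Extending this idea to several columns without keeping a `neg` column around could look like the sketch below. The helper name `paren_to_num` is hypothetical; it applies the same `grepl`/`gsub` logic to each column in turn:

```r
# Hypothetical helper: keep only digits and the decimal point, then flip the
# sign when the whole value is wrapped in parentheses
paren_to_num <- function(x) {
  x <- as.character(x)                            # also accepts factors
  neg <- ifelse(grepl("^\\(.+\\)$", x), -1, 1)
  as.numeric(gsub("[^0-9.]", "", x)) * neg
}

dat <- data.frame(id = 0:2, desc = c("apple", "banana", "grapes"),
                  price = c("$1.00", "($2.25)", "$1.97"),
                  price2 = c("($5.9)", "$2.39", "($0.95)"),
                  stringsAsFactors = FALSE)

cols <- c("price", "price2")
dat[cols] <- lapply(dat[cols], paren_to_num)
dat
```

Note that because `[^0-9.]` also strips a leading minus sign, this only treats parenthesized values as negative; a value written as `-$1.00` would come out positive.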
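This is similar to Matt's answer, but it is vectorized (there is no loop over the rows). It also borrows Procrastinatus Maximus's approach for handling multiple columns, and it works even if the values were originally stored as factors: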
df1[3:ncol(df1)] <- apply(df1[3:ncol(df1)], 2, function(x)
as.numeric(gsub("(", "-", gsub(")", "", gsub("$", "",
as.character(x), fixed=TRUE)), fixed=TRUE)))
#> df1
# id desc price price2
#1 0 apple 1.00 -5.90
#2 1 banana -2.25 2.39
#3 2 grapes 1.97 -0.95
Data
df1 <- structure(list(id = 0:2, desc = structure(1:3, .Label = c("apple",
"banana", "grapes"), class = "factor"), price = structure(c(1L, 3L, 2L),
.Label = c("$1.00", "$1.97", "($2.25)"), class = "factor"),
price2 = structure(c(3L, 2L, 1L),
.Label = c("($0.95)", "$2.39", "($5.90)"),
class = "factor")), .Names = c("id", "desc", "price", "price2"),
class = "data.frame", row.names = c("1", "2", "3"))
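The `gsub` chain above does not remove thousands separators, so if the data may contain values like "$1,004.23" (an assumption; the sample data has none), those would become NA. A sketch that also strips commas, using `lapply` so each column is converted on its own rather than through `apply`'s conversion of the data frame to a matrix:

```r
df1 <- data.frame(id = 0:2, desc = c("apple", "banana", "grapes"),
                  price = factor(c("$1.00", "($2.25)", "$1,004.23")))

df1[3:ncol(df1)] <- lapply(df1[3:ncol(df1)], function(x)
  # "(" becomes "-", then "$", "," and ")" are all removed
  as.numeric(gsub("[$,)]", "", sub("(", "-", as.character(x), fixed = TRUE))))

df1
```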