Question

I have a df in which columns 2 and up hold dollar amounts such as $ 1004.23, ($ 1482.40), $ 2423.94. It looks something like this:

> df
  id   desc    price
1  0    apple   $1.00
2  1    banana  ($2.25)
3  2    grapes  $1.97

I want to first convert the numbers in parentheses to negative numbers, and then strip the dollar signs.

for(i in 2:ncol(df)){
    df[[i]] <- as.character(sub(")", "", sub("(", "-", df[[i]], fixed=TRUE), fixed=TRUE))
    df[[i]] <- as.numeric(gsub('[$,]', '', as.character(df[[i]])))
}

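My code currently does almost exactly what I want. The one thing I don't want or need is rounding. Whenever I run my code it also rounds the numbers, so the df above becomes: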

> df
  id   desc    price
1  0    apple  1
2  1    banana -2
3  2    grapes 2

Any suggestions on how to achieve this without the numbers being rounded? It throws off a lot of later calculations.

Answer 1

Another possible solution, which builds on your own attempt and takes into account that you need to convert more columns than in the example:

d[,-c(1:2)] <- lapply(d[,-c(1:2)], 
                      function(x) as.numeric(gsub('[$,]', '', sub(")", "", sub("(", "-", x, fixed=TRUE), fixed=TRUE))))

which gives:

> d
  id   desc price price2
1  0  apple  1.00  -5.90
2  1 banana -2.25   2.39
3  2 grapes  1.97  -0.95

Or using a for-loop:

for(i in 3:ncol(d)){
  d[[i]] <- as.numeric(gsub('[$,]', '', sub(")", "", sub("(", "-", d[[i]], fixed=TRUE), fixed=TRUE)))
}

Or with the data.table package:

library(data.table)
cols <- names(d)[-c(1:2)]
setDT(d)[, (cols) := lapply(.SD, function(x) as.numeric(gsub('[$,]', '', sub(")", "", sub("(", "-", x, fixed=TRUE), fixed=TRUE)))),
         .SDcols = cols]

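Or with the dplyr package:

library(dplyr)
# across() limits the conversion to every column except the first two (requires dplyr >= 1.0.0)
d %>% 
  mutate(across(-c(1:2), ~ as.numeric(gsub('[$,]', '', sub(")", "", sub("(", "-", .x, fixed=TRUE), fixed=TRUE)))))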

These all give you the same result.

Data used:

d <- structure(list(id = 0:2, desc = c("apple", "banana", "grapes"), 
                    price = c("$1.00", "($2.25)", "$1.97"), 
                    price2 = c("($5.9)", "$2.39", "($0.95)")),
               .Names = c("id", "desc", "price", "price2"), class = "data.frame", row.names = c("1", "2", "3"))
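Since the same nested expression appears in every variant above, it could be wrapped in a small helper. The following is only a sketch: the name `parse_dollar` is hypothetical and not part of the original answer, and it additionally strips spaces on the assumption that real values may look like `($ 1,482.40)` as described in the question.

```r
# Hypothetical helper (not from the original answer): converts strings such as
# "$1.00" or "($2.25)" to numeric, treating parentheses as a negative sign.
parse_dollar <- function(x) {
  x <- as.character(x)                 # also copes with factor input
  x <- sub("(", "-", x, fixed = TRUE)  # opening parenthesis becomes a minus sign
  x <- sub(")", "", x, fixed = TRUE)   # drop the closing parenthesis
  as.numeric(gsub("[$, ]", "", x))     # remove "$", thousands separators and spaces
}

d <- data.frame(id = 0:2,
                desc = c("apple", "banana", "grapes"),
                price = c("$1.00", "($2.25)", "$1.97"),
                price2 = c("($5.9)", "$2.39", "($0.95)"),
                stringsAsFactors = FALSE)

# Convert every column except the first two
d[-(1:2)] <- lapply(d[-(1:2)], parse_dollar)
str(d)
```

The converted columns are ordinary doubles, so the decimals are kept; how many digits are shown when printing is a separate display matter.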

Answer 2

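for(i in 1:nrow(df)){
    # strip ")", turn "(" into a minus sign, then drop "$" and ","
    df[i,3] <- gsub('[$,]', '', sub(")", "", sub("(", "-", as.character(df[i,3]), fixed=TRUE), fixed=TRUE))
}
# convert the whole column once; a number assigned cell by cell into a character column is coerced back to text
df[[3]] <- as.numeric(df[[3]])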

Answer 3

I would probably approach it more like this:

dat <- read.table(text = "id   desc    price
1  0    apple   $1.00
2  1    banana  ($2.25)
3  2    grapes  $1.97",sep = "",header = TRUE,stringsAsFactors = FALSE)

dat$neg <- ifelse(grepl("^\\(.+\\)$",dat$price),-1,1)
dat$price1 <- with(dat,as.numeric(gsub("[^0-9.]","",price)) * neg)

> dat
  id   desc   price neg price1
1  0  apple   $1.00   1   1.00
2  1 banana ($2.25)  -1  -2.25
3  2 grapes   $1.97   1   1.97

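...if you are doing this for multiple columns you probably wouldn't store the +/- information in the data frame each time, but you get the basic idea.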
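For several columns, the same sign-multiplier idea might be sketched as below, keeping the sign inside a function instead of in the data frame. The helper name `to_signed`, the vector `money_cols` and the extra `price2` column are assumptions made for illustration only.

```r
# Hypothetical helper illustrating the sign-multiplier idea
to_signed <- function(x) {
  x <- as.character(x)
  sgn <- ifelse(grepl("^\\(.+\\)$", x), -1, 1)  # -1 when the value is wrapped in parentheses
  as.numeric(gsub("[^0-9.]", "", x)) * sgn      # keep only digits and the decimal point
}

dat <- data.frame(id = 0:2,
                  desc = c("apple", "banana", "grapes"),
                  price = c("$1.00", "($2.25)", "$1.97"),
                  price2 = c("($5.90)", "$2.39", "($0.95)"),
                  stringsAsFactors = FALSE)

money_cols <- c("price", "price2")   # assumed names of the amount columns
dat[money_cols] <- lapply(dat[money_cols], to_signed)
dat
```

Note that the `[^0-9.]` pattern also removes a leading minus sign, so this sketch only recognizes negatives written in parentheses.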

Answer 4

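This is similar to Matt's answer, but it is vectorized (no loop over the rows). It further incorporates Procrastinatus Maximus's approach for handling multiple columns, and it also works if the values were originally stored as factors: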

df1[3:ncol(df1)] <- apply(df1[3:ncol(df1)], 2, function(x) 
                         as.numeric(gsub("(", "-", gsub(")", "", gsub("$", "",
                         as.character(x), fixed=TRUE)), fixed=TRUE)))
#> df1
#  id   desc price price2
#1  0  apple  1.00  -5.90
#2  1 banana -2.25   2.39
#3  2 grapes  1.97  -0.95

Data

df1 <- structure(list(id = 0:2,
                      desc = structure(1:3, .Label = c("apple", "banana", "grapes"), class = "factor"),
                      price = structure(c(1L, 3L, 2L), .Label = c("$1.00", "$1.97", "($2.25)"), class = "factor"),
                      price2 = structure(c(3L, 2L, 1L), .Label = c("($0.95)", "$2.39", "($5.90)"), class = "factor")),
                 .Names = c("id", "desc", "price", "price2"), class = "data.frame", row.names = c("1", "2", "3"))
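The factor remark is worth a short illustration, because calling `as.numeric()` directly on a factor returns its internal level codes rather than the values it displays. A minimal sketch with made-up values (the names `f`, `codes` and `values` are only for this example):

```r
# A factor whose labels look like numbers
f <- factor(c("1.00", "-2.25", "1.97"))

codes  <- as.numeric(f)                # integer level codes, NOT the amounts
values <- as.numeric(as.character(f))  # go through character to get the amounts

codes
values
```

This is why the conversions in this thread pass the column through `as.character()` (or through `gsub()`/`sub()`, which coerce to character themselves) before calling `as.numeric()`.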

R: change $ xxx.xx to xxx.xx for both positive and negative numbers, without rounding
