Question

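My data frame has values as follows:

BrandName  Expense
Apple      $1.8B
Google     $3.2B
facebook   $281M
McDonald   $719M

I want to clean these expense values so that they all end up on the same scale (in billions). For example, the final data frame should look like this:

BrandName  Expense
Apple      1.8
Google     3.2
facebook   0.281
McDonald   0.719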

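The $ can simply be removed with gsub, and that works. But I ran into a problem after that. I am applying a function A that uses grepl to check whether a value contains 'M'. If it does, A strips the 'M', converts the value to numeric and divides it by 1000. If it does not, A strips the 'B' and converts the value to numeric.

A <- function(x){
  if (grepl("M", x))
  {
    str_replace(x, "M", "")
    as.numeric(x)
    x <- x/1000
  }
  else if (grepl("B", x))
  {
    str_replace(x, "B", "")
    as.numeric(x)
  }
}
frame <- data.frame(frame[1], apply(frame[2],2, A))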


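But all the expense values in the final result are NA. On further analysis, I noticed that every value goes into the else if branch. Am I using grepl incorrectly inside the apply function? If so, how can I fix it?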

Or is there a better way to solve this particular problem?
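A likely cause, reading the code above: apply() hands the whole Expense column to A as a single character vector, so `if (grepl("M", x))` only looks at the first element (R before 4.2.0 warns about this; R 4.2.0 and later raise an error). Because the first value ends in 'B', every value takes the else if branch. On top of that, the results of str_replace() and as.numeric() are never assigned back to x, so A returns as.numeric() of the untouched strings, which is NA. A minimal sketch of a corrected, vectorised function, assuming the values look like the sample data and using the hypothetical name to_billions, replaces if/else with ifelse() and stores each intermediate result:

```r
# Hypothetical corrected version of the asker's A(), named to_billions here.
# Vectorised: it works on the whole column at once, so no apply() is needed.
to_billions <- function(x) {
  x   <- gsub("$", "", x, fixed = TRUE)     # drop the dollar sign
  num <- as.numeric(sub("[MB]$", "", x))    # strip the trailing M/B, then convert
  ifelse(grepl("M$", x), num / 1000, num)   # millions -> billions, billions unchanged
}

frame <- data.frame(BrandName = c("Apple", "Google", "facebook", "McDonald"),
                    Expense   = c("$1.8B", "$3.2B", "$281M", "$719M"),
                    stringsAsFactors = FALSE)
frame$Expense <- to_billions(frame$Expense)
print(frame)
```

ifelse() evaluates the condition element by element, which is what the original if/else inside apply() was expected to do.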

Answer 1

Depending on your needs, here is a base R solution that may be more sensible for your problem:

df$ExpenseScaled <- as.numeric(gsub("[$MB]", "", df$Expense))
m.index          <- substr(df$Expense, nchar(df$Expense), nchar(df$Expense)) == 'M'
df$ExpenseScaled[m.index] <- df$ExpenseScaled[m.index] / 1000

 df
 BrandName Expense ExpenseScaled
1     Apple   $1.8B         1.800
2    Google   $3.2B         3.200
3  Facebook   $281M         0.281
4 McDonalds   $719M         0.719

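The first line removes the dollar sign and the unit suffix (B or M) to get the numeric amount. The next two lines then divide only the values given in millions by 1000, as you specified.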
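The same three steps can be run end to end on the sample data. This sketch assumes the data frame is called df, as in the answer, and swaps the substr()/nchar() test for base R's endsWith() (available since R 3.3.0), which checks the last character more directly; the result should be the same.

```r
df <- data.frame(BrandName = c("Apple", "Google", "Facebook", "McDonalds"),
                 Expense   = c("$1.8B", "$3.2B", "$281M", "$719M"),
                 stringsAsFactors = FALSE)

# Step 1: remove "$", "M" and "B" and convert what is left to numbers
df$ExpenseScaled <- as.numeric(gsub("[$MB]", "", df$Expense))

# Steps 2-3: find the values in millions and rescale only those to billions
m.index <- endsWith(df$Expense, "M")
df$ExpenseScaled[m.index] <- df$ExpenseScaled[m.index] / 1000
print(df)
```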

Answer 2

We can do this with gsubfn. First we remove the $ with sub, then use gsubfn to replace a trailing 'B' with ' * 1' and a trailing 'M' with ' * 1/1000', and finally loop over the resulting vector with sapply and evaluate each string.

library(gsubfn)
df1$Expense <-  unname(sapply(gsubfn("([A-Z])$", list(B=' * 1', M=' * 1/1000'), 
          sub("[$]", "", df1$Expense)), function(x) eval(parse(text=x))))
df1
#   BrandName Expense
#1     Apple   1.800
#2    Google   3.200
#3  facebook   0.281
#4  McDonald   0.719
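To make the rewrite-then-evaluate idea visible, here is a base R sketch that builds the same expression strings with chained sub() calls instead of gsubfn. It is a stand-in for illustration, not part of the answer, and assumes every value ends in B or M:

```r
x <- c("$1.8B", "$3.2B", "$281M", "$719M")

# Turn each value into an arithmetic expression string, e.g. "281 * 1/1000"
no_dollar <- sub("$", "", x, fixed = TRUE)
expr <- sub("B$", " * 1", sub("M$", " * 1/1000", no_dollar))
print(expr)

# Evaluate each expression string to get the value in billions
vals <- vapply(expr, function(e) eval(parse(text = e)), numeric(1),
               USE.NAMES = FALSE)
print(vals)
```

Because eval(parse()) runs the strings as R code, this style is best kept to trusted input.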

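Alternatively, a base R option is to extract the numeric substring ('val') and the last character ('nm1'), convert 'val' to numeric, and multiply it by 1 or 1/1000 by looking up 'nm1' in a named key/value vector.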

val <- gsub("[^0-9.]+", "", df1$Expense)
nm1 <- sub(".*(.)$", "\\1", df1$Expense)
df1$Expense <-  as.numeric(val)*setNames(c(1, 1/1000), c("B", "M"))[nm1]
df1
#  BrandName Expense
#1     Apple   1.800
#2    Google   3.200
#3  facebook   0.281
#4  McDonald   0.719

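Note: if there are also trillions, thousands, etc., both methods need to be extended. In the first method, add the new suffixes inside list(...); in the second, add more key/value pairs in setNames(c(1, ...), c("B", "M", ...)).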
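Following the note, a sketch of the named lookup vector extended with K (thousands) and T (trillions), with each multiplier expressed relative to billions. The input values are made up for illustration:

```r
x <- c("$1.8B", "$281M", "$950K", "$1.2T")   # illustrative values only

# Multiplier that converts each suffix to billions
to_bn <- c(K = 1e-6, M = 1e-3, B = 1, T = 1e3)

val <- as.numeric(gsub("[^0-9.]+", "", x))   # numeric part
nm1 <- sub(".*(.)$", "\\1", x)               # last character = suffix
billions <- unname(val * to_bn[nm1])
print(billions)
```

A suffix missing from to_bn yields NA, which makes unexpected units easy to spot.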

Another option is parse_number from readr, combined with dplyr:

library(dplyr)
library(readr)
df1 %>% 
   mutate(Expense = parse_number(Expense)/c(1, 1000)[grepl("M", Expense)+1])
#   BrandName Expense
#1     Apple   1.800
#2    Google   3.200
#3  facebook   0.281
#4  McDonald   0.719
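The divisor here is chosen by indexing: grepl() returns TRUE/FALSE, adding 1 turns that into 1 or 2, and the result selects from c(1, 1000). A base R sketch of that step, using gsub() plus as.numeric() as a stand-in for readr::parse_number() on these simple inputs:

```r
x <- c("$1.8B", "$3.2B", "$281M", "$719M")

idx     <- grepl("M", x) + 1       # FALSE -> 1, TRUE -> 2
divisor <- c(1, 1000)[idx]         # 1 for billions, 1000 for millions

# Stand-in for readr::parse_number(): keep only digits and the decimal point
amount  <- as.numeric(gsub("[^0-9.]", "", x))
result  <- amount / divisor
print(result)
```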

Data

df1 <- structure(list(BrandName = c("Apple", "Google", "facebook", "McDonald"
), Expense = c("$1.8B", "$3.2B", "$281M", "$719M")), .Names = c("BrandName", 
"Expense"), class = "data.frame", row.names = c(NA, -4L))

Can't grepl be used inside an apply function?
