Applying a formula to multiple columns in a data frame

Time: 2020-10-27 22:43:32

Tags: r function

I am struggling to apply my function to multiple columns of a data frame. I have the following function:

test <- function(y, x,  data) {
  best.score <- min(data[data$income == y, ][[x]], na.rm = TRUE)
  (max(data[[x]], na.rm = TRUE)- data[[x]])/ (max(data[[x]], na.rm= TRUE) - best.score)
}

#When I apply it to one column for a data frame like this, it works and gives me a data frame with the values I need

result= data.frame(test( y= "middle income", x="risk", data = sub))

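What the formula does is create a relative score:

relative_score1 = (score of the best-performing country (overall) - country score (x)) / (score of the best-performing country (overall) - top score within each group of countries (e.g., only the "middle income" group))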
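As a small worked illustration of the arithmetic, here is a sketch on made-up numbers. It mirrors what `test()` computes, so the overall reference is the column maximum and the group reference is the minimum within the chosen group. The vectors `scores` and `income` and the names `overall_max` and `group_ref` are hypothetical and are not part of the original data.

```r
# Hypothetical toy values, chosen only to illustrate the arithmetic
scores <- c(10, 20, 30, 40)
income <- c("low income", "middle income", "middle income", "high income")

# Overall reference: maximum over all countries
overall_max <- max(scores, na.rm = TRUE)
# Group reference: minimum within the "middle income" group, as in test()
group_ref <- min(scores[income == "middle income"], na.rm = TRUE)

rel <- (overall_max - scores) / (overall_max - group_ref)
rel
# [1] 1.5 1.0 0.5 0.0
```

Values above 1 are possible whenever a country outside the reference group scores below that group's minimum.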

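I want to apply this formula to many columns of the data frame and create a new data frame containing only the score results. I tried the following: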

integrated_score <- sub %>% 
mutate_each(is.double, test1(y= "middle income", data = sub))

#But I get "Error: argument "x" is missing, with no default", although my x should be the columns of the data frame

I also tried this:

integrated_score <- sub %>% 
  mutate_if(is.numeric, funs(test1(data= sub, y="middle income")))

#and  this
vars <- (names(sub)[6:30])
vars <- setNames(vars, paste0(vars, "_test"))
tdata <- sub %>% 
  mutate_each_(funs(test1(data= sub, y="middle income")), vars)

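But I keep getting the same result, "Error: argument "x" is missing, with no default". Any ideas on how to apply this function to multiple columns of a data frame, or how to specify that x should be the numeric columns of the data frame?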

My data structure is like this:
sub <- data.frame(
  country = c("blue", "green", "red", "pink", "grey", "black", "rose", "darkblue", "golden", "silver"),
  group = c("low income", "middle income", "middle income", "middle income", "high income",
            "middle income", "high income", "middle income", "middle income", "low income"),
  risk = c(18, 22, 21, 22, 45, 18, 22, 21, 22, 45),
  earthquakes = c(10, 20, 21, 92, 40, 18, 20, 21, 20, 45),
  floods = c(10, 20, 30, 40, 50, 60, 70, 80, 95, 95)
)

1 Answer:

Answer 0: (score: 0)

I think that to run this within the dplyr framework you would need to write the function a bit differently (see below). However, you can use sapply():

df <- tibble::tibble(
  income = sample(c("low income", "middle income", "upper income"), 50, replace=TRUE), 
  risk = runif(50, 0, 100), 
  risk2 = runif(50, 0, 100)
)


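test <- function(y, x, data) {
  # Reference value: minimum of column x within income group y
  best.score <- min(data[data$income == y, ][[x]], na.rm = TRUE)
  # Distance from the overall maximum, scaled by the distance between
  # the overall maximum and the group reference value
  (max(data[[x]], na.rm = TRUE) - data[[x]]) / (max(data[[x]], na.rm = TRUE) - best.score)
}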

sapply(names(df)[which(sapply(df, is.numeric))], 
       function(z)test("middle income", z, df))
#            risk      risk2
# [1,] 0.99126502 0.76377083
# [2,] 0.24662049 0.50686486
# [3,] 0.35321178 0.75278994
# [4,] 0.08891155 0.01049953
# [5,] 0.61905034 0.64051839
# [6,] 0.03936396 1.00000000
# [7,] 0.43424486 0.72034776
# [8,] 0.68451288 0.50013829
# [9,] 0.42632501 0.84215180
# [10,] 0.83093289 0.86745020
# [11,] 1.00000000 0.60915298
# [12,] 0.62479411 0.59268033
# [13,] 0.98408976 0.18052309
# [14,] 0.01744632 0.61018891
# [15,] 0.18876947 0.71309598
# [16,] 0.92650568 0.07473100
# [17,] 0.03610399 0.86314280
# [18,] 0.55538600 0.49841265
# [19,] 0.24854898 0.01362557
# [20,] 0.68921235 0.22034624
# [21,] 0.61214643 0.21680941
# [22,] 0.14955183 0.81706890
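Note that sapply() returns a matrix here, whereas the question asks for a data frame containing only the scores. A minimal sketch of that extra step follows, reusing the same `test()` function on made-up data. The `toy` data frame is hypothetical, and the `_test` suffix is borrowed from the question's own attempt rather than required by anything.

```r
# Same column-name-based function as in the answer
test <- function(y, x, data) {
  best.score <- min(data[data$income == y, ][[x]], na.rm = TRUE)
  (max(data[[x]], na.rm = TRUE) - data[[x]]) /
    (max(data[[x]], na.rm = TRUE) - best.score)
}

# Hypothetical toy data, only for illustration
toy <- data.frame(
  income = c("low income", "middle income", "middle income", "upper income"),
  risk   = c(10, 20, 30, 40),
  risk2  = c(4, 3, 2, 1)
)

# Names of the numeric columns only
num_cols <- names(toy)[vapply(toy, is.numeric, logical(1))]

# sapply() yields a matrix with one column per variable; convert it
scores <- as.data.frame(
  sapply(num_cols, function(z) test("middle income", z, toy))
)
names(scores) <- paste0(names(scores), "_test")
scores
```

The result has one `_test` column per numeric input column and no identifier columns, so it can be bound back to the original data with cbind() if needed.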

For completeness, if you want to do this with dplyr, you can use the following function:

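library(dplyr)

test1 <- function(data, y){
  # Rescale every numeric column; the reference value is the
  # column minimum within income group y
  data %>% mutate(across(where(is.numeric), function(x){
    (max(x, na.rm=TRUE) - x)/
      (max(x, na.rm=TRUE) - min(x[which(data$income == y)], na.rm=TRUE))}))
}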

test1(df, "middle income")
# # A tibble: 50 x 3
#   income          risk  risk2
#   <chr>          <dbl>  <dbl>
# 1 low income    0.991  0.764 
# 2 middle income 0.247  0.507 
# 3 low income    0.353  0.753 
# 4 upper income  0.0889 0.0105
# 5 low income    0.619  0.641 
# 6 middle income 0.0394 1     
# 7 upper income  0.434  0.720 
# 8 middle income 0.685  0.500 
# 9 upper income  0.426  0.842 
# 10 upper income  0.831  0.867 
# # … with 40 more rows
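The attempts in the question most likely fail because nothing is ever passed as `x`, and because `test()` expects a column name while the dplyr verbs hand over the column itself. The following is a sketch, not a definitive solution, of how the same calculation could be applied to the question's `sub` data so that only the score columns are returned. It assumes dplyr 1.0 or later, uses the `group` column of the sample data as the grouping variable, and introduces a hypothetical helper `relative_score()` that takes a vector rather than a column name.

```r
library(dplyr)

# Sample data from the question (the grouping column there is `group`)
sub <- data.frame(
  country = c("blue", "green", "red", "pink", "grey",
              "black", "rose", "darkblue", "golden", "silver"),
  group = c("low income", "middle income", "middle income", "middle income",
            "high income", "middle income", "high income", "middle income",
            "middle income", "low income"),
  risk = c(18, 22, 21, 22, 45, 18, 22, 21, 22, 45),
  earthquakes = c(10, 20, 21, 92, 40, 18, 20, 21, 20, 45),
  floods = c(10, 20, 30, 40, 50, 60, 70, 80, 95, 95)
)

# Hypothetical helper: relative score for one numeric vector `x`,
# using the minimum of `x` within `ref_group` as the reference value
relative_score <- function(x, group, ref_group) {
  overall_max <- max(x, na.rm = TRUE)
  group_ref <- min(x[group == ref_group], na.rm = TRUE)
  (overall_max - x) / (overall_max - group_ref)
}

# transmute() keeps only the newly created columns
integrated_score <- sub %>%
  transmute(across(
    where(is.numeric),
    ~ relative_score(.x, group, "middle income"),
    .names = "{.col}_test"
  ))

integrated_score
```

Passing the column vector directly means no `x` argument has to be named by hand, and the `.names` pattern keeps the original column names recognizable in the result.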