I am struggling to apply my function to multiple columns of a data frame. I have the following function:
test <- function(y, x, data) {
best.score <- min(data[data$income == y, ][[x]], na.rm=T)
(max(data[[x]], na.rm = TRUE)- data[[x]])/ (max(data[[x]], na.rm= TRUE) - best.score)
}
#When I apply it to one column for a data frame like this, it works and gives me a data frame with the values I need
result= data.frame(test( y= "middle income", x="risk", data = sub))
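The function creates a relative score:
relative_score1 = (best-performing country's score (overall) - country's score (x)) / (best-performing country's score (overall) - top score within a group (e.g., only the "middle income" group))
In the code above, the overall best score is max(data[[x]]) and the group's top score is best.score.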
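To make the formula concrete, here is a small hand calculation in R. It is only a sketch: the vectors `score` and `group` hold made-up values (they are not from the real data), and the names `overall_max` and `best_score` are chosen for illustration.

```r
# Made-up scores for five countries, for illustration only
score <- c(18, 22, 21, 45, 30)
group <- c("low income", "middle income", "middle income",
           "high income", "middle income")

# Best-performing score overall (the function uses max() for this)
overall_max <- max(score, na.rm = TRUE)

# Top score within the chosen group (the function uses min() for this)
best_score <- min(score[group == "middle income"], na.rm = TRUE)

# Relative score for every country
relative_score <- (overall_max - score) / (overall_max - best_score)
relative_score
# approximately: 1.125 0.958 1.000 0.000 0.625
```

The group's top country gets a score of 1 and the overall maximum gets 0; countries outside the group can fall above 1.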
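I want to apply this function to many columns of the data frame and create a new data frame containing only the score results. I tried the following: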
integrated_score <- sub %>%
mutate_each(is.double, test1(y= "middle income", data = sub))
#But I get "Error: argument "x" is missing, with no default", although my x should be the columns of the data frame
I also tried this:
integrated_score <- sub %>%
mutate_if(is.numeric, funs(test1(data= sub, y="middle income")))
#and this
vars <- (names(sub)[6:30])
vars <- setNames(vars, paste0(vars, "_test"))
tdata <- sub %>%
mutate_each_(funs(test1(data= sub, y="middle income")), vars)
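But I keep getting the same result: "Error: argument "x" is missing, with no default". Any ideas on how to apply this function to multiple columns of a data frame, or how to specify that x should be the numeric columns of the data frame?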
My data structure is like this
sub <- data.frame(country= c("blue", "green", "red", "pink", "grey", "black", "rose", "darkblue", "golden", "silver") , group = c("low income","middle income", "middle income", "middle income", "high income", "middle income", "high income", "middle income", "middle income", "low income" ), risk = c(18, 22, 21, 22, 45, 18, 22, 21, 22, 45), earthquakes= c(10, 20, 21, 92, 40, 18, 20, 21, 20, 45), floods = c(10, 20, 30, 40, 50, 60, 70, 80, 95, 95))
Answer 0 (score: 0)
I think that to run this within the dplyr framework, you need to write the function a bit differently (see below). However, you can use sapply():
df <- tibble::tibble(
income = sample(c("low income", "middle income", "upper income"), 50, replace=TRUE),
risk = runif(50, 0, 100),
risk2 = runif(50, 0, 100)
)
test <- function(y, x, data) {
best.score <- min(data[data$income == y, ][[x]], na.rm=T)
(max(data[[x]], na.rm = TRUE)- data[[x]])/ (max(data[[x]], na.rm= TRUE) - best.score)
}
sapply(names(df)[which(sapply(df, is.numeric))],
function(z)test("middle income", z, df))
# risk risk2
# [1,] 0.99126502 0.76377083
# [2,] 0.24662049 0.50686486
# [3,] 0.35321178 0.75278994
# [4,] 0.08891155 0.01049953
# [5,] 0.61905034 0.64051839
# [6,] 0.03936396 1.00000000
# [7,] 0.43424486 0.72034776
# [8,] 0.68451288 0.50013829
# [9,] 0.42632501 0.84215180
# [10,] 0.83093289 0.86745020
# [11,] 1.00000000 0.60915298
# [12,] 0.62479411 0.59268033
# [13,] 0.98408976 0.18052309
# [14,] 0.01744632 0.61018891
# [15,] 0.18876947 0.71309598
# [16,] 0.92650568 0.07473100
# [17,] 0.03610399 0.86314280
# [18,] 0.55538600 0.49841265
# [19,] 0.24854898 0.01362557
# [20,] 0.68921235 0.22034624
# [21,] 0.61214643 0.21680941
# [22,] 0.14955183 0.81706890
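The question asks for a new data frame holding only the scores, whereas sapply() returns a matrix. One possible way to get a data frame with suffixed column names is sketched below. The tiny `df` here is made-up illustration data, and the names `num_cols` and `scores` are hypothetical; the `_test` suffix follows the naming attempted in the question.

```r
# Small made-up data set, for illustration only
df <- data.frame(
  income = c("low income", "middle income", "middle income", "upper income"),
  risk   = c(10, 20, 40, 50),
  risk2  = c(5, 15, 25, 45)
)

# Same scoring function as above
test <- function(y, x, data) {
  best.score <- min(data[data$income == y, ][[x]], na.rm = TRUE)
  (max(data[[x]], na.rm = TRUE) - data[[x]]) /
    (max(data[[x]], na.rm = TRUE) - best.score)
}

# Names of the numeric columns
num_cols <- names(df)[sapply(df, is.numeric)]

# lapply() gives one score vector per column; setNames() adds the suffix,
# and as.data.frame() turns the named list into a data frame
scores <- as.data.frame(
  setNames(lapply(num_cols, function(z) test("middle income", z, df)),
           paste0(num_cols, "_test"))
)
scores
```

Wrapping the sapply() result in as.data.frame() would also work; the lapply() form is used here only because it makes the renaming step explicit.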
For completeness, if you want to do this with dplyr, you can use the following function:
library(dplyr)
# Rescale every numeric column; `.` is the data frame passed in by %>%
test1 <- function(data, y){
data %>% mutate(across(where(is.numeric), function(x){
(max(x, na.rm=TRUE) - x)/
(max(x, na.rm=TRUE) - min(x[which(.$income == y)], na.rm=TRUE))}))
}
test1(df, "middle income")
# # A tibble: 50 x 3
# income risk risk2
# <chr> <dbl> <dbl>
# 1 low income 0.991 0.764
# 2 middle income 0.247 0.507
# 3 low income 0.353 0.753
# 4 upper income 0.0889 0.0105
# 5 low income 0.619 0.641
# 6 middle income 0.0394 1
# 7 upper income 0.434 0.720
# 8 middle income 0.685 0.500
# 9 upper income 0.426 0.842
# 10 upper income 0.831 0.867
# # … with 40 more rows
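If the goal is a data frame with only the scores, in new columns named with a `_test` suffix as attempted in the question, a variant along the following lines may help. This is a sketch, assuming dplyr 1.0.0 or later (for across() and its `.names` argument); the function name `test2`, the helper `in_group`, and the small `df` are made up for illustration.

```r
library(dplyr)

# Small made-up data set, for illustration only
df <- data.frame(
  income = c("low income", "middle income", "middle income", "upper income"),
  risk   = c(10, 20, 40, 50),
  risk2  = c(5, 15, 25, 45)
)

# Hypothetical variant of test1(): keeps only the new score columns
test2 <- function(data, y) {
  # Logical vector marking the rows that belong to the chosen group
  in_group <- data$income == y
  data %>%
    transmute(across(
      where(is.numeric),
      ~ (max(.x, na.rm = TRUE) - .x) /
        (max(.x, na.rm = TRUE) - min(.x[in_group], na.rm = TRUE)),
      .names = "{.col}_test"   # e.g. risk becomes risk_test
    ))
}

result <- test2(df, "middle income")
result
```

transmute() drops the original columns, so `result` contains only `risk_test` and `risk2_test`; replacing it with mutate() would keep the original columns alongside the scores.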