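Working with dynamic variable names in dplyr

Time: 2018-06-21 18:26:08

Tags: r dplyr

I am reposting this question because my earlier reproducible example contained a very serious error.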

My data looks like this:

library(dplyr)

set.seed(123)
X_foo <- runif(6, 0, 1)
X_bar <- runif(6, 0, 100)
Y_foo <- runif(6, 0, 1)
Y_bar <- runif(6, 0, 100)
Z_foo <- runif(6, 0, 1)
Z_bar <- runif(6, 0, 100)
df <- data.frame(X_foo, X_bar, Y_foo, Y_bar, Z_foo, Z_bar)
df
      X_foo    X_bar      Y_foo    Y_bar     Z_foo     Z_bar
1 0.2875775 52.81055 0.67757064 32.79207 0.6557058 96.302423
2 0.7883051 89.24190 0.57263340 95.45036 0.7085305 90.229905
3 0.4089769 55.14350 0.10292468 88.95393 0.5440660 69.070528
4 0.8830174 45.66147 0.89982497 69.28034 0.5941420 79.546742
5 0.9404673 95.68333 0.24608773 64.05068 0.2891597  2.461368
6 0.0455565 45.33342 0.04205953 99.42698 0.1471136 47.779597

I will be asked to return the 3 top-ranked values from any one (and only one) of the six variables in the data. The function I wrote for this is:

aRankingFunction <- function(aMetric1 = "X", aMetric2 = "foo") {
  # list of names that the function will accept
  good_metric1 <- c("X", "Y", "Z")
  good_metric2 <- c("foo", "bar")
  # use an if statement, so if user enters a bad name they get an error back
  if((aMetric1 %in% good_metric1) & (aMetric2 %in% good_metric2)) {
    thePull <- df %>%
      # Select statement should pull exactly one variable (by default, X_foo)
      select(contains(aMetric1)) %>%
      select(contains(aMetric2))
  } else {
    return("Error")
  }
  theOutput <- thePull %>%
    # Create a new variable with the ranks of the variable pulled
    mutate(Rank = min_rank(*)) %>% # This is where the function breaks
    # Sort the ranks
    arrange(desc(Rank)) %>%
    # Filter for ranks 1,2,3
    filter(Rank <= 3)
  return(theOutput)
}

But it breaks when I run aRankingFunction(). I have marked where the break occurs: I cannot work out what the * in the statement mutate(Rank = min_rank(*)) should be. That statement should rank whichever one of the six variables was selected, but I won't know which one until run time.

How do I dynamically tell the mutate() statement to "use the variable name that was selected"?

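2 answers:

Answer 0 (score: 3)

Focusing only on the part that needs fixing: you need to convert the string you have into a symbol, then inject it into the dplyr call with the bang-bang operator (!!).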

...
rankvar <- as.symbol(names(thePull))
theOutput <- thePull %>%
  # Create a new variable with the ranks of the variable pulled
  mutate(Rank = min_rank(!!rankvar)) %>%
...
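
Put together, the symbol-injection approach might look like the sketch below. It is only an illustration under a few assumptions that are not in the original: the function name rank_by_symbol is hypothetical, the data frame is passed in as an argument rather than read from the global environment, and the column name is built directly with paste() instead of two contains() calls.

```r
library(dplyr)

# Recreate the example data from the question
set.seed(123)
df <- data.frame(
  X_foo = runif(6, 0, 1), X_bar = runif(6, 0, 100),
  Y_foo = runif(6, 0, 1), Y_bar = runif(6, 0, 100),
  Z_foo = runif(6, 0, 1), Z_bar = runif(6, 0, 100)
)

# Hypothetical helper: rank a single column chosen at run time
rank_by_symbol <- function(data, aMetric1 = "X", aMetric2 = "foo") {
  # Assumption: columns are named "<metric1>_<metric2>"
  col_name <- paste(aMetric1, aMetric2, sep = "_")
  stopifnot(col_name %in% names(data))

  # Convert the string into a symbol so it can be injected with !!
  rankvar <- as.symbol(col_name)

  data %>%
    select(!!rankvar) %>%
    mutate(Rank = min_rank(!!rankvar)) %>%
    arrange(desc(Rank)) %>%
    filter(Rank <= 3)
}

result <- rank_by_symbol(df, "Y", "bar")
print(result)
```

Note that min_rank() gives rank 1 to the smallest value, so Rank <= 3 keeps the three smallest values; ranking with min_rank(desc(!!rankvar)) would keep the three largest instead.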

Another option, in this special case where there is only one column, is

...
theOutput <- thePull %>%
  # Create a new variable with the ranks of the variable pulled
  mutate_all(funs(Rank = min_rank)) %>%
...
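
funs() was deprecated in dplyr 0.8.0, and the mutate_all() family was superseded by across() in dplyr 1.0.0. On a recent dplyr, the same single-column idea could be sketched as follows. The small thePull data frame here is made up for illustration, and the .names = "Rank" trick only makes sense because exactly one column is present.

```r
library(dplyr)

# Illustrative stand-in for the one-column result of the select() calls
thePull <- data.frame(X_foo = c(0.29, 0.79, 0.41, 0.88, 0.94, 0.05))

theOutput <- thePull %>%
  # Apply min_rank to every column (here: the only one) and
  # store the result in a new column called "Rank"
  mutate(across(everything(), min_rank, .names = "Rank")) %>%
  arrange(desc(Rank)) %>%
  filter(Rank <= 3)

print(theOutput)
```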

Answer 1 (score: 1)

You can pass thePull as the argument to min_rank():

aRankingFunction <- function(aMetric1 = "X", aMetric2 = "foo") {
  # list of names that the function will accept
  good_metric1 <- c("X", "Y", "Z")
  good_metric2 <- c("foo", "bar")
  # use an if statement, so if user enters a bad name they get an error back 
  if((aMetric1 %in% good_metric1) & (aMetric2 %in% good_metric2)) {
    thePull <- df %>%
      # Select statement should pull exactly one variable (by default, X_foo)
      select(contains(aMetric1)) %>%
      select(contains(aMetric2))
  } else {
    return("Error")
  }
  theOutput <- df %>%
    # Create a new variable with the ranks of the variable pulled
    mutate(Rank = min_rank(thePull)) %>% # thePull (the one-column data frame) is passed directly
    # Sort the ranks
    arrange(desc(Rank)) %>%
    # Filter for ranks 1,2,3
    filter(Rank <= 3)
  return(theOutput)
}

> aRankingFunction()
      X_foo    X_bar      Y_foo    Y_bar     Z_foo    Z_bar Rank
1 0.4089769 55.14350 0.10292468 88.95393 0.5440660 69.07053    3
2 0.2875775 52.81055 0.67757064 32.79207 0.6557058 96.30242    2
3 0.0455565 45.33342 0.04205953 99.42698 0.1471136 47.77960    1
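
A further way to refer to a column whose name is only known at run time, not taken from either answer, is the .data pronoun that dplyr re-exports from rlang. The sketch below is a minimal illustration with made-up data; the function name rank_with_pronoun is hypothetical. It ranks with desc() so that rank 1 is the largest value, which differs from the output above, where rank 1 is the smallest.

```r
library(dplyr)

# Small made-up data set for illustration
df <- data.frame(
  X_foo = c(0.29, 0.79, 0.41, 0.88, 0.94, 0.05),
  X_bar = c(52.8, 89.2, 55.1, 45.7, 95.7, 45.3)
)

# Hypothetical helper: keep the rows holding the 3 largest values of col_name
rank_with_pronoun <- function(data, col_name) {
  stopifnot(col_name %in% names(data))
  data %>%
    # .data[[col_name]] looks the column up by its string name
    mutate(Rank = min_rank(desc(.data[[col_name]]))) %>%
    filter(Rank <= 3) %>%
    arrange(Rank)
}

top3 <- rank_with_pronoun(df, "X_bar")
print(top3)
```

This version keeps all columns of the input, as the answer above does, and needs no conversion from string to symbol.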