Question

I want a function that uses dplyr and looks like AddPercentColumns() below.

AddPercentColumns <- function(df, col) {
    # Sorts and adds "Percent" and "Cumulative Percent" columns to a data.frame.
    #
    # Args:
    #   df: data frame
    #   col: column symbol
    #
    # Returns:
    #   Data frame sorted by "col" with new "Percent" and "Cumulative Percent" columns.

    df %>%
        arrange(desc(col)) %>%
        mutate(Percent = col / sum(col) * 100) %>% 
        mutate(Cumulative = cumsum(Percent))
}

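However, I can't figure out how to get around NSE (non-standard evaluation) in this case. I could pass in a column name string and use arrange_() and mutate_(), but I'm not sure how to handle desc(), sum(), and cumsum().

How should this function be written using dplyr?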

Answer 1

Following Konrad's suggestion, I'm posting another solution. :)

AddPercentColumns <- function(df, col) {
    # Sorts data.frame and adds "Percent" and "Cumulative Percent" columns.
    #
    # Args:
    #   df: data frame
    #   col: unevaluated column symbol e.g. substitute(col)
    #
    # Returns:
    #   Data frame sorted by "col" with new "Percent" and "Cumulative Percent" columns.

    df %>%
        arrange_(bquote(desc(.(col)))) %>%
        mutate_(Percent = bquote(.(col) / sum(.(col)) * 100)) %>% 
        mutate(Cumulative = cumsum(Percent))
}

Definitely cleaner, easier to debug, and more readable.
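To make the bquote() step above more concrete, here is a minimal base R sketch of what those calls build before dplyr ever sees them. The symbol n and the small data frame are hypothetical stand-ins chosen for illustration; they are not part of the original answer.

```r
# Hypothetical column symbol, standing in for the `col` argument.
col <- quote(n)

# bquote() splices the value of .(col) into the quoted expression,
# so the result is an unevaluated call that mentions the column by name.
sort_expr <- bquote(desc(.(col)))
pct_expr  <- bquote(.(col) / sum(.(col)) * 100)

print(sort_expr)  # desc(n)

# The percentage expression can be evaluated against any data frame
# that has a column called n (an illustrative data frame here).
eval(pct_expr, data.frame(n = c(10, 30, 60)))  # 10 30 60
```

Because the expressions are ordinary R calls, they can be printed and inspected on their own, which is presumably what makes this version easier to debug than pasting strings together.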

Answer 2

I find sprintf() easier to read than paste(). The function below looks like fun to debug, but it gets the job done.

AddPercentColumn <- function(df, col) {
    # Sorts data.frame and adds "Percent" and "Cumulative Percent" columns.
    #
    # Args:
    #   df: data frame
    #   col: column name string
    #
    # Returns:
    #   Data frame sorted by "col" with new "Percent" and "Cumulative Percent" columns.

    df %>%
        arrange_(sprintf("desc(%s)", col)) %>%
        mutate_(Percent = sprintf("%s / sum(%s) * 100", col, col)) %>% 
        mutate_(Cumulative = "cumsum(Percent)")
}

Not super clean, though...
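For readers on a newer toolchain, the same idea can probably be sketched without the underscore verbs. This is not taken from either answer: it assumes a dplyr release that supports the embrace operator {{ }} (rlang 0.4.0 or later), and the function name AddPercentColumnsTidy and the sample data are hypothetical.

```r
library(dplyr)

# Hypothetical variant of the function above, assuming {{ }} is available.
AddPercentColumnsTidy <- function(df, col) {
    # {{ col }} forwards the bare column name supplied by the caller
    # into arrange() and mutate(), so no string building is needed.
    df %>%
        arrange(desc({{ col }})) %>%
        mutate(Percent = {{ col }} / sum({{ col }}) * 100,
               Cumulative = cumsum(Percent))
}

# Illustrative data, not from the original question.
df <- data.frame(name = c("a", "b", "c"), n = c(10, 30, 60))
result <- AddPercentColumnsTidy(df, n)
result$Cumulative  # 60 90 100
```

The caller passes the column unquoted, as in the original AddPercentColumns(df, col) from the question, and desc(), sum() and cumsum() need no special treatment.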

How to pass a column to arrange() and mutate()
