I want a function that uses dplyr and looks like AddPercentColumns() below.
AddPercentColumns <- function(df, col) {
  # Sorts and adds "Percent" and "Cumulative Percent" columns to a data.frame.
  #
  # Args:
  #   df: data frame
  #   col: column symbol
  #
  # Returns:
  #   Data frame sorted by "col" with new "Percent" and "Cumulative Percent" columns.
  df %>%
    arrange(desc(col)) %>%
    mutate(Percent = col / sum(col) * 100) %>%
    mutate(Cumulative = cumsum(Percent))
}
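However, I can't work out how to get around non-standard evaluation (NSE). I could probably pass in the column name as a string and use arrange_() and mutate_(), but then I don't know how to handle desc(), sum() and cumsum().
How should this function be written using dplyr?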
Answer 0 (score: 2)
Following Konrad's suggestion, I'm posting another solution. :)
AddPercentColumns <- function(df, col) {
  # Sorts data.frame and adds "Percent" and "Cumulative Percent" columns.
  #
  # Args:
  #   df: data frame
  #   col: unevaluated column symbol e.g. substitute(col)
  #
  # Returns:
  #   Data frame sorted by "col" with new "Percent" and "Cumulative Percent" columns.
  df %>%
    arrange_(bquote(desc(.(col)))) %>%
    mutate_(Percent = bquote(.(col) / sum(.(col)) * 100)) %>%
    mutate(Cumulative = cumsum(Percent))
}
Definitely cleaner, easier to debug, and more readable.
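For readers on a newer dplyr, the same idea can be sketched without the underscore verbs, which have since been deprecated. The version below assumes dplyr 1.0 or later (with the rlang embrace operator `{{ }}`); the function name AddPercentColumnsTidy and the toy data frame are made up for illustration and are not part of the answer above.

```r
library(dplyr)

# Hypothetical variant of the function above, assuming dplyr >= 1.0.
# `col` is passed as a bare column name and forwarded with {{ }}.
AddPercentColumnsTidy <- function(df, col) {
  df %>%
    arrange(desc({{ col }})) %>%
    mutate(
      Percent = {{ col }} / sum({{ col }}) * 100,
      Cumulative = cumsum(Percent)  # uses the Percent column created just above
    )
}

# Usage with a small made-up data frame:
toy <- data.frame(item = c("a", "b", "c"), n = c(1, 3, 4))
result <- AddPercentColumnsTidy(toy, n)
print(result)
```

With this form the caller writes the column name unquoted, so no quote() or substitute() call is needed at the call site.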
Answer 1 (score: 0)
I find sprintf() easier to read than paste(). The function below looks like it would be fun to debug, but it gets the job done.
AddPercentColumn <- function(df, col) {
  # Sorts data.frame and adds "Percent" and "Cumulative Percent" columns.
  #
  # Args:
  #   df: data frame
  #   col: column name string
  #
  # Returns:
  #   Data frame sorted by "col" with new "Percent" and "Cumulative Percent" columns.
  df %>%
    arrange_(sprintf("desc(%s)", col)) %>%
    mutate_(Percent = sprintf("%s / sum(%s) * 100", col, col)) %>%
    mutate_(Cumulative = "cumsum(Percent)")
}
It's not super clean, though...
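If the column name has to stay a string, a possible alternative to building expressions with sprintf() is the `.data` pronoun. This is only a sketch, assuming dplyr 1.0 or later; the name AddPercentColumnStr and the sample data are hypothetical.

```r
library(dplyr)

# Hypothetical string-based variant, assuming dplyr >= 1.0.
# `col` is a column name given as a character string.
AddPercentColumnStr <- function(df, col) {
  df %>%
    arrange(desc(.data[[col]])) %>%
    mutate(
      Percent = .data[[col]] / sum(.data[[col]]) * 100,
      Cumulative = cumsum(Percent)
    )
}

# Usage with a small made-up data frame:
sales <- data.frame(region = c("north", "south", "west"), total = c(10, 30, 60))
out <- AddPercentColumnStr(sales, "total")
print(out)
```

Because the string is only used to look up a column, nothing is pasted into code, which sidesteps the debugging concern mentioned above.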