我想使用一个函数在数据框中的四列上重复一组过程。最终,我需要一个包含所有输出的长数据帧。这是我的数据框:
Printer.StartPrintJob().StartPage().DrawText("Title").DrawIcon("logo.png").EndPage().Print();
这是我的代码:
> sample_data
# A tibble: 10 x 7
REVENUEID AMOUNT YEAR REPORT_CODE PAYMENT_METHOD INBOUND_CHANNEL AMOUNT_CAT
<chr> <dbl> <chr> <chr> <chr> <chr> <fctr>
1 rev-24985629 30 FY18 S Check Mail [25,50)
2 rev-22812413 1 FY16 Q Other Canvassing [0.01,10)
3 rev-23508794 100 FY17 Q Credit card Web [100,250)
4 rev-23506121 300 FY17 S Credit card Mail [250,500)
5 rev-23550444 100 FY17 S Credit card Web [100,250)
6 rev-21508672 25 FY14 J Check Mail [25,50)
7 rev-24981769 500 FY18 S Credit card Web [500,1e+03)
8 rev-23503684 50 FY17 R Check Mail [50,75)
9 rev-24982087 25 FY18 R Check Mail [25,50)
10 rev-24979834 50 FY18 R Credit card Web [50,75)
以下是该代码的最终产品:
AMOUNT_CAT<- sample_data %>% group_by(AMOUNT_CAT,YEAR) %>% summarize(num=n(),total=sum(AMOUNT)) %>% rename(REPORT_VALUE=AMOUNT_CAT) %>% mutate(REPORT_CATEGORY="AMOUNT_CAT")
INBOUND_CHANNEL<- sample_data %>% group_by(INBOUND_CHANNEL,YEAR) %>% summarize(num=n(),total=sum(AMOUNT)) %>% rename(REPORT_VALUE=INBOUND_CHANNEL) %>% mutate(REPORT_CATEGORY="INBOUND_CHANNEL")
PAYMENT_METHOD<- sample_data %>% group_by(PAYMENT_METHOD,YEAR) %>% summarize(num=n(),total=sum(AMOUNT)) %>% rename(REPORT_VALUE=PAYMENT_METHOD) %>% mutate(REPORT_CATEGORY="PAYMENT_METHOD")
REPORT_CODE<- sample_data %>% group_by(REPORT_CODE,YEAR) %>% summarize(num=n(),total=sum(AMOUNT)) %>% rename(REPORT_VALUE=REPORT_CODE) %>% mutate(REPORT_CATEGORY="REPORT_CODE")
final_product<-bind_rows(REPORT_CODE,PAYMENT_METHOD,INBOUND_CHANNEL,AMOUNT_CAT)
这是我尝试压缩代码以使其更智能,更高效(它不起作用):
> final_product
# A tibble: 27 x 5
# Groups: REPORT_VALUE [16]
REPORT_CATEGORY REPORT_VALUE YEAR num total
<chr> <chr> <chr> <int> <dbl>
1 REPORT_CODE J FY14 1 25
2 REPORT_CODE Q FY16 1 1
3 REPORT_CODE Q FY17 1 100
4 REPORT_CODE R FY17 1 50
5 REPORT_CODE R FY18 2 75
6 REPORT_CODE S FY17 2 400
7 REPORT_CODE S FY18 2 530
8 PAYMENT_METHOD Check FY14 1 25
9 PAYMENT_METHOD Check FY17 1 50
10 PAYMENT_METHOD Check FY18 2 55
# ... with 17 more rows
有人可以告诉我如何写&#34;更聪明&#34;代码不经常重复?
谢谢!
答案 0 :(得分:2)
您需要将字符串解析为符号(rlang::sym
),并在group_by
和rename
中取消引用它们,如下所示。另外需要注意的是,cat.list
已经是字符串向量,因此无需在x
中的mutate
周围添加双引号:
library(dplyr)
library(rlang)
cat.list <- c("REPORT_CODE","PAYMENT_METHOD","INBOUND_CHANNEL","AMOUNT_CAT")
repeat_procs <- lapply(cat.list, function(x){
final_data <- sample_data %>%
group_by(!!sym(x), YEAR) %>%
summarize(num=n(),total=sum(AMOUNT)) %>%
rename(REPORT_VALUE=!!sym(x)) %>%
mutate(REPORT_CATEGORY=x)
}) %>%
bind_rows()
<强>结果:强>
> repeat_procs
# A tibble: 27 x 5
# Groups: REPORT_VALUE [16]
REPORT_VALUE YEAR num total REPORT_CATEGORY
<chr> <fctr> <int> <int> <chr>
1 J FY14 1 25 REPORT_CODE
2 Q FY16 1 1 REPORT_CODE
3 Q FY17 1 100 REPORT_CODE
4 R FY17 1 50 REPORT_CODE
5 R FY18 2 75 REPORT_CODE
6 S FY17 2 400 REPORT_CODE
7 S FY18 2 530 REPORT_CODE
8 Check FY14 1 25 PAYMENT_METHOD
9 Check FY17 1 50 PAYMENT_METHOD
10 Check FY18 2 55 PAYMENT_METHOD
# ... with 17 more rows
答案 1 :(得分:2)
更多&#34;更聪明&#34;代码,您需要在分组和汇总之前将数据转换为"tidy data"形式。
data_tidy <-
tidyr::gather(sample_data, key = "REPORT_CATEGORY", value = "REPORT_VALUE", !! cat.list)
data_tidy
#> REVENUEID AMOUNT YEAR REPORT_CATEGORY REPORT_VALUE
#> 1 rev-24985629 30 FY18 REPORT_CODE S
#> 2 rev-22812413 1 FY16 REPORT_CODE Q
#> 3 rev-23508794 100 FY17 REPORT_CODE Q
#> 4 rev-23506121 300 FY17 REPORT_CODE S
#> 5 rev-23550444 100 FY17 REPORT_CODE S
#> 6 rev-21508672 25 FY14 REPORT_CODE J
#> 7 rev-24981769 500 FY18 REPORT_CODE S
#> 8 rev-23503684 50 FY17 REPORT_CODE R
#> 9 rev-24982087 25 FY18 REPORT_CODE R
#> 10 rev-24979834 50 FY18 REPORT_CODE R
#> 11 rev-24985629 30 FY18 PAYMENT_METHOD Check
#> 12 rev-22812413 1 FY16 PAYMENT_METHOD Other
#> 13 rev-23508794 100 FY17 PAYMENT_METHOD Credit card
#> 14 rev-23506121 300 FY17 PAYMENT_METHOD Credit card
#> 15 rev-23550444 100 FY17 PAYMENT_METHOD Credit card
#> 16 rev-21508672 25 FY14 PAYMENT_METHOD Check
#> 17 rev-24981769 500 FY18 PAYMENT_METHOD Credit card
#> 18 rev-23503684 50 FY17 PAYMENT_METHOD Check
#> 19 rev-24982087 25 FY18 PAYMENT_METHOD Check
#> 20 rev-24979834 50 FY18 PAYMENT_METHOD Credit card
#> 21 rev-24985629 30 FY18 INBOUND_CHANNEL Mail
#> 22 rev-22812413 1 FY16 INBOUND_CHANNEL Canvassing
#> 23 rev-23508794 100 FY17 INBOUND_CHANNEL Web
#> 24 rev-23506121 300 FY17 INBOUND_CHANNEL Mail
#> 25 rev-23550444 100 FY17 INBOUND_CHANNEL Web
#> 26 rev-21508672 25 FY14 INBOUND_CHANNEL Mail
#> 27 rev-24981769 500 FY18 INBOUND_CHANNEL Web
#> 28 rev-23503684 50 FY17 INBOUND_CHANNEL Mail
#> 29 rev-24982087 25 FY18 INBOUND_CHANNEL Mail
#> 30 rev-24979834 50 FY18 INBOUND_CHANNEL Web
#> 31 rev-24985629 30 FY18 AMOUNT_CAT [25,50)
#> 32 rev-22812413 1 FY16 AMOUNT_CAT [0.01,10)
#> 33 rev-23508794 100 FY17 AMOUNT_CAT [100,250)
#> 34 rev-23506121 300 FY17 AMOUNT_CAT [250,500)
#> 35 rev-23550444 100 FY17 AMOUNT_CAT [100,250)
#> 36 rev-21508672 25 FY14 AMOUNT_CAT [25,50)
#> 37 rev-24981769 500 FY18 AMOUNT_CAT [500,1e+03)
#> 38 rev-23503684 50 FY17 AMOUNT_CAT [50,75)
#> 39 rev-24982087 25 FY18 AMOUNT_CAT [25,50)
#> 40 rev-24979834 50 FY18 AMOUNT_CAT [50,75)
data_tidy %>%
group_by(REPORT_CATEGORY, REPORT_VALUE, YEAR) %>%
summarise(num = n(), total = sum(AMOUNT)) %>%
ungroup()
#> # A tibble: 27 x 5
#> REPORT_CATEGORY REPORT_VALUE YEAR num total
#> <chr> <chr> <chr> <int> <int>
#> 1 AMOUNT_CAT [0.01,10) FY16 1 1
#> 2 AMOUNT_CAT [100,250) FY17 2 200
#> 3 AMOUNT_CAT [25,50) FY14 1 25
#> 4 AMOUNT_CAT [25,50) FY18 2 55
#> 5 AMOUNT_CAT [250,500) FY17 1 300
#> 6 AMOUNT_CAT [50,75) FY17 1 50
#> 7 AMOUNT_CAT [50,75) FY18 1 50
#> 8 AMOUNT_CAT [500,1e+03) FY18 1 500
#> 9 INBOUND_CHANNEL Canvassing FY16 1 1
#> 10 INBOUND_CHANNEL Mail FY14 1 25
#> # ... with 17 more rows
答案 2 :(得分:0)
添加purrr
方法,使您的代码更加简洁,smarter
。
library(tidyverse)
library(rlang)
cat.list <- c("REPORT_CODE","PAYMENT_METHOD","INBOUND_CHANNEL","AMOUNT_CAT")
map_df(cat.list,
function(report_cat) {
sample_data %>%
group_by(!!sym(report_cat), YEAR) %>%
summarize(num=n(),total=sum(AMOUNT)) %>%
rename(REPORT_VALUE = !!sym(report_cat)) %>%
mutate(REPORT_CATEGORY = report_cat)
}
)
Hadley描述here(大约一半):
map_df(x, f)
实际上与do.call("rbind", lapply(x, f))
相同,但引导效率更高。
完全披露,感谢@useR向我展示如何使用sym(!!()
方法。我使用Programming in Dplyr
小插图支持自己进入一个角落,构建我认为最新的功能化dplyr
方法。
我使用dplyr
然后var <- enquo(var)
得到了主要的!!var
函数,但我找不到办法来处理cat.list
中的引用名称map_df
或lapply
。
谢谢你,useR教我一个更好的方法来编码tidyverse
编辑:谢谢G.Grothendieck解锁如何获得dplyr函数顺利接受的字符串列表:here
这使我能够完成我之前开发的替代定义方法:
report <- function(report_cat){
report_cat <- enquo(report_cat)
sample_data %>%
group_by(!!report_cat, YEAR) %>%
summarize(num=n(),total=sum(AMOUNT)) %>%
rename(REPORT_VALUE = !!report_cat) %>%
mutate(REPORT_CATEGORY := as.character(quote(!!report_cat))[2])
}
report_named <- function(x) {do.call("report", list(as.name(x)))}
map_df(cat.list, report_named)
> map_df(cat.list, report_named) # A tibble: 27 x 5 # Groups: REPORT_VALUE [16] REPORT_VALUE YEAR num total REPORT_CATEGORY <chr> <chr> <int> <int> <chr> 1 J FY14 1 25 REPORT_CODE 2 Q FY16 1 1 REPORT_CODE 3 Q FY17 1 100 REPORT_CODE 4 R FY17 1 50 REPORT_CODE 5 R FY18 2 75 REPORT_CODE 6 S FY17 2 400 REPORT_CODE 7 S FY18 2 530 REPORT_CODE 8 Check FY14 1 25 PAYMENT_METHOD 9 Check FY17 1 50 PAYMENT_METHOD 10 Check FY18 2 55 PAYMENT_METHOD # ... with 17 more rows
NB:yutannihilation的整洁解决方案真的是最佳解决方案恕我直言 - 我只是利用这个机会扩展我对如何进行拆分,应用,组合<的理解/ em>包含dplyr
函数的方法。