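I'm having trouble with the following code. It returns:
"Error in summarise_impl(.data, dots) : Column `rate` must be length 1 (a summary value), not 22906"
Is there something wrong with my code?
sub_grade is of character type and int_rate is numeric.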
results <- loan_data %>%
  select(credit_grade, sub_grade, int_rate, loan_amnt) %>%
  group_by(sub_grade) %>%
  summarise(
    rate = substr(int_rate * 100, 1, 4),
    nr_loans = n(),
    "&",
    percent1 = substr((nr_loans / a) * 100, 1, 5),
    klj = "&",
    Amount = sum(loan_amnt, na.rm = TRUE),
    klj1 = "&",
    percent2 = substr((Amount / total) * 100, 1, 5)
  )
The problem only occurs when I add the first variable, rate.
Reproducible example:
sub_grade <- c("A1", "A2", "A3", "A1", "A3")
int_rate <- c(0.023, 0.027, 0.033, 0.023, 0.033)
What I want is a result with two columns:
sub_grade, int_rate
Thanks.
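Answer 0 (score: 1)
The problem is that dplyr::summarise expects exactly one value per group, but substr(int_rate * 100, ...) in your code returns one value per row, i.e. many values per group. You need to apply an aggregating function such as min, max, first or last to int_rate inside the substr call, so that each group is reduced to a single value. Given the sample data posted by the OP, a solution could be: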
# Data
sub_grade <- c("A1", "A2", "A3", "A1", "A3")
int_rate <- c(0.023, 0.027, 0.033, 0.023, 0.033)
loan_data <- data.frame(sub_grade, int_rate, stringsAsFactors = FALSE)

# Use dplyr to summarise on sub_grade
library(dplyr)

loan_data %>%
  group_by(sub_grade) %>%
  summarise(int_rate = first(int_rate)) %>%
  as.data.frame()
#   sub_grade int_rate
# 1        A1    0.023
# 2        A2    0.027
# 3        A3    0.033
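
To carry this back to the rate column from the question, a minimal sketch might look like the one below. It assumes int_rate is constant within each sub_grade (as it is in the sample data), which is why first() is used; if rates can differ within a grade, mean() or another aggregate is probably more appropriate. Only the columns present in the sample data are used, so the a, total and loan_amnt parts of the original call are left out.

```r
library(dplyr)

# Sample data from the question
loan_data <- data.frame(
  sub_grade = c("A1", "A2", "A3", "A1", "A3"),
  int_rate  = c(0.023, 0.027, 0.033, 0.023, 0.033),
  stringsAsFactors = FALSE
)

results <- loan_data %>%
  group_by(sub_grade) %>%
  summarise(
    # first() reduces each group to one value BEFORE substr() is applied,
    # so rate has length 1 per group (assumes the rate is constant per grade)
    rate = substr(first(int_rate) * 100, 1, 4),
    # n() is already a per-group summary, so it needs no wrapper
    nr_loans = n()
  ) %>%
  as.data.frame()

print(results)
```

Note that substr() coerces the number to a character string, so rate ends up as a character column. If a numeric column is preferred, round(first(int_rate) * 100, 2) would be one alternative.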