Column `rate` must be length 1 (a summary value), not 22906

Asked: 2018-06-10 08:21:01

Tags: r

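I am having trouble with the code below. It returns:

"Error in summarise_impl(.data, dots) : Column `rate` must be length 1 (a summary value), not 22906"

Is there something wrong with my code?

sub_grade is a character column and int_rate is numeric.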

results <- loan_data %>%
  select(credit_grade, sub_grade, int_rate, loan_amnt) %>%
  group_by(sub_grade) %>%
  summarise(
    rate = substr(int_rate * 100, 1, 4),
    nr_loans = n(),
    "&",
    percent1 = substr((nr_loans / a) * 100, 1, 5),
    klj = "&",
    Amount = sum(loan_amnt, na.rm = TRUE),
    klj1 = "&",
    percent2 = substr((Amount / total) * 100, 1, 5)
  )

The problem only appears when I add the first variable, rate.

Reproducible example:

sub_grade <- c("A1", "A2", "A3","A1","A3")
int_rate <- c(0.023, 0.027, 0.033,0.023,0.033)

What I want is:

sub_grade  int_rate
A1         0.023
A2         0.027
A3         0.033

Thanks.

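1 Answer:

Answer 0: (score: 1)

The problem is that dplyr::summarise expects exactly one value per group, but substr(int_rate*100, ...) in your code returns one value per row, i.e. many values per group. You need to use an aggregating function such as min, max, first or last inside the substr call. Given the sample data posted by the OP, a solution could be: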

# Data
sub_grade <- c("A1", "A2", "A3","A1","A3")
int_rate <- c(0.023, 0.027,0.033,0.023,0.033)

loan_data <- data.frame(sub_grade, int_rate, stringsAsFactors = FALSE)

# Use dplyr to summarise on sub_grade
library(dplyr)
loan_data %>% group_by(sub_grade) %>%
  summarise(int_rate = first(int_rate)) %>%
  as.data.frame()

#   sub_grade int_rate
# 1        A1    0.023
# 2        A2    0.027
# 3        A3    0.033
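
The answer also suggests using the aggregating function "as part of" substr. A minimal sketch of that variant on the same sample data follows. It assumes first() is an acceptable choice because int_rate is constant within each sub_grade in the sample; if rates can differ within a group, mean() or max() may be a better fit. The variable name rates is illustrative only.

```r
library(dplyr)

# Sample data from the question
sub_grade <- c("A1", "A2", "A3", "A1", "A3")
int_rate  <- c(0.023, 0.027, 0.033, 0.023, 0.033)
loan_data <- data.frame(sub_grade, int_rate, stringsAsFactors = FALSE)

# first() collapses each group to a single value *before* substr() is applied,
# so every summarised column has length 1 per group.
rates <- loan_data %>%
  group_by(sub_grade) %>%
  summarise(
    rate     = substr(first(int_rate) * 100, 1, 4),  # character, one per group
    nr_loans = n()                                   # number of rows per group
  ) %>%
  as.data.frame()

print(rates)
```

Note that substr() coerces its input to character, so rate becomes a string column; keep a separate numeric column if further arithmetic on the rate is needed.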