I have the following data frame in R:
  Lead.Stage     Number.of.Followup.Calls
1 Not Interested Select
2 Unreachable    ""
3 Qualified      1
4 Unreachable    2
5 Qualified      2
6 Junk Lead      Select
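Number.of.Followup.Calls is of character type. I want to group by Lead.Stage and compute the mean number of follow-up calls for each Lead.Stage.
In dplyr, I would filter out "Select" and the empty string, then convert the remaining values to numeric. I am using the following code in R, but it does not seem to work.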
train %>%
group_by(Lead.Stage) %>%
filter((Number.of.Followup.Calls == "" | Number.of.Followup.Calls ==
"Select")) %>%
mutate_each_(funs(as.numeric), Number.of.Followup.Calls) %>%
summarise(Total = mean(Number.of.Followup.Calls))
Thanks in advance :)
Answer 0 (score: 3)
Use %in%:
train %>%
group_by(Lead.Stage) %>%
filter(!Number.of.Followup.Calls %in% c("", "Select")) %>%
summarise(Total = mean(as.numeric(Number.of.Followup.Calls)))
# Lead.Stage Total
# <chr> <dbl>
#1 Qualified 1.5
#2 Unreachable 2.0
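For anyone who wants to reproduce this, here is a minimal self-contained sketch. It assumes the `train` data frame can be rebuilt from the six rows shown in the question; the `result` name is only for illustration. Note that the filter in the question keeps only the rows equal to "" or "Select" instead of dropping them, which is why the negation with `!` matters here.

```r
library(dplyr)

# Rebuild the example data from the question (values copied from the post).
train <- data.frame(
  Lead.Stage = c("Not Interested", "Unreachable", "Qualified",
                 "Unreachable", "Qualified", "Junk Lead"),
  Number.of.Followup.Calls = c("Select", "", "1", "2", "2", "Select"),
  stringsAsFactors = FALSE
)

result <- train %>%
  # Drop the placeholder values before converting to numeric.
  filter(!Number.of.Followup.Calls %in% c("", "Select")) %>%
  group_by(Lead.Stage) %>%
  summarise(Total = mean(as.numeric(Number.of.Followup.Calls)))

print(result)
```

Filtering before `group_by` gives the same groups here, since stages with no numeric values simply disappear from the result.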
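Alternatively, we do not need the filter step and the rest, because as.numeric automatically turns every non-numeric element into NA, so mean(., na.rm = TRUE) is enough. The trailing na.omit() drops the groups that have no numeric values at all, since their mean comes out as NaN.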
train %>%
group_by(Lead.Stage) %>%
summarise(Total = mean(as.numeric(Number.of.Followup.Calls), na.rm = TRUE)) %>%
na.omit()
# Lead.Stage Total
# <chr> <dbl>
#1 Qualified 1.5
#2 Unreachable 2.0
#Warning messages:
#1: In mean(as.numeric(c("", "2")), na.rm = TRUE) :
# NAs introduced by coercion
The warning message is only a reminder that non-numeric elements were converted to NA.
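If the warning is unwanted, one option is to do the conversion once and wrap it in suppressWarnings. The sketch below is only an illustration under the assumption that `train` matches the six rows from the question; the `calls` column and the `result` name are hypothetical and not part of the original answer.

```r
library(dplyr)

# Rebuild the example data from the question (values copied from the post).
train <- data.frame(
  Lead.Stage = c("Not Interested", "Unreachable", "Qualified",
                 "Unreachable", "Qualified", "Junk Lead"),
  Number.of.Followup.Calls = c("Select", "", "1", "2", "2", "Select"),
  stringsAsFactors = FALSE
)

result <- train %>%
  # `calls` is a hypothetical helper column; non-numeric entries become NA.
  mutate(calls = suppressWarnings(as.numeric(Number.of.Followup.Calls))) %>%
  # Remove rows that could not be converted, so no group ends up empty.
  filter(!is.na(calls)) %>%
  group_by(Lead.Stage) %>%
  summarise(Total = mean(calls))

print(result)
```

Silencing the warning hides every coercion failure, so this is only reasonable when the non-numeric values are known placeholders such as "Select".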