Conditionally selecting column values in dplyr and then changing the data type

Time: 2016-08-31 06:10:52

Tags: r dplyr

I have the following data frame in R:
   Lead.Stage       Number.of.Followup.Calls
1 Not Interested             Select
2  Unreachable                  ""
3   Qualified                   1
4  Unreachable                  2
5   Qualified                   2
6   Junk Lead                Select       
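To make the example reproducible, here is a sketch that rebuilds the data frame shown above under the name `train` (the name the code below uses). It assumes both columns are character vectors (the question states this for Number.of.Followup.Calls; for Lead.Stage it is an assumption, consistent with the `<chr>` shown in the answer's output) and that the blank cell is an empty string.

```r
# Rebuild the example data; both columns are stored as character
train <- data.frame(
  Lead.Stage = c("Not Interested", "Unreachable", "Qualified",
                 "Unreachable", "Qualified", "Junk Lead"),
  Number.of.Followup.Calls = c("Select", "", "1", "2", "2", "Select"),
  stringsAsFactors = FALSE  # keep strings as character (the default since R 4.0.0)
)
str(train)
```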

Number.of.Followup.Calls is a character column. I want to group by Lead.Stage and compute the average number of follow-up calls for each Lead.Stage.

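In dplyr, I would filter out "Select" and the empty string, then convert the remaining numbers to numeric. I am using the following code in R, but it doesn't seem to work.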

train %>% 
  group_by(Lead.Stage)  %>%
  filter((Number.of.Followup.Calls == "" | Number.of.Followup.Calls ==  
  "Select")) %>% 
  mutate_each_(funs(as.numeric), Number.of.Followup.Calls)  %>% 
  summarise(Total = mean(Number.of.Followup.Calls)) 

Thanks in advance :)
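The filter() condition in the question's code keeps only the rows whose value is "" or "Select", which is the opposite of what is wanted, so every value that reaches as.numeric() becomes NA. Below is a minimal sketch of the same pipeline with the condition negated, and with a plain mutate() in place of mutate_each_() (later dplyr releases deprecated the mutate_each family). The data is rebuilt inline, assuming both columns are character.

```r
library(dplyr)

# Data from the question (both columns are character)
train <- data.frame(
  Lead.Stage = c("Not Interested", "Unreachable", "Qualified",
                 "Unreachable", "Qualified", "Junk Lead"),
  Number.of.Followup.Calls = c("Select", "", "1", "2", "2", "Select"),
  stringsAsFactors = FALSE
)

result <- train %>%
  group_by(Lead.Stage) %>%
  # Negated condition: drop the "" and "Select" rows instead of keeping them
  filter(!(Number.of.Followup.Calls == "" | Number.of.Followup.Calls == "Select")) %>%
  # Only numeric strings remain, so this conversion introduces no NAs
  mutate(Number.of.Followup.Calls = as.numeric(Number.of.Followup.Calls)) %>%
  summarise(Total = mean(Number.of.Followup.Calls))

print(result)
```

Groups whose rows are all filtered out (Not Interested, Junk Lead) simply do not appear in the summary.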

1 answer:

Answer (score: 3)

It is easier to do this with %in%:
library(dplyr)

train %>% 
    group_by(Lead.Stage)  %>%
    filter(!Number.of.Followup.Calls %in% c("", "Select")) %>%
    summarise(Total = mean(as.numeric(Number.of.Followup.Calls)))
#   Lead.Stage Total
#       <chr> <dbl>
#1   Qualified   1.5
#2 Unreachable   2.0
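Why %in% helps: `x %in% c("", "Select")` is TRUE for each element of `x` that matches any value in the set, so negating it with `!` gives the keep-mask in one expression. Because `!` has lower precedence than `%in%` in R, `!x %in% y` is read as `!(x %in% y)`. A small base-R illustration on the values from the question (the variable names are only for this sketch):

```r
calls <- c("Select", "", "1", "2", "2", "Select")

# TRUE where the value is one of the placeholders
is_placeholder <- calls %in% c("", "Select")
print(is_placeholder)
# [1]  TRUE  TRUE FALSE FALSE FALSE  TRUE

# !x %in% y parses as !(x %in% y), so both forms give the same mask
same_mask <- identical(!calls %in% c("", "Select"), !(calls %in% c("", "Select")))

# Keep only the real numbers, then convert
kept <- as.numeric(calls[!is_placeholder])
print(kept)
# [1] 1 2 2
```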

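Alternatively, we don't need the filter step at all: as.numeric converts every non-numeric element to NA automatically, so we can just use mean(., na.rm = TRUE). Groups with no numeric values at all (Not Interested and Junk Lead) then get NaN, which na.omit() removes.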

library(dplyr)

train %>% 
    group_by(Lead.Stage)  %>%
    summarise(Total = mean(as.numeric(Number.of.Followup.Calls), na.rm = TRUE)) %>%
    na.omit()
#   Lead.Stage Total
#       <chr> <dbl>
#1   Qualified   1.5
#2 Unreachable   2.0
#Warning messages:
#1: In mean(as.numeric(c("", "2")), na.rm = TRUE) :
# NAs introduced by coercion

The warning message is only a reminder that the non-numeric elements were converted to NA.
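To see what the second approach relies on, here is a base-R sketch using split() and sapply() as a stand-in for group_by()/summarise() (a swapped-in technique for illustration, not the answer's code). as.numeric() maps non-numeric strings to NA, and mean(..., na.rm = TRUE) over a group with no numbers left returns NaN. Since is.na() is TRUE for NaN, that is what lets na.omit() drop those groups. Wrapping as.numeric() in suppressWarnings() is one option if the coercion warning is unwanted; it is used here only to keep the demo quiet.

```r
train <- data.frame(
  Lead.Stage = c("Not Interested", "Unreachable", "Qualified",
                 "Unreachable", "Qualified", "Junk Lead"),
  Number.of.Followup.Calls = c("Select", "", "1", "2", "2", "Select"),
  stringsAsFactors = FALSE
)

# Non-numeric strings become NA; "Select" also triggers a coercion
# warning, which suppressWarnings() hides here
nums <- suppressWarnings(as.numeric(train$Number.of.Followup.Calls))

# Per-group mean, skipping NAs; groups with no numbers left give NaN
totals <- sapply(split(nums, train$Lead.Stage), mean, na.rm = TRUE)
print(totals)

# NaN counts as missing for is.na(), so these groups can be dropped
print(totals[!is.na(totals)])
```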