Replacing values with a user-defined function in grouped data

Date: 2017-11-06 22:56:06

Tags: r if-statement dplyr

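I have a problem replacing a value with another value when a condition is met. I use my own function, data_manip, so that I can assign or add further conditions when needed.

However, when I apply data_manip, it changes every value in the group to the specified value, even though the other values in that group do not meet the condition.

Here is what I tried: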

df <- data.frame(percent = c(0.6, 0.7,1, 0.5,0.5,1,0.4,0.6,1), 
                 type = rep(c("good", "bad","ugly"),each=3), smoke=rep(c('Visky','Wine','Wine'),3),
                 sex=rep(c('male','male','female'),3))

> df
  percent type smoke    sex
1     0.6 good Visky   male
2     0.7 good  Wine   male
3     1.0 good  Wine female
4     0.5  bad Visky   male
5     0.5  bad  Wine   male
6     1.0  bad  Wine female
7     0.4 ugly Visky   male
8     0.6 ugly  Wine   male
9     1.0 ugly  Wine female


data_manip <- function(x,gr){
  if(grepl('goo|ug',gr)&&x<1){
    x[x==0.6] <- 1
  } else {
    x
  }
}

library(dplyr)

df %>%
  group_by(type) %>%
  mutate(percent_new = data_manip(percent, type))

gives

# A tibble: 9 x 5
# Groups:   type [3]
  percent   type  smoke    sex percent_new
    <dbl> <fctr> <fctr> <fctr>       <dbl>
1     0.6   good  Visky   male         1.0
2     0.7   good   Wine   male         1.0
3     1.0   good   Wine female         1.0
4     0.5    bad  Visky   male         0.5
5     0.5    bad   Wine   male         0.5
6     1.0    bad   Wine female         1.0
7     0.4   ugly  Visky   male         1.0
8     0.6   ugly   Wine   male         1.0
9     1.0   ugly   Wine female         1.0

I want to keep the original percent value wherever the condition does not apply.

Expected output

# A tibble: 9 x 5
# Groups:   type [3]
  percent   type  smoke    sex percent_new
    <dbl> <fctr> <fctr> <fctr>       <dbl>
1     0.6   good  Visky   male         1.0
2     0.7   good   Wine   male         0.7
3     1.0   good   Wine female         1.0
4     0.5    bad  Visky   male         0.5
5     0.5    bad   Wine   male         0.5
6     1.0    bad   Wine female         1.0
7     0.4   ugly  Visky   male         0.4
8     0.6   ugly   Wine   male         1.0
9     1.0   ugly   Wine female         1.0

2 answers:

Answer 0 (score: 2)

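Your current data_manip function is not vectorized: it uses if (cond) { ... } else { ... }, which tests a single value only. Given vectors, && and if in R at that time silently used just the first element (recent R versions raise an error instead), so the whole group took one branch. When the condition was true, the function returned the value of the assignment x[x==0.6] <- 1, which is the scalar 1, and mutate recycled it across every row of the group. A vectorized version of the function looks like this: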

data_manip <- function(x,gr){
    ifelse(grepl('goo|ug', gr) & x == 0.6, 1, x)
}

and it gives the expected result:

> df%>%
+     group_by(type)%>%
+     mutate(percent_new=data_manip(percent,type))
# A tibble: 9 x 5
# Groups:   type [3]
  percent   type  smoke    sex percent_new
    <dbl> <fctr> <fctr> <fctr>       <dbl>
1     0.6   good  Visky   male         1.0
2     0.7   good   Wine   male         0.7
3     1.0   good   Wine female         1.0
4     0.5    bad  Visky   male         0.5
5     0.5    bad   Wine   male         0.5
6     1.0    bad   Wine female         1.0
7     0.4   ugly  Visky   male         0.4
8     0.6   ugly   Wine   male         1.0
9     1.0   ugly   Wine female         1.0

Use ifelse for vectorized condition checks.
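
To see the difference outside of dplyr, here is a minimal base R sketch using the values of the "good" group from the question. The names x, gr, cond, fixed, y and assigned are only for illustration. It shows that the combined condition yields one logical value per element, that ifelse picks values element by element, and that an assignment expression evaluates to its right-hand side, which is why the original function returned a single 1.

```r
# Values of the "good" group from the question (illustrative names)
x  <- c(0.6, 0.7, 1)
gr <- c("good", "good", "good")

# `&` compares element by element, giving one logical per row
cond <- grepl("goo|ug", gr) & x == 0.6
print(cond)

# ifelse() takes 1 where cond is TRUE and the original value elsewhere
fixed <- ifelse(cond, 1, x)
print(fixed)

# The value of an assignment expression is its right-hand side,
# so a function ending in `x[x == 0.6] <- 1` returns 1, not x
y <- x
assigned <- (y[y == 0.6] <- 1)
print(assigned)
```

Only the first element satisfies the condition, so only that element is replaced; the other two keep their original values.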

Answer 1 (score: 2)

This looks like a problem where case_when is useful.

Try this:

library(tidyverse)

df %>% 
  mutate(new_percentage = case_when(type == "good" & percent == 0.6 ~ 1,
                                    type == "ugly" & percent == 0.6 ~ 1,
                                    TRUE ~ percent))

which gives:

# A tibble: 9 x 5
  percent   type  smoke    sex new_percentage
    <dbl> <fctr> <fctr> <fctr>          <dbl>
1     0.6   good  Visky   male            1.0
2     0.7   good   Wine   male            0.7
3     1.0   good   Wine female            1.0
4     0.5    bad  Visky   male            0.5
5     0.5    bad   Wine   male            0.5
6     1.0    bad   Wine female            1.0
7     0.4   ugly  Visky   male            0.4
8     0.6   ugly   Wine   male            1.0
9     1.0   ugly   Wine female            1.0
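
Since both conditions map to the same value, they could also be folded into one with %in%. This is a sketch rather than part of the original answer: it assumes dplyr is installed, rebuilds the df from the question so that it runs on its own, and stores the output in a variable named result purely for illustration.

```r
library(dplyr)

# Same data as in the question
df <- data.frame(percent = c(0.6, 0.7, 1, 0.5, 0.5, 1, 0.4, 0.6, 1),
                 type = rep(c("good", "bad", "ugly"), each = 3),
                 smoke = rep(c("Visky", "Wine", "Wine"), 3),
                 sex = rep(c("male", "male", "female"), 3))

# One condition covers both "good" and "ugly"; every other row keeps percent
result <- df %>%
  mutate(new_percentage = case_when(
    type %in% c("good", "ugly") & percent == 0.6 ~ 1,
    TRUE ~ percent
  ))

print(result)
```

No group_by is needed here, because each condition is evaluated row by row and does not depend on other rows in the group.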