Mutate a new column based on groups

Date: 2020-11-10 02:21:12

Tags: r dplyr grouping mutate

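Is there a way to group rows by a common column value (id) and then mutate a new column (new.id) whose label depends on whether the values in each group are above and/or below 1000? For example: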

  1. < 1000 = "low/low" (all values in the group are below 1000)
  2. < 1000 and > 1000 = "low/high" (some values are below 1000 and some are above 1000)
  3. > 1000 = "high/high" (all values in the group are above 1000)

Data

#Example
  id values
1   a    200
2   a    300
3   b    100
4   b   2000
5   b   3000
6   c   4000
7   c   2000
8   c   3000
9   d   2400
10  d   2000
11  d    400

#dataframe:
structure(list(id = c("a", "a", "b", "b", "b", "c", "c", "c", 
"d", "d", "d"), values = c(200, 300, 100, 2000, 3000, 4000, 2000, 
3000, 2400, 2000, 400)), class = "data.frame", row.names = c(NA, 
-11L))

Desired output

   id values    new.id
1   a    200   low/low
2   a    300   low/low
3   b    100  low/high
4   b   2000  low/high
5   b   3000  low/high
6   c   4000 high/high
7   c   2000 high/high
8   c   3000 high/high
9   d   2400  low/high
10  d   2000  low/high
11  d    400  low/high

A dplyr solution would be great, but I'm open to any others!

2 answers:

Answer 0: (score: 0)

df['result']=pd.cut(df.start, [-np.inf, 0, 250,np.inf], labels=['unacceptablelow','acceptable', 'unacceptablehigh'])


    



        group  start  end        diff percent        date  \
A 2019-04-01  2019-05-01   -160  -11  04-01-2019      to  05-01-2019   
  2019-05-01  2019-06-01    136    8  05-01-2019      to  06-01-2019   
B 2020-06-01  2020-07-01    202    5  06-01-2020      to  07-01-2020   
  2020-07-01  2020-08-01    283    7  07-01-2020      to  08-01-2020   

                        result  
A 2019-04-01   unacceptablelow  
  2019-05-01        acceptable  
B 2020-06-01        acceptable  
  2020-07-01  unacceptablehigh 
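
Since the question is tagged r, the same kind of binning can be sketched in base R with cut(). This is only an illustration: diff_values is a hypothetical vector holding the four numbers shown in the output above, and the breaks and labels are copied from the pd.cut call.

```r
# Hypothetical input: the four numeric values from the output table above
diff_values <- c(-160, 136, 202, 283)

# cut() uses right-closed intervals by default: (-Inf, 0], (0, 250], (250, Inf]
result <- cut(
  diff_values,
  breaks = c(-Inf, 0, 250, Inf),
  labels = c("unacceptablelow", "acceptable", "unacceptablehigh")
)

print(data.frame(diff_values, result))
```

Note that this labels each row on its own; it does not by itself produce one label per group.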

Answer 1: (score: 0)

Alternatively, you can use the recode function from dplyr.


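library(dplyr)

# Within each id, compute the share of values above 1000,
# then map 0 (none), 1 (all), or anything in between to a label
df %>% group_by(id) %>%
  mutate(
    new.id = dplyr::recode(
      sum(values > 1000) / length(values),
      `0` = "low/low",
      `1` = "high/high",
      .default = "low/high"
    )
  )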
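
To see why the recode works, it may help to look at the intermediate fraction on its own. The sketch below rebuilds the question's data and computes the per-group share of values above 1000; the names frac_high and frac are made up for illustration, and mean(values > 1000) is used as an equivalent of sum(values > 1000) / length(values).

```r
library(dplyr)

# Data from the question
df <- data.frame(
  id = c("a", "a", "b", "b", "b", "c", "c", "c", "d", "d", "d"),
  values = c(200, 300, 100, 2000, 3000, 4000, 2000, 3000, 2400, 2000, 400)
)

# Share of values above 1000 per group:
# 0 means none are above, 1 means all are above, anything else is a mix
frac_high <- df %>%
  group_by(id) %>%
  summarise(frac = mean(values > 1000))

print(frac_high)
```

Groups a and c give exactly 0 and 1, so they hit the named cases in recode, while b and d give 2/3 and fall through to .default. A value of exactly 1000 is not counted as above 1000 here, so it is treated as low.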

If you also want to keep the group count, then


# add_tally() adds the size of each group as a column named n
df %>% group_by(id) %>%
  add_tally() %>%
  mutate(new.id = dplyr::recode(
    sum(values > 1000) / n,
    `0` = "low/low",
    `1` = "high/high",
    .default = "low/high"
  ))
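
If the fraction trick feels indirect, a variant that spells out the three cases with case_when() and all() is sketched below. This is a different technique from the recode approach above, not a restatement of it, and the name result is just for illustration.

```r
library(dplyr)

# Data from the question
df <- data.frame(
  id = c("a", "a", "b", "b", "b", "c", "c", "c", "d", "d", "d"),
  values = c(200, 300, 100, 2000, 3000, 4000, 2000, 3000, 2400, 2000, 400)
)

result <- df %>%
  group_by(id) %>%
  mutate(new.id = case_when(
    all(values < 1000) ~ "low/low",    # every value in the group is below 1000
    all(values > 1000) ~ "high/high",  # every value in the group is above 1000
    TRUE ~ "low/high"                  # anything else is treated as mixed
  )) %>%
  ungroup()

print(result)
```

One difference to be aware of: here a group containing a value of exactly 1000 ends up as "low/high", whereas the recode version counts 1000 as low. Adjust the comparisons if that edge case matters for your data.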