Question

My dataset is:

unit      date      total
1     2019-04-02      7
1     2020-01-01      5
2     2019-12-01      10
2     2020-01-03      2
3     2019-09-01      3
3     2020-03-03      3

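I want to add a "category" column that is "high" for every row of a unit if any of that unit's "total" values is greater than or equal to 10, and "low" otherwise: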

unit      date      total     category
1     2019-04-02      7          low
1     2020-01-01      5          low
2     2019-12-01      10         high
2     2020-01-03      2          high
3     2019-09-01      3          low
3     2020-03-03      3          low

I have tried many things, such as:

df$category <- "low"
for (i in df$unit){
  if (rowSums(df$total >= 10) > 0){
    df$category <- "high"
  }
}

but none of them worked. Could you advise?

Answer 1

Try computing the maximum of total within each group and then assigning the category from it. Here is the code:

library(dplyr)
#Code
dfnew <- df %>%
  group_by(unit) %>%
  mutate(category = ifelse(max(total, na.rm = TRUE) >= 10, 'High', 'Low'))

Output:

# A tibble: 6 x 4
# Groups:   unit [3]
   unit date       total category
  <int> <chr>      <int> <chr>   
1     1 2019-04-02     7 Low     
2     1 2020-01-01     5 Low     
3     2 2019-12-01    10 High    
4     2 2020-01-03     2 High    
5     3 2019-09-01     3 Low     
6     3 2020-03-03     3 Low

Data used:

#Data
df <- structure(list(unit = c(1L, 1L, 2L, 2L, 3L, 3L), date = c("2019-04-02", 
"2020-01-01", "2019-12-01", "2020-01-03", "2019-09-01", "2020-03-03"
), total = c(7L, 5L, 10L, 2L, 3L, 3L)), class = "data.frame", row.names = c(NA, 
-6L))
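A small sketch of what na.rm = TRUE changes in the max-based rule above. The vector x is a hypothetical group containing a missing value; it is not part of the question's data.

```r
# Hypothetical "total" values for one unit, with one missing entry
x <- c(7L, NA, 12L)

# Without na.rm, a single NA makes the group maximum NA
max_with_na <- max(x)

# With na.rm = TRUE, the NA is ignored and the maximum is 12
max_no_na <- max(x, na.rm = TRUE)

# The category is then decided from the NA-free maximum
category <- ifelse(max_no_na >= 10, "High", "Low")
print(category)
```

Note that if every value in a group is NA, max(..., na.rm = TRUE) returns -Inf with a warning, so such a group would be labelled "Low".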

Answer 2

This works:

> library(dplyr)
> df %>% group_by(unit) %>% mutate(category = case_when(max(total) >= 10 ~ 'high', TRUE ~ 'low'))
# A tibble: 6 x 4
# Groups:   unit [3]
   unit date                    total category
  <dbl> <dttm>                  <dbl> <chr>   
1     1 2019-04-02 00:00:00.000     7 low     
2     1 2020-01-01 00:00:00.000     5 low     
3     2 2019-12-01 00:00:00.000    10 high    
4     2 2020-01-03 00:00:00.000     2 high    
5     3 2019-09-01 00:00:00.000     3 low     
6     3 2020-03-03 00:00:00.000     3 low     
>

Answer 3

A base R option using ave, e.g.

transform(
  df,
  category = c("Low","High")[ave(total>=10,unit,FUN = any)+1]
)

which gives

  unit       date total category
1    1 2019-04-02     7      Low
2    1 2020-01-01     5      Low
3    2 2019-12-01    10     High
4    2 2020-01-03     2     High
5    3 2019-09-01     3      Low
6    3 2020-03-03     3      Low
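The one-liner above packs several steps together. The following is a sketch that unpacks it; the intermediate names hit, grp_hit, and idx are introduced here for illustration only.

```r
# Same unit/total values as in the question
df <- data.frame(unit  = c(1L, 1L, 2L, 2L, 3L, 3L),
                 total = c(7L, 5L, 10L, 2L, 3L, 3L))

# Step 1: row-level test, TRUE where total >= 10
hit <- df$total >= 10

# Step 2: ave() applies any() within each unit and repeats the
# group result on every row of that unit
grp_hit <- ave(hit, df$unit, FUN = any)

# Step 3: FALSE/TRUE + 1 becomes 1/2, used as an index into the labels
idx <- grp_hit + 1
df$category <- c("Low", "High")[idx]

print(df)
```

Because ave() returns a vector of the same length as its input, the result lines up with the rows of df without any merge step.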

Data

> dput(df)
structure(list(unit = c(1L, 1L, 2L, 2L, 3L, 3L), date = c("2019-04-02", 
"2020-01-01", "2019-12-01", "2020-01-03", "2019-09-01", "2020-03-03"
), total = c(7L, 5L, 10L, 2L, 3L, 3L)), class = "data.frame", row.names = c(NA, 
-6L))

Answer 4

For each unit, you can check whether any value of total is greater than or equal to 10 and assign the value of category accordingly.

library(dplyr)
df %>%
  group_by(unit) %>%
  mutate(category = if(any(total >= 10)) 'high' else 'low')

#   unit date       total category
#  <int> <chr>      <int> <chr>   
#1     1 2019-04-02     7 low     
#2     1 2020-01-01     5 low     
#3     2 2019-12-01    10 high    
#4     2 2020-01-03     2 high    
#5     3 2019-09-01     3 low     
#6     3 2020-03-03     3 low

The same logic can be implemented in base R

df$category <- with(df, ave(total, unit, FUN = function(x) 
                        if(any(x >= 10)) 'high' else 'low'))

and in data.table:

library(data.table)
setDT(df)[, category := if(any(total >= 10)) 'high' else 'low', unit]
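For comparison with the loop in the question, here is a sketch of a loop that applies the same any() rule. The original attempt fails for a few reasons: rowSums() expects an array with at least two dimensions, so it errors on the vector df$total >= 10; the loop iterates over every row's unit value rather than the distinct units; and df$category <- "high" overwrites the whole column. The variable names u and rows are illustrative.

```r
# Same unit/total values as in the question
df <- data.frame(unit  = c(1L, 1L, 2L, 2L, 3L, 3L),
                 total = c(7L, 5L, 10L, 2L, 3L, 3L))

df$category <- "low"
for (u in unique(df$unit)) {        # visit each unit once
  rows <- df$unit == u              # rows belonging to this unit
  if (any(df$total[rows] >= 10)) {  # does any total reach 10?
    df$category[rows] <- "high"     # update only this unit's rows
  }
}

print(df)
```

The grouped versions above are usually preferable, but the loop makes the per-unit logic explicit.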

Add a column to df based on a rule
