My dataset is:
unit date total
1 2019-04-02 7
1 2020-01-01 5
2 2019-12-01 10
2 2020-01-03 2
3 2019-09-01 3
3 2020-03-03 3
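I want to add a category column that is "high" for every row of a unit if any value of total within that unit is greater than or equal to 10, and "low" otherwise: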
unit date total category
1 2019-04-02 7 low
1 2020-01-01 5 low
2 2019-12-01 10 high
2 2020-01-03 2 high
3 2019-09-01 3 low
3 2020-03-03 3 low
I have tried many things, for example:
df$category <- "low"
for (i in df$unit){
if (rowSums(df$total >= 10) > 0){
df$category <- "high"
}
}
but none of them worked. Could you please advise?
Answer 0 (score: 1)
Try computing the maximum of total within each group and then assigning the category from it. Here is the code:
library(dplyr)
# Flag every row of a unit as 'High' if the unit's maximum total is at least 10
dfnew <- df %>%
  group_by(unit) %>%
  mutate(category = ifelse(max(total, na.rm = TRUE) >= 10, 'High', 'Low'))
Output:
# A tibble: 6 x 4
# Groups: unit [3]
unit date total category
<int> <chr> <int> <chr>
1 1 2019-04-02 7 Low
2 1 2020-01-01 5 Low
3 2 2019-12-01 10 High
4 2 2020-01-03 2 High
5 3 2019-09-01 3 Low
6 3 2020-03-03 3 Low
Data used:
#Data
df <- structure(list(unit = c(1L, 1L, 2L, 2L, 3L, 3L), date = c("2019-04-02",
"2020-01-01", "2019-12-01", "2020-01-03", "2019-09-01", "2020-03-03"
), total = c(7L, 5L, 10L, 2L, 3L, 3L)), class = "data.frame", row.names = c(NA,
-6L))
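The na.rm argument in this answer's max() call only matters when total contains missing values, which the sample data does not. As a small illustration, using made-up vectors that are not part of the question's data:

```r
# Hypothetical vectors for illustration only (not from the question)
with_na_low  <- c(NA, 3L)
with_na_high <- c(NA, 12L)

# Without na.rm the missing value propagates, so the comparison is NA
res_default <- max(with_na_low) >= 10

# With na.rm = TRUE the NA is dropped before taking the maximum
res_low  <- max(with_na_low,  na.rm = TRUE) >= 10
res_high <- max(with_na_high, na.rm = TRUE) >= 10

print(c(res_default, res_low, res_high))
```

Whether dropping missing values is the right choice depends on how an unknown total should be treated in the real data.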
Answer 1 (score: 1)
This works:
> library(dplyr)
> df %>% group_by(unit) %>% mutate(category = case_when(max(total) >= 10 ~ 'high', TRUE ~ 'low'))
# A tibble: 6 x 4
# Groups: unit [3]
unit date total category
<dbl> <dttm> <dbl> <chr>
1 1 2019-04-02 00:00:00.000 7 low
2 1 2020-01-01 00:00:00.000 5 low
3 2 2019-12-01 00:00:00.000 10 high
4 2 2020-01-03 00:00:00.000 2 high
5 3 2019-09-01 00:00:00.000 3 low
6 3 2020-03-03 00:00:00.000 3 low
>
Answer 2 (score: 1)
A base R option using ave(), for example:
transform(
df,
category = c("Low","High")[ave(total>=10,unit,FUN = any)+1]
)
which gives
unit date total category
1 1 2019-04-02 7 Low
2 1 2020-01-01 5 Low
3 2 2019-12-01 10 High
4 2 2020-01-03 2 High
5 3 2019-09-01 3 Low
6 3 2020-03-03 3 Low
Data
> dput(df)
structure(list(unit = c(1L, 1L, 2L, 2L, 3L, 3L), date = c("2019-04-02",
"2020-01-01", "2019-12-01", "2020-01-03", "2019-09-01", "2020-03-03"
), total = c(7L, 5L, 10L, 2L, 3L, 3L)), class = "data.frame", row.names = c(NA,
-6L))
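The one-liner in this answer packs three steps into a single expression. A sketch that unpacks them, using the same data and the intermediate names flag and group_flag (chosen here only for illustration), might look like this:

```r
# Same data as in the question
df <- data.frame(
  unit  = c(1L, 1L, 2L, 2L, 3L, 3L),
  date  = c("2019-04-02", "2020-01-01", "2019-12-01",
            "2020-01-03", "2019-09-01", "2020-03-03"),
  total = c(7L, 5L, 10L, 2L, 3L, 3L)
)

# Step 1: row-wise flag, TRUE where total is at least 10
flag <- df$total >= 10

# Step 2: apply any() within each unit, so every row of a unit
# carries the same group-level flag
group_flag <- ave(flag, df$unit, FUN = any)

# Step 3: FALSE/TRUE + 1 becomes 1/2, which indexes c("Low", "High")
df$category <- c("Low", "High")[group_flag + 1]

print(df)
```

The + 1 is needed because R vectors are indexed from 1, so FALSE (0) has to map to the first label and TRUE (1) to the second.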
Answer 3 (score: 0)
For each unit, you can check whether any value of total is greater than or equal to 10 and assign category accordingly.
library(dplyr)
df %>%
group_by(unit) %>%
mutate(category = if(any(total >= 10)) 'high' else 'low')
# unit date total category
# <int> <chr> <int> <chr>
#1 1 2019-04-02 7 low
#2 1 2020-01-01 5 low
#3 2 2019-12-01 10 high
#4 2 2020-01-03 2 high
#5 3 2019-09-01 3 low
#6 3 2020-03-03 3 low
The same logic can be implemented in base R:
df$category <- with(df, ave(total, unit, FUN = function(x)
if(any(x >= 10)) 'high' else 'low'))
and in data.table:
library(data.table)
setDT(df)[, category := if(any(total >= 10)) 'high' else 'low', unit]
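For comparison with the loop attempted in the question, here is a minimal corrected sketch of that idea. The names u and rows are only illustrative. The original attempt fails for two reasons: rowSums() expects an array with at least two dimensions, while df$total >= 10 is a plain vector, and df$category <- "high" overwrites the whole column instead of only the rows of one unit.

```r
# Same data as in the question
df <- data.frame(
  unit  = c(1L, 1L, 2L, 2L, 3L, 3L),
  date  = c("2019-04-02", "2020-01-01", "2019-12-01",
            "2020-01-03", "2019-09-01", "2020-03-03"),
  total = c(7L, 5L, 10L, 2L, 3L, 3L)
)

# Start with "low" everywhere, then upgrade qualifying units
df$category <- "low"

# Visit each distinct unit once
for (u in unique(df$unit)) {
  rows <- df$unit == u                 # rows belonging to this unit
  if (any(df$total[rows] >= 10)) {     # does any total reach 10?
    df$category[rows] <- "high"        # update only this unit's rows
  }
}

print(df)
```

The grouped versions above do the same thing without an explicit loop and are usually preferable on larger data.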