Question

I have a data frame df:

df <- data.frame(id = c(1, 2, 1, 4, 1, 5, 6),
                 label = c("a", "b", "a", "a", "a", "e", "a"),
                 color = c("g", "a", "g", "g", "a", "a", "a"),
                 threshold = c(12, 10, 12, 12, 12, 35, 40),
                 value = c(32.1, 0, 15.0, 10, 1, 50, 45),
                 stringsAsFactors = FALSE)

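The threshold is based on the label.

I want a table that shows, for each id, how many times the value for each label exceeds its threshold.

Color is independent of the calculation: it does not affect how the exceedances are counted.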

I tried:

final_df <- df %>% 
  mutate(check = if_else(value > threshold, 1, 0)) %>% 
  group_by(id, label) %>% 
  summarise(exceed = sum(check))

But I don't get the counts for the respective IDs; instead I get only the total number of exceedances.

Answer 1

With base R only, use aggregate. Columns 4 and 5 of df are threshold and value.

aggregate(seq.int(nrow(df)) ~ id + label, df, function(i) sum(df[i, 4] < df[i, 5]))
#  id label seq.int(nrow(df))
#1  1     a                 2
#2  4     a                 0
#3  6     a                 1
#4  2     b                 0
#5  5     e                 1

To match the expected output posted in the question, a little extra work is needed.

exceed <- seq.int(nrow(df))
agg <- aggregate(exceed ~ id + label, df, function(i) sum(df[i, 4] < df[i, 5]))
res <- merge(df[1:3], agg)
unique(res)
#  id label color exceed
#1  1     a     g      2
#3  1     a     a      2
#4  2     b     a      0
#5  4     a     g      0
#6  5     e     a      1
#7  6     a     a      1
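
The row-index trick above can be hard to read. As a rough sketch of the same idea, assuming the column names from the question, ave can compute the per-group count directly from the logical comparison. The names over and res2 are illustrative only and are not part of the original answer.

```r
# Data frame from the question
df <- data.frame(id = c(1, 2, 1, 4, 1, 5, 6),
                 label = c("a", "b", "a", "a", "a", "e", "a"),
                 color = c("g", "a", "g", "g", "a", "a", "a"),
                 threshold = c(12, 10, 12, 12, 12, 35, 40),
                 value = c(32.1, 0, 15.0, 10, 1, 50, 45),
                 stringsAsFactors = FALSE)

# TRUE where a row's value is above its threshold
over <- df$value > df$threshold

# Sum the TRUEs within each (id, label) group; ave repeats the
# group total on every row of that group
df$exceed <- ave(as.numeric(over), df$id, df$label, FUN = sum)

# Keep one row per (id, label, color) combination
res2 <- unique(df[c("id", "label", "color", "exceed")])
res2
```

Because ave returns a vector as long as its input, no merge step is needed; unique only removes the repeated rows.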

Answer 2

With a small modification to your code:

final_df <- df %>% 
  group_by(id, label) %>% 
  mutate(check = if_else(value > threshold, 1, 0)) %>% 
  summarise(exceed = sum(check)) %>% 
  group_by(id, label)

To get closer to the expected output:

final_df <- df %>% 
  group_by(id, label) %>% 
  mutate(exceed = sum(if_else(value > threshold, 1, 0))) %>% 
  group_by(id, label, color) %>% 
  filter(., row_number() == 1)

Answer 3

library(dplyr)
df %>% 
  group_by(id, label) %>% 
  mutate(exceed = sum(value > threshold)) %>%
  slice(1)

     id label color threshold value exceed
  <dbl> <chr> <chr>     <dbl> <dbl>  <int>
1     1 a     g            12  32.1      2
2     2 b     a            10   0        0
3     4 a     g            12  10        0
4     5 e     a            35  50        1
5     6 a     a            40  45        1

If you want the output to contain a separate row for each combination of id, label and color, just add a new group_by before the slice function:

df %>% 
  group_by(id, label) %>% 
  mutate(exceed = sum(value > threshold)) %>% 
  group_by(id, label, color) %>% 
  slice(1)

     id label color threshold value exceed
  <dbl> <chr> <chr>     <dbl> <dbl>  <int>
1     1 a     a            12   1        2
2     1 a     g            12  32.1      2
3     2 b     a            10   0        0
4     4 a     g            12  10        0
5     5 e     a            35  50        1
6     6 a     a            40  45        1
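
If the threshold and value columns are not wanted in the result, one possible variation (a sketch, not part of the original answer) is to replace the final group_by and slice with dplyr's distinct, which keeps only the listed columns. The name out is illustrative.

```r
library(dplyr)

# Data frame from the question
df <- data.frame(id = c(1, 2, 1, 4, 1, 5, 6),
                 label = c("a", "b", "a", "a", "a", "e", "a"),
                 color = c("g", "a", "g", "g", "a", "a", "a"),
                 threshold = c(12, 10, 12, 12, 12, 35, 40),
                 value = c(32.1, 0, 15.0, 10, 1, 50, 45),
                 stringsAsFactors = FALSE)

out <- df %>%
  group_by(id, label) %>%
  # count, per (id, label), the rows whose value is above the threshold
  mutate(exceed = sum(value > threshold)) %>%
  ungroup() %>%
  # one row per distinct (id, label, color, exceed); other columns are dropped
  distinct(id, label, color, exceed)

out
```

Unlike slice(1), this does not keep an arbitrary first threshold and value from each group, so the result holds only the grouping columns and the count.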

Answer 4

A slight change to your code:

final_df <- df %>%
  mutate(check = if_else(value > threshold, 1, 0)) %>%
  group_by(id, label) %>%
  filter(check == 1)
unique(final_df$id)

Answer 5

We can use table and merge:

table_ <- table(subset(df, value > threshold, c("id", "label")))
df2 <- merge(unique(df[c("id", "label", "color")]), table_, all.x = TRUE)
df2$Freq[is.na(df2$Freq)] <- 0

#   id label color Freq
# 1  1     a     g    2
# 2  1     a     a    2
# 3  2     b     a    0
# 4  4     a     g    0
# 5  5     e     a    1
# 6  6     a     a    1

Check a table for values exceeding a given value and count how many times each ID and label exceeds its threshold
