Question

This is my data frame:

Colour = c("red",   "blue", "red",  "blue", "yellow",   "green",    "red",  "blue", "green",    "red",  "yellow",   "blue")
Volume  = c(46,46,57,57,57,57,99,99,99,111,111,122)
Cases   = c(7,2,4,2,3,5,1,2,3,2,4,1)
df = data.frame(Colour, Volume, Cases)

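I want to sum the Cases where Colour is "red" or "blue", but only if the Volume is the same. Colours that are not specified should be kept as they are. If red and blue cannot be summed because they differ in Volume, those rows should be kept as well.

The result should look like this: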

Colour = c("red_or_blue","red_or_blue","yellow","green","red_or_blue","green","red","yellow","blue")
Volume  = c(46,57,57,57,99,99,111,111,122)
Cases   = c(9,6,3,5,3,3,2,4,1)
df_agg = data.frame(Colour, Volume, Cases)

I found a way to do it: I created another column that assigns "red_or_blue" to the red and blue rows and "x" to all other rows. Then I used aggregate:

df$test = ifelse(df$Colour %in% c("red", "blue"),"red_or_blue","x")
df_agg = aggregate(df$Cases, list(df$Volume, df$test), sum)

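It works, but I find it a bit cumbersome. Is there a more convenient way that skips creating the extra column? In the future I will need to sum the Cases for red/blue and for Volume 57/99. The extra column seems to make that a bit tricky.

Also, I did not manage to carry over the original colour when it is neither red nor blue. I tried it this way, but it does not work: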

df$test = ifelse(df$Colour %in% c("red", "blue"),"red_or_blue",df$Colour)

Cheers, Paul

Answer 1

Here is a way that sticks to base R (though it is probably not the most efficient way...)

Split the data into groups by Volume:
```
temp = split(df, df$Volume)
```
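To see what split() hands back, here is a minimal sketch on a shortened, made-up version of the data (the four-row `df` below is only for illustration). The result is a named list with one data frame per distinct Volume:

```r
# Shortened illustrative data, not the full data frame from the question
df <- data.frame(
  Colour = c("red", "blue", "red", "yellow"),
  Volume = c(46, 46, 57, 57),
  Cases  = c(7, 2, 4, 3)
)

# One list element per distinct Volume, named after the Volume value
temp <- split(df, df$Volume)

names(temp)   # the Volume values, as character strings
temp[["46"]]  # the rows of df where Volume == 46
```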

Create a quick function that changes the values "red" and "blue" only in groups where both "red" AND "blue" are present.

red.and.blue = function(x) {
  if (sum(c("red", "blue") %in% x$Colour) > 1) {
    x$Colour = gsub("red|blue", "red-and-blue", x$Colour)
  } else {
    x$Colour = as.character(x$Colour)
  }
  x
}
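A note on the else branch: if Colour is stored as a factor (the default for data.frame() before R 4.0), as.character() is what keeps the colour names. Without it, ifelse() works on the factor's underlying integer codes instead of its labels, which is probably why the ifelse() attempt in the question did not work. A minimal sketch, using a made-up four-element vector `colour`:

```r
# Illustrative factor; in R < 4.0, data.frame() creates factors like this by default
colour <- factor(c("red", "blue", "yellow", "green"))

# Passing the factor directly: the "no" values come from the integer codes
bad <- ifelse(colour %in% c("red", "blue"), "red_or_blue", colour)

# Converting to character first keeps the original labels
good <- ifelse(colour %in% c("red", "blue"), "red_or_blue", as.character(colour))

bad
good
```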

Use the function on the temp object you created in the first step:
```
temp = lapply(temp, red.and.blue)
```

Use aggregate() to do the aggregation you need. Specify the names in the aggregate() arguments so that the original column names are preserved.

temp = lapply(temp, function(x) aggregate(list(Cases = x$Cases), 
                                          list(Colour = x$Colour, 
                                               Volume = x$Volume), sum))

Put it all back together into a data.frame(). Don't forget to assign the result to a name if you want to store it.

do.call(rbind, temp)
#             Colour Volume Cases
# 46    red-and-blue     46     9
# 57.1         green     57     5
# 57.2  red-and-blue     57     6
# 57.3        yellow     57     3
# 99.1         green     99     3
# 99.2  red-and-blue     99     3
# 111.1          red    111     2
# 111.2       yellow    111     4
# 122           blue    122     1
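
If you would rather not split at all, one possible alternative (a sketch only, not the approach above) is to work out which Volumes contain both colours and build the grouping key on the fly, so no extra column is added to df. The helper names `both`, `merge_row` and `key` are made up for this example, and it assumes Colour is a character vector:

```r
Colour <- c("red", "blue", "red", "blue", "yellow", "green",
            "red", "blue", "green", "red", "yellow", "blue")
Volume <- c(46, 46, 57, 57, 57, 57, 99, 99, 99, 111, 111, 122)
Cases  <- c(7, 2, 4, 2, 3, 5, 1, 2, 3, 2, 4, 1)
df <- data.frame(Colour, Volume, Cases, stringsAsFactors = FALSE)

# Volumes that contain at least one "red" row and at least one "blue" row
both <- intersect(df$Volume[df$Colour == "red"],
                  df$Volume[df$Colour == "blue"])

# Rows to merge: red or blue, in a Volume where both colours occur
merge_row <- df$Colour %in% c("red", "blue") & df$Volume %in% both

# Grouping key: merged label for those rows, original colour otherwise
key <- ifelse(merge_row, "red_or_blue", df$Colour)

df_agg <- aggregate(list(Cases = df$Cases),
                    list(Colour = key, Volume = df$Volume), sum)
df_agg
```

To restrict the merge to particular Volumes later (say 57 and 99), intersecting `both` with that set should be enough, for example `both <- intersect(both, c(57, 99))`.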
