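I want to create a subset of the candyData below in R: group the data by Brand and, for each unique brand, find and print the maximum value for Candy A and for Candy B. To illustrate, in the new data the Brand value Nestle should appear twice, once with Candy A and once with Candy B, with the corresponding maximum in the third column. The same applies to every other brand. Thanks, please help.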
candyData <- read.table(
text = "
Brand Candy value
Nestle A 12
Nestle B 34
Nestle A 32
Hershey's A 55
Hershey's B 14
Hershey's B 19
Mars B 24
Nestle B 26
Nestle A 28
Hershey's B 23
Hershey's B 23
Hershey's A 65
Mars A 23
Mars B 34",
header = TRUE,
stringsAsFactors = FALSE)
Answer 0 (score: 2)
Try this:
library(dplyr)
candyData %>%
group_by(Brand, Candy) %>%
summarise(max=max(value))
The output will be:
# A tibble: 6 x 3
# Groups: Brand [?]
Brand Candy max
<chr> <chr> <dbl>
1 Hershey's A 65.
2 Hershey's B 23.
3 Mars A 23.
4 Mars B 34.
5 Nestle A 32.
6 Nestle B 34.
Answer 1 (score: 2)
aggregate(value ~ ., candyData, max)
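This groups candyData by Brand and Candy (together they are all the columns other than value, which is what the . on the right-hand side of the formula stands for) and returns the max of each group.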
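To make the role of the dot concrete, here is a minimal base R sketch. It rebuilds the sample data with data.frame() (an assumption made only so the block is self-contained) and checks that the dot formula gives the same result as naming the grouping columns explicitly. The names viaDot and viaExplicit are illustrative.

```r
# Sample data from the question, rebuilt so this block runs on its own
candyData <- data.frame(
  Brand = c("Nestle", "Nestle", "Nestle", "Hershey's", "Hershey's", "Hershey's",
            "Mars", "Nestle", "Nestle", "Hershey's", "Hershey's", "Hershey's",
            "Mars", "Mars"),
  Candy = c("A", "B", "A", "A", "B", "B", "B", "B", "A", "B", "B", "A", "A", "B"),
  value = c(12, 34, 32, 55, 14, 19, 24, 26, 28, 23, 23, 65, 23, 34),
  stringsAsFactors = FALSE
)

# "." expands to every column of candyData other than the left-hand side
viaDot <- aggregate(value ~ ., candyData, max)

# The same grouping written out explicitly
viaExplicit <- aggregate(value ~ Brand + Candy, candyData, max)

# TRUE when both calls produce the same data frame
isTRUE(all.equal(viaDot, viaExplicit))
```

The explicit form is the safer choice if candyData later gains columns that should not take part in the grouping.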
Answer 2 (score: 1)
A few more solutions:
cd <- read.table(
text = "
Brand Candy value
Nestle A 12
Nestle B 34
Nestle A 32
Hershey's A 55
Hershey's B 14
Hershey's B 19
Mars B 24
Nestle B 26
Nestle A 28
Hershey's B 23
Hershey's B 23
Hershey's A 65
Mars A 23
Mars B 34",
header = TRUE,
stringsAsFactors = FALSE)
#using split + lapply or equivalently, by
c(by(cd$value, paste(cd$Brand, cd$Candy), max))
#using tapply i.e. apply to each group
tapply(cd$value, paste(cd$Brand, cd$Candy), max)
#using data.table
library(data.table)
setDT(cd)[, .(Max=max(value)), by=.(Brand, Candy)]
#using sqldf
library(sqldf)
sqldf("select Brand, Candy, max(value) as Max from cd group by Brand, Candy")
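The first two comments in the code above describe by and tapply as shorthand for splitting the values by group and applying a function to each piece. A minimal base R sketch of that equivalence follows. The sample data is rebuilt with data.frame() only so the block runs on its own, and the names key, viaSplit, viaTapply and wide are illustrative rather than taken from the answer.

```r
# Sample data from the question, rebuilt so this block runs on its own
cd <- data.frame(
  Brand = c("Nestle", "Nestle", "Nestle", "Hershey's", "Hershey's", "Hershey's",
            "Mars", "Nestle", "Nestle", "Hershey's", "Hershey's", "Hershey's",
            "Mars", "Mars"),
  Candy = c("A", "B", "A", "A", "B", "B", "B", "B", "A", "B", "B", "A", "A", "B"),
  value = c(12, 34, 32, 55, 14, 19, 24, 26, 28, 23, 23, 65, 23, 34),
  stringsAsFactors = FALSE
)

# One combined grouping key per row, e.g. "Nestle A"
key <- paste(cd$Brand, cd$Candy)

# Explicit split + apply: a named vector with one maximum per key
viaSplit <- sapply(split(cd$value, key), max)

# tapply performs the same split-and-apply in a single call
viaTapply <- tapply(cd$value, key, max)

# Passing the two grouping columns separately gives a Brand x Candy matrix
wide <- tapply(cd$value, list(cd$Brand, cd$Candy), max)
wide["Mars", "B"]
```

The pasted key gives a flat named vector, while the two-factor form keeps Brand and Candy as separate dimensions, which can be easier to index.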
Answer 3 (score: 0)
Although my answer is far less elegant than the dplyr one, I put together a solution in base R.
# One data frame per brand
splittedData <- split(candyData, candyData$Brand)
# Start with a single all-NA row; the first insert below overwrites it
resultDf <- data.frame(matrix(ncol = 3))
colnames(resultDf) <- c("Brand", "Candy", "maxValue")
insertIndex <- 1
for (dfIndex in seq_along(splittedData)) {
  tempDf <- splittedData[[dfIndex]]
  # Distinct Candy values that occur within this brand
  tableDf <- data.frame(table(tempDf$Candy))
  tableDf[, 1] <- as.character(tableDf[, 1])
  for (i in seq_len(nrow(tableDf))) {
    resultDf[insertIndex, 1] <- tempDf$Brand[1]
    resultDf[insertIndex, 2] <- tableDf[i, 1]
    # Maximum value for this brand/candy pair
    resultDf[insertIndex, 3] <- max(tempDf$value[tempDf$Candy == tableDf[i, 1]])
    insertIndex <- insertIndex + 1
  }
}
The output is a new data frame:
Brand Candy maxValue
1 Hershey's A 65
2 Hershey's B 23
3 Mars A 23
4 Mars B 34
5 Nestle A 32
6 Nestle B 34
Answer 4 (score: 0)
Using the provided sample data and data.table:
library(data.table)
setDT(candyData)
candyData[,.(Max = max(value)), keyby = .(Brand,Candy)]
which gives
Brand Candy Max
1: Hershey's A 65
2: Hershey's B 23
3: Mars A 23
4: Mars B 34
5: Nestle A 32
6: Nestle B 34