I want to extract unique values based on the sum of another column. For example, I have the following data frame, "music":
ID | Song | artist | revenue
7520 | Dance with me | R Kelly | 2000
7531 | Gone girl | Vincent | 1890
8193 | Motivation | R Kelly | 3500
9800 | What | Beyonce | 12000
2010 | Excuse Me | Pharell | 1010
1999 | Remove me | Jack Will | 500
Basically, I want the top 5 artists ranked by revenue, without repeating an artist who has several entries.
Answer 0 (score: 1)
All you need is order(). For example:
head(unique(music$artist[order(music$revenue, decreasing = TRUE)]), 5)
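Note that the one-liner above orders individual rows, so each artist is ranked by their single highest-earning song rather than by the total the question asks about. If totals matter, a minimal base R sketch (assuming the data frame is called `music` with the columns shown in the question, and both R Kelly rows spelled the same way) can sum per artist first with `aggregate()`:

```r
# Rebuild the example data from the question (assumes consistent "R Kelly" spelling)
music <- data.frame(
  ID      = c(7520, 7531, 8193, 9800, 2010, 1999),
  Song    = c("Dance with me", "Gone girl", "Motivation", "What", "Excuse Me", "Remove me"),
  artist  = c("R Kelly", "Vincent", "R Kelly", "Beyonce", "Pharell", "Jack Will"),
  revenue = c(2000, 1890, 3500, 12000, 1010, 500),
  stringsAsFactors = FALSE
)

# Sum revenue per artist: one row per artist
totals <- aggregate(revenue ~ artist, data = music, FUN = sum)

# Sort by total revenue, highest first, and keep the top 5
top5 <- head(totals[order(totals$revenue, decreasing = TRUE), ], 5)
print(top5)
```

Here R Kelly's two songs are added together (2000 + 3500 = 5500), which is what moves him ahead of Vincent either way but would matter for artists with many small entries.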
Alternatively, to keep all the columns (although making the artists unique gets a bit trickier):
head(music[order(music$revenue, decreasing = TRUE), ], 5)
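One way to handle the uniqueness while keeping every column is `duplicated()`: once the rows are sorted by revenue, the first row for each artist is that artist's best-selling song, so dropping the later duplicates leaves one row per artist. A minimal base R sketch, rebuilding the question's data (again assuming both R Kelly rows are spelled the same way):

```r
# Rebuild the example data from the question
music <- data.frame(
  ID      = c(7520, 7531, 8193, 9800, 2010, 1999),
  Song    = c("Dance with me", "Gone girl", "Motivation", "What", "Excuse Me", "Remove me"),
  artist  = c("R Kelly", "Vincent", "R Kelly", "Beyonce", "Pharell", "Jack Will"),
  revenue = c(2000, 1890, 3500, 12000, 1010, 500),
  stringsAsFactors = FALSE
)

# Sort all rows by revenue, highest first
sorted <- music[order(music$revenue, decreasing = TRUE), ]

# Keep only the first (highest-revenue) row seen for each artist
best_rows <- sorted[!duplicated(sorted$artist), ]

# Top 5 artists, with all original columns kept
top_rows <- head(best_rows, 5)
print(top_rows)
```

This still ranks artists by their single best row, not by their total; combine it with a per-artist sum if the ranking should follow totals.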
Answer 1 (score: 1)
Here is the dplyr way:
df <- read.table(text = "
ID | Song | artist | revenue
7520 | Dance with me | R Kelly | 2000
7531 | Gone girl | Vincent | 1890
8193 | Motivation | R Kelly | 3500
9800 | What | Beyonce | 12000
2010 | Excuse Me | Pharell | 1010
1999 | Remove me | Jack Will | 500
", header = TRUE, sep = "|", strip.white = TRUE)
You can group_by artist, then choose how many entries you want to peek at (just 3 here):
library(dplyr)
df %>% group_by(artist) %>%
summarise(tot = sum(revenue)) %>%
arrange(desc(tot)) %>%
head(3)
Result:
Source: local data frame [3 x 2]
artist tot
1 Beyonce 12000
2 R Kelly 5500
3 Vincent 1890
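If you also want to keep the other columns in the dplyr version, one option is to compute the total with mutate() instead of summarise() and then keep one row per artist. A minimal sketch, assuming dplyr 1.0.0 or later (where slice_max() and its with_ties argument are available) and the question's data with consistent "R Kelly" spelling; here each artist's row is their best-selling song, ranked by the artist's total:

```r
library(dplyr)

# Rebuild the example data from the question
music <- data.frame(
  ID      = c(7520, 7531, 8193, 9800, 2010, 1999),
  Song    = c("Dance with me", "Gone girl", "Motivation", "What", "Excuse Me", "Remove me"),
  artist  = c("R Kelly", "Vincent", "R Kelly", "Beyonce", "Pharell", "Jack Will"),
  revenue = c(2000, 1890, 3500, 12000, 1010, 500),
  stringsAsFactors = FALSE
)

top_artists <- music %>%
  group_by(artist) %>%
  mutate(tot = sum(revenue)) %>%                     # artist total, repeated on each of its rows
  slice_max(revenue, n = 1, with_ties = FALSE) %>%   # keep each artist's best-selling song
  ungroup() %>%
  arrange(desc(tot)) %>%                             # rank artists by total revenue
  head(5)

print(top_artists)
```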