Question

I want to extract unique values based on the sum of another column. For example, I have the following data frame "music":

ID    | Song            |  artist       | revenue 
7520  | Dance with me   |   R Kelly     |   2000
7531  | Gone girl       |   Vincent     |   1890     
8193  | Motivation      |   R Kelly     |   3500     
9800  | What            |   Beyonce     |  12000    
2010  | Excuse Me       |   Pharell     |   1010     
1999  | Remove me       |   Jack Will   |    500

Basically, I want to rank the top 5 artists by revenue, without repeating any artist who has more than one entry.

Answer 1

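You just need order(). For example (note that this ranks artists by their individual entries, not by summed revenue):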

head(unique(music$artist[order(music$revenue, decreasing = TRUE)]), 5)

Or, to keep all the columns (though making the artists unique is a bit trickier this way):

head(music[order(music$revenue, decreasing = TRUE), ], 5)
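
One possible way to handle the uniqueness while keeping all columns is to sort first and then drop repeated artists with duplicated(). This is only a sketch: the data frame below is rebuilt by hand from the question's table, it assumes "R kelly" and "R Kelly" are the same artist, and the names sorted and top_unique are made up for illustration. It keeps each artist's single highest-revenue row, so it still does not sum revenue per artist.

```r
# Sample data rebuilt from the question's table (assumption: "R kelly" is a typo for "R Kelly")
music <- data.frame(
  ID      = c(7520, 7531, 8193, 9800, 2010, 1999),
  Song    = c("Dance with me", "Gone girl", "Motivation", "What", "Excuse Me", "Remove me"),
  artist  = c("R Kelly", "Vincent", "R Kelly", "Beyonce", "Pharell", "Jack Will"),
  revenue = c(2000, 1890, 3500, 12000, 1010, 500),
  stringsAsFactors = FALSE
)

# Sort rows by revenue, highest first
sorted <- music[order(music$revenue, decreasing = TRUE), ]

# duplicated() flags the second and later rows of each artist,
# so negating it keeps only each artist's highest-revenue row
top_unique <- head(sorted[!duplicated(sorted$artist), ], 5)
print(top_unique)
```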

Answer 2

Here is the dplyr way:

df <- read.table(text = "
ID    | Song            |  artist       | revenue 
7520  | Dance with me   |   R Kelly     |   2000    
7531  | Gone girl       |   Vincent     |   1890     
8193  | Motivation      |   R Kelly     |   3500     
9800  | What            |   Beyonce     |  12000    
2010  | Excuse Me       |   Pharell     |   1010     
1999  | Remove me       |   Jack Will   |    500      
", header = TRUE, sep = "|", strip.white = TRUE)

You can group_by artist, sum the revenue within each group, and then choose how many entries you want to peek at (here only 3):

library(dplyr)
df %>%
  group_by(artist) %>%               # one group per unique artist
  summarise(tot = sum(revenue)) %>%  # total revenue per artist
  arrange(desc(tot)) %>%             # highest total first
  head(3)                            # keep the top 3

Result:

Source: local data frame [3 x 2]

   artist   tot
1 Beyonce 12000
2 R Kelly  5500
3 Vincent  1890
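
If dplyr is not available, roughly the same totals can be computed in base R with aggregate() followed by order(). This is a minimal sketch rather than part of the original answer: the data frame is rebuilt by hand from the table above, and the names totals and top5 are hypothetical.

```r
# Sample data rebuilt from the table above
df <- data.frame(
  ID      = c(7520, 7531, 8193, 9800, 2010, 1999),
  Song    = c("Dance with me", "Gone girl", "Motivation", "What", "Excuse Me", "Remove me"),
  artist  = c("R Kelly", "Vincent", "R Kelly", "Beyonce", "Pharell", "Jack Will"),
  revenue = c(2000, 1890, 3500, 12000, 1010, 500),
  stringsAsFactors = FALSE
)

# Sum revenue per artist: one row per unique artist
totals <- aggregate(revenue ~ artist, data = df, FUN = sum)

# Sort by total revenue, highest first, and keep the top 5
top5 <- head(totals[order(totals$revenue, decreasing = TRUE), ], 5)
print(top5)
```

Changing the second argument of head() controls how many artists are returned, for example 3 to match the dplyr output above.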

How to sort unique values based on another column in R
