Rank highest to lowest within each state, top three

Time: 2017-06-21 00:39:07

Tags: r

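I am trying to get the top three observations of the assets variable within each state. Using a smaller example, my starting data frame looks like this: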

#starting data frame
state = c("TX","TX","TX","TX","TX","TX","PA","PA","PA","PA","PA","PA","DE","DE","DE","DE","DE","DE") 
assets = c(600,100,200,400,200,300,600,100,200,400,200,300,600,100,200,400,200,300) 
stating_df = data.frame(state,assets)

I would like to end up with this:

#ending data frame
state = c("TX","TX","TX","PA","PA","PA","DE","DE","DE") 
assets = c(600,400,300,600,400,300,600,400,300) 
ending_df = data.frame(state,assets)

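I can easily use the sort command, but that does not take each state into account or keep only the top three. My data frame has 50 states and more than 1 million observations. Thanks.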

2 answers:

Answer 0: (score: 2)

You can use aggregate. It returns one row per state, with the three largest values in ascending order:

aggregate(stating_df$assets, list(stating_df$state), 
    function(x) tail(sort(x), 3))
  Group.1 x.1 x.2 x.3
1      DE 300 400 600
2      PA 300 400 600
3      TX 300 400 600
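
If the long layout from the question is needed (one row per state and value, largest first), a minimal base R sketch could look like the following. It is not part of the aggregate approach above; the names ordered_df, rank_in_state and top3_long are made up for illustration, and the data is the toy example from the question.

```r
# Toy data from the question
state  <- c(rep("TX", 6), rep("PA", 6), rep("DE", 6))
assets <- rep(c(600, 100, 200, 400, 200, 300), 3)
stating_df <- data.frame(state, assets)

# Sort by state, then by assets from largest to smallest
ordered_df <- stating_df[order(stating_df$state, -stating_df$assets), ]

# Position of each row within its state (1 = largest assets)
rank_in_state <- ave(ordered_df$assets, ordered_df$state, FUN = seq_along)

# Keep the first three rows of every state
top3_long <- ordered_df[rank_in_state <= 3, ]
rownames(top3_long) <- NULL
top3_long
```

Sorting once and then filtering on the within-group position avoids calling sort separately for every state, which may matter with more than a million rows, although that has not been measured here.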

Answer 1: (score: 0)

The following works:

library(dplyr)
library(reshape2)
state = c("TX","TX","TX","TX","TX","TX","PA","PA","PA","PA","PA","PA","DE","DE","DE","DE","DE","DE") 
assets = c(600,100,200,400,200,300,600,100,200,400,200,300,600,100,200,400,200,1300) 
starting_df = data.frame(state,assets)

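# Collect the three largest assets per state into a list column
ending_df = starting_df %>% group_by(state) %>%
  summarize(top3 = list(sort(assets, decreasing = TRUE)[1:3]))

# I don't know your use case, but I recommend stopping at the step above

# Spread the list column into three columns, one row per state
z = matrix(ending_df$top3 %>% unlist, ncol = 3, byrow = TRUE)
ending_df = cbind(ending_df, z) %>% select(-top3)

# Reshape to long format (one row per state and rank), then order by state
ending_df = melt(ending_df, id.vars = "state")
ending_df = ending_df %>% arrange(state)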
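
A shorter route to the same long layout, assuming dplyr 1.0.0 or later is installed (slice_max is not available in older versions), might be the sketch below. The name top3_df is made up for illustration, and the data is the toy example from the question rather than the modified vector used above.

```r
library(dplyr)  # assumes dplyr >= 1.0.0 for slice_max()

# Toy data from the question
state  <- c(rep("TX", 6), rep("PA", 6), rep("DE", 6))
assets <- rep(c(600, 100, 200, 400, 200, 300), 3)
starting_df <- data.frame(state, assets)

# Keep the three rows with the largest assets in each state;
# with_ties = FALSE guarantees exactly three rows per state
top3_df <- starting_df %>%
  group_by(state) %>%
  slice_max(assets, n = 3, with_ties = FALSE) %>%
  ungroup()
top3_df
```

Setting with_ties = FALSE matters when the third-largest value is duplicated; with the default of TRUE, every tied row would be kept and a state could return more than three rows.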