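I am trying to get the top three observations of the assets variable within each state. Using a smaller example, my starting data frame looks like this: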
#starting data frame
state = c("TX","TX","TX","TX","TX","TX","PA","PA","PA","PA","PA","PA","DE","DE","DE","DE","DE","DE")
assets = c(600,100,200,400,200,300,600,100,200,400,200,300,600,100,200,400,200,300)
stating_df = data.frame(state,assets)
I would like to end up with this:
#ending data frame
state = c("TX","TX","TX","PA","PA","PA","DE","DE","DE")
assets = c(600,400,300,600,400,300,600,400,300) # top three values present in the starting data for each state
ending_df = data.frame(state,assets)
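I can easily use the sort command, but that does not account for the grouping by state or for keeping only the top three. My data frame has 50 states and over 1 million observations. Thanks.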
Answer 0: (score: 2)
You can use aggregate:
aggregate(stating_df$assets, list(stating_df$state),
function(x) tail(sort(x), 3))
Group.1 x.1 x.2 x.3
1 DE 300 400 600
2 PA 300 400 600
3 TX 300 400 600
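Note that aggregate returns one row per state with the three values side by side, whereas the question asks for one row per observation. A minimal base R sketch of one way to get that long shape is below. The names ordered_df, rank_in_state and top3_df are illustrative only and are not part of the original answer.

```r
# Rebuild the question's example data so the sketch is self-contained
state <- c("TX","TX","TX","TX","TX","TX","PA","PA","PA","PA","PA","PA","DE","DE","DE","DE","DE","DE")
assets <- c(600,100,200,400,200,300,600,100,200,400,200,300,600,100,200,400,200,300)
stating_df <- data.frame(state, assets)

# Sort by state, then by assets from largest to smallest
ordered_df <- stating_df[order(stating_df$state, -stating_df$assets), ]

# Number the rows 1, 2, 3, ... within each state
rank_in_state <- ave(ordered_df$assets, ordered_df$state, FUN = seq_along)

# Keep only the first three rows of each state
top3_df <- ordered_df[rank_in_state <= 3, ]
rownames(top3_df) <- NULL
```

The states come out in alphabetical order (DE, PA, TX) rather than in the order they appear in the input. If values are tied at the cutoff, this keeps exactly three rows per state.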
Answer 1: (score: 0)
The following works:
library(dplyr)
library(reshape2)
state = c("TX","TX","TX","TX","TX","TX","PA","PA","PA","PA","PA","PA","DE","DE","DE","DE","DE","DE")
assets = c(600,100,200,400,200,300,600,100,200,400,200,300,600,100,200,400,200,1300)
starting_df = data.frame(state,assets)
ending_df = starting_df %>% group_by(state) %>%
summarize(top3 = list(sort(assets, decreasing = TRUE)[1:3]))
# I don't know your use case. But I recommend stopping in the above step
z = matrix(ending_df$top3 %>% unlist, ncol = 3, byrow = TRUE)
ending_df = cbind(ending_df, z) %>% select(-top3)
ending_df = melt(ending_df, id.vars = "state")
ending_df = ending_df %>% arrange(state)
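If the installed dplyr is version 1.0.0 or later, the matrix and melt steps can probably be skipped, because slice_max keeps the top rows per group directly in long form. This is a sketch under that version assumption, using the question's example data. The name top3_df is illustrative, and with_ties = FALSE is my choice so that each state yields exactly three rows.

```r
library(dplyr)

# The question's example data
state <- c("TX","TX","TX","TX","TX","TX","PA","PA","PA","PA","PA","PA","DE","DE","DE","DE","DE","DE")
assets <- c(600,100,200,400,200,300,600,100,200,400,200,300,600,100,200,400,200,300)
starting_df <- data.frame(state, assets)

# Keep the three largest assets values within each state
top3_df <- starting_df %>%
  group_by(state) %>%
  slice_max(order_by = assets, n = 3, with_ties = FALSE) %>%
  ungroup()
```

With the default with_ties = TRUE, a state could return more than three rows when several observations share the third-largest value.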