Question

I have a main data frame with the following columns:

location_id order_id created_at pos user_id spend_amount earn_amount ref name street_address city state time date month
109936 5536 32684814 2016-06-20 17:21:56 sw?etgreen 2243440 974 900 12 - 19th + L 19th + L 1901 L St NW Washington DC 17:21:56 2016-06-20 Jun

I have aggregated it into several sub-data frames, for example:

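   AmountByUser <- aggregate(total$spend_amount, by = list(Category = total$user_id), FUN = sum)
   colnames(AmountByUser) <- c("User_Id", "Total Amount Spent")
   AmountByUser <- AmountByUser[order(AmountByUser$`Total Amount Spent`, decreasing = TRUE), ]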

       User_Id Total Amount Spent
99696  3435653           46450628
207341 4821392           39621941
177899 4308353           11401622
177907 4308520           11034094
177906 4308515            8536865
177905 4308497            8324570
236885 5407939            7090316
110781 3532013            6187870
118742 3612960            4498527
236889 5407986            3441924
105507 3469230            1603637
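A minimal reproducible sketch of the aggregation step, using a tiny hypothetical stand-in for total that has only the user_id and spend_amount columns (the values are made up). aggregate() returns one row per user in ascending user_id order, so getting the highest spenders first, as in the output above, needs a sort on the total. The sort also explains the odd row names in that output: subsetting a data frame keeps the row names it had before sorting.

```r
# Hypothetical toy stand-in for the main data frame `total`
total <- data.frame(
  user_id      = c(101, 102, 101, 103, 102, 101),
  spend_amount = c(500, 200, 300, 900, 100, 250)
)

# One row per user; aggregate() orders the groups by user_id, not by spend
AmountByUser <- aggregate(total$spend_amount,
                          by = list(Category = total$user_id), FUN = sum)
colnames(AmountByUser) <- c("User_Id", "Total Amount Spent")

# Highest spenders first; backticks are needed because the name has spaces
AmountByUser <- AmountByUser[order(AmountByUser$`Total Amount Spent`,
                                   decreasing = TRUE), ]
print(AmountByUser)
```

Here user 101 (1050) comes first, then 103 (900) and 102 (300), and the printed row names are 1, 3, 2.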

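How can I get the row indices of the top n% of this sub-data frame and then use them to subset the main data frame? The end goal is a data frame that keeps all of the main data frame's original columns, but only the rows belonging to the highest-spending user_ids.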

Answer 1

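You don't actually need the row indices at all. Just take the top n user IDs from the aggregated data and use the %in% operator on the full data frame.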

# Assumes AmountByUser is already sorted by total spend, highest first
topUser <- AmountByUser$User_Id[1:20]
topAllData <- total[total$user_id %in% topUser, ]
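The code above keeps a fixed 20 users rather than the top n% the question asks about. A minimal sketch of the percentage version, assuming the main data frame is called total and has user_id and spend_amount columns as in the question; the helper name top_pct_rows, its pct argument, and the toy data are hypothetical. Here "top n%" is read as a share of users, rounded up so at least one user is kept for any positive pct.

```r
# Hypothetical toy stand-in for the main data frame `total`
total <- data.frame(
  order_id     = 1:8,
  user_id      = c(1, 2, 3, 4, 1, 2, 5, 1),
  spend_amount = c(50, 10, 5, 40, 60, 20, 1, 30)
)

# Hypothetical helper: keep every row of `df` whose user is among the
# top `pct` percent of users by total spend
top_pct_rows <- function(df, pct) {
  # Passing a vector to aggregate() names the summed column "x"
  totals <- aggregate(df$spend_amount, by = list(User_Id = df$user_id), FUN = sum)
  totals <- totals[order(totals$x, decreasing = TRUE), ]
  # Number of users in the top pct percent, rounded up
  k <- ceiling(nrow(totals) * pct / 100)
  top_users <- totals$User_Id[seq_len(k)]
  # %in% yields a logical vector over all rows of df, so no row indices are needed
  df[df$user_id %in% top_users, ]
}

top_pct_rows(total, 20)
```

With five users, 20% keeps only user 1 (total 140), so the result is that user's three original rows with every column intact. Ties at the cut-off are broken arbitrarily by order(). If "top n%" should instead mean users whose total is at or above a spend threshold, quantile(totals$x, 1 - pct / 100) gives such a cut-off, which can be compared against totals$x before the same %in% step.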
