I want to reshape some data. My data A looks like
author_id paper_id prob
731 24943 1
731 24943 1
731 688974 1
731 964345 .8
731 1201905 .9
731 1267992 1
736 249 .2
736 6889 1
736 94345 .7
736 1201905 .9
736 126992 .8
The output I want is:
author_id paper_id
731 24943,24943,688974,1267992,1201905,964345
736 6889,1201905,126992,94345,249
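That is, the paper_id values for each author are listed in decreasing order of prob.
If I used a combination of SQL and R, I think the solution would be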
statement<-"select * from A
GROUP BY author_id
ORDER BY prob"
and then using paste in R once the paper_id values are in order.
But I need a solution entirely in R. Can this be done?
Thanks
Answer 0: (score: 10)
If temp is your dataset, run
library(data.table)
setDT(temp)[order(-prob), list(paper_id = paste0(paper_id, collapse=", ")), by = author_id]
## author_id paper_id
## 1: 731 24943, 24943, 688974, 1267992, 1201905, 964345
## 2: 736 6889, 1201905, 126992, 94345, 249
Edit: August 11, 2014
Since data.table v >= 1.9.4, you can use the very efficient setorder instead of order
str(temp)
setorder(setDT(temp), -prob)[, list(paper_id = paste0(paper_id, collapse=", ")), by = author_id]
## author_id paper_id
## 1: 731 24943, 24943, 688974, 1267992, 1201905, 964345
## 2: 736 6889, 1201905, 126992, 94345, 249
As a side note, the whole thing can also be done easily in base R (though this is not recommended for big datasets)
aggregate(paper_id ~ author_id, temp[order(-temp$prob), ], paste, collapse = ", ")
# author_id paper_id
# 1 731 24943, 24943, 688974, 1267992, 1201905, 964345
# 2 736 6889, 1201905, 126992, 94345, 249
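To try the base R version without installing anything, here is a minimal self-contained sketch. It assumes temp is simply the question's sample data read in with read.table, and the intermediate names sorted and res are only for illustration. Rows with equal prob stay in their original order because the sort is done on -prob with the default, stable order.

```r
# temp: the question's sample data, rebuilt here so the snippet runs on its own
temp <- read.table(header = TRUE, text = "author_id paper_id prob
731 24943 1
731 24943 1
731 688974 1
731 964345 .8
731 1201905 .9
731 1267992 1
736 249 .2
736 6889 1
736 94345 .7
736 1201905 .9
736 126992 .8")

# Sort all rows by decreasing prob; rows with equal prob keep their original order
sorted <- temp[order(-temp$prob), ]

# Collapse the paper_id values of each author into one comma-separated string
res <- aggregate(paper_id ~ author_id, sorted, paste, collapse = ", ")
print(res)
```

Sorting first and aggregating second works here because aggregate keeps the row order within each author_id group.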
Answer 1: (score: 6)
For the sake of completeness, here is a dplyr answer:
df <- read.table(header = TRUE, text =
"author_id paper_id prob
731 24943 1
731 24943 1
731 688974 1
731 964345 .8
731 1201905 .9
731 1267992 1
736 249 .2
736 6889 1
736 94345 .7
736 1201905 .9
736 126992 .8") # your dataset
library(dplyr)
df %>%
  group_by(author_id) %>%
  arrange(desc(prob)) %>%
  summarise(paper_id = paste(paper_id, collapse = ", "))
## Source: local data frame [2 x 2]
##
## author_id paper_id
## 1 731 24943, 24943, 688974, 1267992, 1201905, 964345
## 2 736 6889, 1201905, 126992, 94345, 249
Answer 2: (score: 3)
You can try this
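library('plyr')
# sample.df is assumed to be your dataset (the question's data A)
subdf <- ddply(sample.df, .(author_id), function(df) {
  # row indices of this author's papers, highest prob first
  ord <- order(df$prob, decreasing = TRUE)
  return(data.frame(paper_id = paste(df$paper_id[ord], collapse = ',')))
})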
subdf
author_id paper_id
1 731 24943,24943,688974,1267992,1201905,964345
2 736 6889,1201905,126992,94345,249
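The same split-apply-combine idea as the ddply call above can be sketched in base R with split and vapply, in case plyr is not available. This is only an illustration under assumptions: sample.df is taken to be the question's sample data, and by_author and collapsed are hypothetical helper names.

```r
# sample.df: assumed to be the question's sample data
sample.df <- read.table(header = TRUE, text = "author_id paper_id prob
731 24943 1
731 24943 1
731 688974 1
731 964345 .8
731 1201905 .9
731 1267992 1
736 249 .2
736 6889 1
736 94345 .7
736 1201905 .9
736 126992 .8")

# Split the rows into one data frame per author
by_author <- split(sample.df, sample.df$author_id)

# For each author, order paper_id by decreasing prob and paste into one string
collapsed <- vapply(by_author, function(df) {
  ord <- order(-df$prob)
  paste(df$paper_id[ord], collapse = ",")
}, character(1))

# Combine the per-author strings back into a data frame
subdf <- data.frame(author_id = as.integer(names(collapsed)),
                    paper_id = unname(collapsed),
                    stringsAsFactors = FALSE)
print(subdf)
```

Unlike the answers that sort the whole table first, this orders each author's rows inside the per-group function, which is what the ddply version does as well.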