Create an index based on a column

Time: 2017-01-25 04:12:40

Tags: r merge grouping data-manipulation

I have two data sets:

# 1.
user_id  users    frequency
1        1        3
2        1        1
3        1        1

# 2.
user_id  sum      unique
1        2        1
2        0        0
3        1        1

I want to merge on user_id, but index the result ordinally based on unit1, so that the output looks like this, with user_id dropped from the picture:

# 3.
frequency users sum    unique
3         1     2      1
1         2     1      1

Any ideas on how to accomplish this? Also, so that I can learn how to do this kind of thing, is there a name for this type of operation?
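To make the example reproducible, the two tables above can be entered as data frames. This is only a minimal sketch: the object names df1 and df2 and the intermediate name merged are assumptions, not something given in the question.

```r
# Data set 1: one row per user_id, with its frequency
df1 <- data.frame(
  user_id   = c(1L, 2L, 3L),
  users     = c(1L, 1L, 1L),
  frequency = c(3L, 1L, 1L)
)

# Data set 2: per-user totals
df2 <- data.frame(
  user_id = c(1L, 2L, 3L),
  sum     = c(2L, 0L, 1L),
  unique  = c(1L, 0L, 1L)
)

# Joining on user_id puts all five columns side by side, one row per user
merged <- merge(df1, df2, by = "user_id")
print(merged)
```

The desired output then corresponds to collapsing the rows of merged that share the same frequency and summing the other columns.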

2 answers:

Answer 0: (score: 2)

library(data.table)
setDT(df)         # converts to a data.table in place, in case it is still a data.frame
setDT(df1)

# logic: first join the two tables on user_id, then group by the "frequency" column and sum
df[df1, on = "user_id"][, lapply(.SD, sum), by = .(frequency), .SDcols = c("sum", "unique", "users")]
#   frequency sum unique users
#1:         3   2      1     1
#2:         1   1      1     2
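The same logic can be split into two separate steps, which may make it easier to follow. The sketch below assumes the data.table package is installed and builds the question's tables inline; the names joined and result are hypothetical.

```r
library(data.table)

# The two tables from the question, named as in this answer
df  <- data.table(user_id = 1:3, users = c(1L, 1L, 1L), frequency = c(3L, 1L, 1L))
df1 <- data.table(user_id = 1:3, sum = c(2L, 0L, 1L), unique = c(1L, 0L, 1L))

# Step 1: join on user_id (rows of df matched to the rows of df1)
joined <- df[df1, on = "user_id"]

# Step 2: group by frequency and sum the remaining columns.
# base::sum is spelled out as a precaution, since the table also has a column named sum.
result <- joined[, lapply(.SD, base::sum), by = frequency,
                 .SDcols = c("sum", "unique", "users")]
print(result)
```

Groups appear in order of first occurrence, so frequency 3 comes before frequency 1, as in the question's expected output.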

Answer 1: (score: 1)

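Here is an option using the tidyverse. We can do an inner_join between the two data sets, group by 'frequency', and get the sum of the remaining variables with summarise and across (which replace the deprecated summarise_each and funs).

library(dplyr)
inner_join(df1, df2, by = "user_id") %>%
       group_by(frequency) %>%
       # base::sum is spelled out because the data also has a column named sum
       summarise(across(c(sum, unique, users), base::sum))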
#    frequency   sum unique users
#      <int> <int>  <int> <int>
#1         1     1      1     2
#2         3     2      1     1

Or, using base R, we merge the data sets and then aggregate:

aggregate(.~frequency, merge(df1, df2)[-1], FUN = sum)
#    frequency users sum unique
#1         1     2   1      1
#2         3     1   2      1
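As for the name of the operation: this pattern is usually described as a join (merge) followed by a grouped aggregation, often called split-apply-combine. A self-contained base R sketch of the whole pipeline follows; the final reordering step is an addition, assumed here only so that the rows match the order shown in the question.

```r
# The two tables from the question
df1 <- data.frame(user_id = 1:3, users = c(1L, 1L, 1L), frequency = c(3L, 1L, 1L))
df2 <- data.frame(user_id = 1:3, sum = c(2L, 0L, 1L), unique = c(1L, 0L, 1L))

# Join on the shared key, then drop the key so it is not summed
merged <- merge(df1, df2, by = "user_id")
merged$user_id <- NULL

# Sum every remaining column within each frequency group
result <- aggregate(. ~ frequency, data = merged, FUN = sum)

# aggregate() sorts groups in ascending order; reverse to put frequency 3 first
result <- result[order(-result$frequency), ]
rownames(result) <- NULL
print(result)
```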