I have two datasets:
# 1.
user_id users frequency
1 1 3
2 1 1
3 1 1
# 2.
user_id sum unique
1 2 1
2 0 0
3 1 1
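I would like to merge on user_id, but ordinally, based on the index of unit1, so that the output looks like the following, with user_id dropped from the picture: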
# 3.
frequency users sum unique
3 1 2 1
1 2 1 1
Any ideas on how to achieve this? Also, for the sake of learning how to do these kinds of operations, is there a name for this kind of operation?
Answer 0 (score: 2)
library(data.table)
setDT(df) # convert to a data.table by reference, in case it is a data.frame
setDT(df1)
# logic: first join both tables on user_id, then group by the "frequency" column and sum the rest
# (on = is needed because neither table has a key set)
df[df1, on = "user_id"][, lapply(.SD, sum), by = .(frequency), .SDcols = c("sum", "unique", "users")]
# frequency sum unique users
#1: 3 2 1 1
#2: 1 1 1 2
Answer 1 (score: 1)
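Here is an option using the tidyverse. We can do an inner_join between the two datasets, group by 'frequency', and get the sum of the remaining columns with summarise and across (the older summarise_each and funs are deprecated in current dplyr).
library(dplyr)
# join on the shared key, then sum each column within every frequency group
inner_join(df1, df2, by = "user_id") %>%
  group_by(frequency) %>%
  summarise(across(c(sum, unique, users), ~ sum(.x)))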
# frequency sum unique users
# <int> <int> <int> <int>
#1 1 1 1 2
#2 3 2 1 1
Or, using base R, we merge the datasets and then aggregate. The [-1] drops the first column, user_id, before summing by frequency.
aggregate(.~frequency, merge(df1, df2)[-1], FUN = sum)
# frequency users sum unique
#1 1 2 1 1
#2 3 1 2 1
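To try any of the options above end to end, the two datasets first have to exist as objects. The following is a minimal sketch in base R only, assuming that df1 holds dataset 1 and df2 holds dataset 2 from the question (the thread never states which name maps to which table), and that all columns are integers. The intermediate names merged and result are illustrative only.

```r
# Assumption: df1 is dataset 1 and df2 is dataset 2 from the question
df1 <- data.frame(user_id = 1:3,
                  users = c(1L, 1L, 1L),
                  frequency = c(3L, 1L, 1L))
df2 <- data.frame(user_id = 1:3,
                  sum = c(2L, 0L, 1L),
                  unique = c(1L, 0L, 1L))

# Step 1: join the two tables on the shared key
merged <- merge(df1, df2, by = "user_id")

# Step 2: drop the key so that it is not summed along with the other columns
merged$user_id <- NULL

# Step 3: sum every remaining column within each frequency group
result <- aggregate(. ~ frequency, data = merged, FUN = sum)
print(result)
```

The result should have one row per distinct frequency value, which is the "join, then group by and aggregate" pattern that all three approaches in the answers share.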