I am trying to modify my data frame:
start end duration_time
1 1 2 2.438
2 2 1 3.901
3 1 2 18.037
4 2 3 85.861
5 3 4 83.922
and create something like this:
start end duration_time weight
1 1 2 20.475 2
2 2 1 3.901 1
4 2 3 85.861 1
5 3 4 83.922 1
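So duplicate start-end combinations are collapsed into a single row, the weight counts how many rows were collapsed, and the duration times are summed.
I already have part of it working, but I can't figure out how to compute the weight: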
library('plyr')
df <- read.table(header = TRUE, text = "start end duration_time
1 1 2 2.438
2 2 1 3.901
3 1 2 18.037
4 2 3 85.861
5 3 4 83.922")
ddply(df, c("start","end"), summarise, weight=? ,duration_time=sum(duration_time))
Answer 0 (score: 1)
A base R option is aggregate:
do.call(data.frame, aggregate(duration_time~., df,
FUN = function(x) c(duration_time=sum(x), weight = length(x))))
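The call above returns both statistics under names derived from duration_time rather than the plain duration_time and weight columns asked for in the question. A minimal sketch of one way to tidy that up, assuming the question's df; the name res and the explicit start + end formula are illustrative choices, not part of the original answer:

```r
# Data frame from the question
df <- read.table(header = TRUE, text = "start end duration_time
1 1 2 2.438
2 2 1 3.901
3 1 2 18.037
4 2 3 85.861
5 3 4 83.922")

# aggregate() stores the two statistics as a single matrix column;
# do.call(data.frame, ...) splits that matrix into ordinary columns.
res <- do.call(data.frame, aggregate(duration_time ~ start + end, df,
  FUN = function(x) c(duration_time = sum(x), weight = length(x))))

# Assign the column names the question asks for (hypothetical cleanup step).
names(res) <- c("start", "end", "duration_time", "weight")
res
```

Note that aggregate sorts its result by the grouping variables, so the row order may differ from the order in the question's desired output.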
Answer 1 (score: 0)
The simplest solution, using data.table:
library(data.table)
setDT(df)[, .(duration_time=sum(duration_time), wt = .N) , by =c("start", "end")]
start end duration_time wt
1: 1 2 20.475 2
2: 2 1 3.901 1
3: 2 3 85.861 1
4: 3 4 83.922 1
Something to try using dplyr and tidyr:
library(dplyr)
library(tidyr)
df1 <- df %>% unite(by_var, start,end)
df2 <- cbind(df1 %>% count(by_var), df1 %>% group_by(by_var)%>%
summarise( duration_time=sum(duration_time))%>%
separate(by_var, c("start","end")))[c(3,4,5,2)]
> df2
start end duration_time n
1 1 2 20.475 2
2 2 1 3.901 1
3 2 3 85.861 1
4 3 4 83.922 1
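The same result can probably be reached without the unite/separate round trip, which also turns start and end into character columns. A hedged alternative sketch, assuming the question's df and dplyr 1.0.0 or later (for the .groups argument); the name res is illustrative and this is not the answerer's code:

```r
library(dplyr)

# Data frame from the question
df <- read.table(header = TRUE, text = "start end duration_time
1 1 2 2.438
2 2 1 3.901
3 1 2 18.037
4 2 3 85.861
5 3 4 83.922")

res <- df %>%
  group_by(start, end) %>%                         # one group per start-end pair
  summarise(duration_time = sum(duration_time),    # total duration per pair
            weight = n(),                          # number of rows per pair
            .groups = "drop")                      # return an ungrouped tibble
res
```

Grouping directly on both columns keeps start and end as integers, and n() plays the same role as .N in the data.table version.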