I'm trying to set row.names to the first column of my data, but there seem to be some duplicates in the dates.
Error in `.rowNamesDF<-`(x, value = value) :
duplicate 'row.names' are not allowed
In addition: Warning message:
non-unique values when setting 'row.names': ‘2019-01-27 16:50:00’, ‘2019-01-28 16:50:00’, ‘2019-01-29 16:50:00’, ‘2019-01-30 16:50:00’, ‘2019-01-31 16:50:00’, ‘2019-02-01 16:50:00’, ‘2019-02-02 16:50:00’, ‘2019-02-03 16:50:00’
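The assignment that produced the error was along these lines (roughly; df is the data shown further down):
row.names(df) <- df$date  # the first column holds the dates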
I'm now trying to aggregate the duplicates. For example, in the data I have 2 observations for the date 2019-01-28 16:50:00. How can I take the mean of the two duplicated time-series values?
I tried the following, without much luck:
df[!duplicated(df),] %>%
group_by(date) %>%
summarise(new = n())
And:
df$date[duplicated(df$date)]
data <- df %>%
group_by(date) %>%
mutate_each(funs(mean), date) %>%
distinct
Data:
structure(list(date = structure(c(1548676200, 1548676500, 1548677100,
1548679200, 1548679500, 1548680100, 1548680400, 1548680700, 1548681000,
1548684300, 1548684600, 1548684900, 1548685200, 1548685500, 1548685800,
1548686400, 1548686700, 1548687000, 1548687300, 1548687600, 1548687900,
1548688200, 1548688500, 1548688800, 1548689100, 1548689400, 1548689700,
1548690000, 1548690300, 1548690600, 1548690600, 1548690900, 1548691200,
1548691500, 1548691800, 1548692100, 1548692400, 1548692700, 1548693000,
1548693300, 1548693600, 1548693900, 1548694200, 1548694500, 1548694800,
1548695100, 1548695400, 1548695700, 1548696000, 1548696300, 1548696600,
1548696900, 1548697200, 1548697500, 1548697800, 1548698100, 1548698400,
1548698700, 1548699000, 1548699300, 1548699600, 1548700500, 1548700800,
1548703200, 1548703500, 1548703800, 1548704100, 1548704400, 1548704700,
1548705000, 1548708000, 1548709200, 1548712200, 1548712500, 1548712800,
1548734400, 1548734700, 1548738000, 1548738300, 1548738600, 1548738900,
1548739200, 1548739500, 1548739800, 1548740100, 1548740400, 1548740700,
1548741000, 1548741300, 1548741600, 1548741900, 1548742200, 1548742500,
1548742800, 1548743100, 1548743400, 1548743700, 1548744000, 1548744300,
1548744600, 1548744900, 1548745200, 1548745500, 1548745800, 1548746100,
1548746400, 1548746700, 1548747000, 1548747300, 1548747600, 1548747900,
1548748200, 1548748500, 1548748800, 1548749100, 1548749400, 1548749700,
1548750000, 1548750300, 1548750600, 1548751200, 1548751500, 1548751800,
1548752100, 1548752400, 1548757200, 1548757500, 1548762300, 1548762600,
1548762900, 1548763200, 1548763500, 1548764400, 1548807000, 1548807300,
1548807900, 1548808200, 1548808500, 1548808800, 1548809100, 1548809400,
1548809700, 1548810000, 1548810300, 1548810600, 1548810900, 1548811200,
1548811500, 1548812100, 1548812400, 1548812700, 1548813000, 1548813300,
1548813900, 1548814200, 1548814500, 1548814800, 1548815100, 1548815400,
1548984000, 1548984900, 1548985500, 1548985800, 1548986100, 1548986400,
1548986700, 1548987000, 1548987300, 1548987600, 1548987900, 1548988200,
1548988500, 1548988800, 1548989100, 1548989400, 1548989700, 1548990000,
1548990300, 1548990600, 1548990900, 1548991500, 1548991800, 1548992100,
1548992400, 1548995400, 1548995700, 1548996300, 1548996600, 1548996900,
1548998400, 1549001100, 1549006500, 1549006800, 1549007100, 1549007400,
1549007700, 1549008000, 1549008300, 1549008600, 1549009500), class = c("POSIXct",
"POSIXt"), tzone = ""), close = c(90.78, 90.78, 90.69, 90.63,
90.94, 90.68, 90.72, 90.65, 90.79, 90.8, 90.87, 90.79, 90.75,
90.75, 90.93, 90.91, 90.9, 90.85, 90.79, 90.51, 90.31, 90.01,
89.67, 89.09, 89.49, 89.13, 88.61, 89.42, 89.42, 90.24, 89.42,
90.34, 90.65, 90.59, 90.16, 89.98, 89.73, 89.83, 90, 89.8, 89.72,
89.62, 89.62, 89.91, 89.97, 90.11, 90.01, 90.08, 90.09, 90.1,
90.1, 90.1, 90.13, 90.36, 90.58, 90.17, 90.42, 90.7, 90.71, 90.56,
90.65, 90.88, 90.87, 90.74, 90.56, 90.51, 90.57, 90.64, 90.78,
90.94, 90.94, 90.8, 90.83, 90.83, 90.88, 90.95, 90.93, 90.86,
90.79, 90.65, 90.79, 90.34, 90.31, 90.2, 90.21, 89.79, 89.7,
89.97, 90.07, 89.82, 90.08, 89.96, 90.64, 90.6, 90.46, 90.41,
90.26, 90.25, 90.18, 90.25, 90.29, 90.07, 90.35, 90.36, 90.15,
89.94, 89.75, 89.98, 89.87, 89.92, 90.08, 89.91, 90, 90.42, 90.23,
90.25, 90.46, 90.94, 90.85, 90.91, 90.93, 90.77, 90.75, 90.79,
90.73, 90.91, 90.91, 90.91, 90.72, 90.75, 90.79, 90.95, 90.94,
90.65, 90.63, 90.55, 90.71, 90.76, 90.72, 90.78, 90.79, 90.78,
90.62, 90.71, 90.71, 90.72, 90.86, 90.9, 90.93, 90.94, 90.65,
90.74, 90.84, 90.8, 90.8, 90.89, 90.9, 90.9, 90.91, 90.9, 90.86,
90.52, 90.54, 90.7, 90.51, 90.69, 90.7, 90.75, 90.66, 90.67,
90.8, 90.83, 90.82, 90.91, 90.84, 90.71, 90.69, 90.84, 90.75,
90.8, 90.74, 90.87, 90.94, 90.94, 90.86, 90.71, 90.72, 90.72,
90.83, 90.93, 90.93, 90.87, 90.78, 90.63, 90.54, 90.54, 90.61,
90.79, 90.71, 90.84)), row.names = c(NA, -200L), class = "data.frame")
Answer 0 (score: 0)
There is a tidyverse way to do this, as @Edwin correctly suggested:
library(dplyr)

df %>% group_by(date) %>% summarise(mclose = mean(close))
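This also gets you past the original row.names error: once the duplicate timestamps are averaged away, the dates are unique. A minimal sketch (df_agg and mclose are just illustrative names, following the summarise() call above):
df_agg <- df %>%
  group_by(date) %>%
  summarise(mclose = mean(close)) %>%
  as.data.frame()
row.names(df_agg) <- format(df_agg$date)  # dates are unique now, so this no longer errors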
If your data is large, you can also do the computation with data.table to avoid the piping overhead.
library(data.table)
dt <- as.data.table(df)
# Optional: set date as a key, to make grouping faster.
# setkey(dt, date)
dt[, mean(close), date]
# date V1
# 1: 2019-01-28 12:50:00 90.78
# 2: 2019-01-28 12:55:00 90.78
# 3: 2019-01-28 13:05:00 90.69
# 4: 2019-01-28 13:40:00 90.63
# 5: 2019-01-28 13:45:00 90.94
# ---
# 195: 2019-02-01 08:55:00 90.54
# 196: 2019-02-01 09:00:00 90.61
# 197: 2019-02-01 09:05:00 90.79
# 198: 2019-02-01 09:10:00 90.71
# 199: 2019-02-01 09:25:00 90.84
You can easily name the output variable:
data <- dt[, .(mclose=mean(close)), date]
You can also get rid of POSIXt by using the IDate and ITime types instead. They are integer based, so there won't be any grouping issues.
dt[, `:=`(date=as.IDate(date), time=as.ITime(date))]
data <- dt[, .(mclose=mean(close)), .(date, time)]
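As a quick sanity check (just a usage sketch on the dt built above), you can list the date/time pairs that were duplicated in the original data:
dt[, .N, by = .(date, time)][N > 1]  # an empty result would mean there were no duplicates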