Duplicate POSIXct results - aggregate over time?

Asked: 2019-02-05 15:02:36

Tags: r

I am trying to set the row.names of my data to its first column, but there appear to be some duplicates among the dates.

Error in `.rowNamesDF<-`(x, value = value) : 
  duplicate 'row.names' are not allowed
In addition: Warning message:
non-unique values when setting 'row.names': ‘2019-01-27 16:50:00’, ‘2019-01-28 16:50:00’, ‘2019-01-29 16:50:00’, ‘2019-01-30 16:50:00’, ‘2019-01-31 16:50:00’, ‘2019-02-01 16:50:00’, ‘2019-02-02 16:50:00’, ‘2019-02-03 16:50:00’ 
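
The error comes from trying to use the timestamps as row names, roughly like this (a sketch of the failing call; the exact line is not reproduced above):

# assumed form of the call that triggers the error above
rownames(df) <- df$date  # fails because some timestamps appear more than once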

I am now trying to aggregate the duplicates. For example, the data contains two rows for the timestamp 2019-01-28 16:50:00. How can I take the mean across the duplicated time series values?
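
In base R terms, what I am after is essentially a per-timestamp mean of close, something like (a sketch):

# desired result: one row per unique timestamp, with close averaged over duplicates
aggregate(close ~ date, data = df, FUN = mean)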

I have tried the following, without much luck:

df[!duplicated(df),] %>% 
  group_by(date) %>% 
  summarise(new = n())

And:

df$date[duplicated(df$date)]


data <- df %>% 
  group_by(date) %>%
  mutate_each(funs(mean), date) %>% 
  distinct

Data:

structure(list(date = structure(c(1548676200, 1548676500, 1548677100, 
1548679200, 1548679500, 1548680100, 1548680400, 1548680700, 1548681000, 
1548684300, 1548684600, 1548684900, 1548685200, 1548685500, 1548685800, 
1548686400, 1548686700, 1548687000, 1548687300, 1548687600, 1548687900, 
1548688200, 1548688500, 1548688800, 1548689100, 1548689400, 1548689700, 
1548690000, 1548690300, 1548690600, 1548690600, 1548690900, 1548691200, 
1548691500, 1548691800, 1548692100, 1548692400, 1548692700, 1548693000, 
1548693300, 1548693600, 1548693900, 1548694200, 1548694500, 1548694800, 
1548695100, 1548695400, 1548695700, 1548696000, 1548696300, 1548696600, 
1548696900, 1548697200, 1548697500, 1548697800, 1548698100, 1548698400, 
1548698700, 1548699000, 1548699300, 1548699600, 1548700500, 1548700800, 
1548703200, 1548703500, 1548703800, 1548704100, 1548704400, 1548704700, 
1548705000, 1548708000, 1548709200, 1548712200, 1548712500, 1548712800, 
1548734400, 1548734700, 1548738000, 1548738300, 1548738600, 1548738900, 
1548739200, 1548739500, 1548739800, 1548740100, 1548740400, 1548740700, 
1548741000, 1548741300, 1548741600, 1548741900, 1548742200, 1548742500, 
1548742800, 1548743100, 1548743400, 1548743700, 1548744000, 1548744300, 
1548744600, 1548744900, 1548745200, 1548745500, 1548745800, 1548746100, 
1548746400, 1548746700, 1548747000, 1548747300, 1548747600, 1548747900, 
1548748200, 1548748500, 1548748800, 1548749100, 1548749400, 1548749700, 
1548750000, 1548750300, 1548750600, 1548751200, 1548751500, 1548751800, 
1548752100, 1548752400, 1548757200, 1548757500, 1548762300, 1548762600, 
1548762900, 1548763200, 1548763500, 1548764400, 1548807000, 1548807300, 
1548807900, 1548808200, 1548808500, 1548808800, 1548809100, 1548809400, 
1548809700, 1548810000, 1548810300, 1548810600, 1548810900, 1548811200, 
1548811500, 1548812100, 1548812400, 1548812700, 1548813000, 1548813300, 
1548813900, 1548814200, 1548814500, 1548814800, 1548815100, 1548815400, 
1548984000, 1548984900, 1548985500, 1548985800, 1548986100, 1548986400, 
1548986700, 1548987000, 1548987300, 1548987600, 1548987900, 1548988200, 
1548988500, 1548988800, 1548989100, 1548989400, 1548989700, 1548990000, 
1548990300, 1548990600, 1548990900, 1548991500, 1548991800, 1548992100, 
1548992400, 1548995400, 1548995700, 1548996300, 1548996600, 1548996900, 
1548998400, 1549001100, 1549006500, 1549006800, 1549007100, 1549007400, 
1549007700, 1549008000, 1549008300, 1549008600, 1549009500), class = c("POSIXct", 
"POSIXt"), tzone = ""), close = c(90.78, 90.78, 90.69, 90.63, 
90.94, 90.68, 90.72, 90.65, 90.79, 90.8, 90.87, 90.79, 90.75, 
90.75, 90.93, 90.91, 90.9, 90.85, 90.79, 90.51, 90.31, 90.01, 
89.67, 89.09, 89.49, 89.13, 88.61, 89.42, 89.42, 90.24, 89.42, 
90.34, 90.65, 90.59, 90.16, 89.98, 89.73, 89.83, 90, 89.8, 89.72, 
89.62, 89.62, 89.91, 89.97, 90.11, 90.01, 90.08, 90.09, 90.1, 
90.1, 90.1, 90.13, 90.36, 90.58, 90.17, 90.42, 90.7, 90.71, 90.56, 
90.65, 90.88, 90.87, 90.74, 90.56, 90.51, 90.57, 90.64, 90.78, 
90.94, 90.94, 90.8, 90.83, 90.83, 90.88, 90.95, 90.93, 90.86, 
90.79, 90.65, 90.79, 90.34, 90.31, 90.2, 90.21, 89.79, 89.7, 
89.97, 90.07, 89.82, 90.08, 89.96, 90.64, 90.6, 90.46, 90.41, 
90.26, 90.25, 90.18, 90.25, 90.29, 90.07, 90.35, 90.36, 90.15, 
89.94, 89.75, 89.98, 89.87, 89.92, 90.08, 89.91, 90, 90.42, 90.23, 
90.25, 90.46, 90.94, 90.85, 90.91, 90.93, 90.77, 90.75, 90.79, 
90.73, 90.91, 90.91, 90.91, 90.72, 90.75, 90.79, 90.95, 90.94, 
90.65, 90.63, 90.55, 90.71, 90.76, 90.72, 90.78, 90.79, 90.78, 
90.62, 90.71, 90.71, 90.72, 90.86, 90.9, 90.93, 90.94, 90.65, 
90.74, 90.84, 90.8, 90.8, 90.89, 90.9, 90.9, 90.91, 90.9, 90.86, 
90.52, 90.54, 90.7, 90.51, 90.69, 90.7, 90.75, 90.66, 90.67, 
90.8, 90.83, 90.82, 90.91, 90.84, 90.71, 90.69, 90.84, 90.75, 
90.8, 90.74, 90.87, 90.94, 90.94, 90.86, 90.71, 90.72, 90.72, 
90.83, 90.93, 90.93, 90.87, 90.78, 90.63, 90.54, 90.54, 90.61, 
90.79, 90.71, 90.84)), row.names = c(NA, -200L), class = "data.frame")

1 Answer:

Answer 0 (score: 0)

There is a tidyverse way, as @Edwin rightly suggested:

df %>% group_by(date) %>% summarise(mclose = mean(close))
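
After this aggregation each timestamp appears only once, so the original row-names assignment works if you still need it (a sketch; mclose follows the naming above):

agg <- df %>% group_by(date) %>% summarise(mclose = mean(close))
agg <- as.data.frame(agg)   # summarise() returns a tibble; row names need a plain data.frame
rownames(agg) <- agg$date   # no longer errors, since the dates are now unique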

If your data is large, you can also do the computation with data.table to avoid the pipe overhead.

library(data.table)
dt <- as.data.table(df)
# Optional: set date as a key, to make grouping faster.
# setkey(dt, date)
dt[, mean(close), date]
#                     date    V1
#   1: 2019-01-28 12:50:00 90.78
#   2: 2019-01-28 12:55:00 90.78
#   3: 2019-01-28 13:05:00 90.69
#   4: 2019-01-28 13:40:00 90.63
#   5: 2019-01-28 13:45:00 90.94
#  ---                          
# 195: 2019-02-01 08:55:00 90.54
# 196: 2019-02-01 09:00:00 90.61
# 197: 2019-02-01 09:05:00 90.79
# 198: 2019-02-01 09:10:00 90.71
# 199: 2019-02-01 09:25:00 90.84

You can easily name the output variable:

data <- dt[, .(mclose=mean(close)), date]

You can also get rid of the POSIXct type by using IDate and ITime. These are stored as integers, so there are no grouping issues.

dt[, `:=`(date=as.IDate(date), time=as.ITime(date))]
data <- dt[, .(mclose=mean(close)), .(date, time)]
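
If you later need a single timestamp column again, the IDate/ITime pair can be pasted back together (a sketch; assumes the column names from the answer above and the session's local timezone):

# rebuild a POSIXct column from the split date/time columns
data[, datetime := as.POSIXct(paste(date, time))]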