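This is my data frame (shown below).
I want to create 15-minute or 30-minute time intervals and sum No_Words over all timestamps that fall within each interval. I need this to plot the average number of words per time interval.
How should I do this?
Also, I would really like to know whether a solution using the sqldf package is possible.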
Time No_Words
1 2013-11-17 13:37:00 6
2 2013-11-17 13:37:00 16
3 2013-11-17 13:37:00 18
4 2013-11-17 13:37:00 12
5 2013-11-17 14:03:00 5
6 2013-11-17 14:03:00 20
7 2013-11-17 14:04:00 4
8 2013-11-17 17:21:00 39
9 2013-11-17 22:48:00 19
10 2013-11-17 22:48:00 12
Answer 0 (score: 2)
sqldf: Here is an sqldf solution, where the input data frame is DF:
library(sqldf)
min15 <- 15 * 60 # in seconds
ans <- fn$sqldf("select
t.Time - t.Time % $min15 as Time,
sum(t.No_Words) as No_Words
from DF t
group by Time")
plot(No_Words ~ Time, ans, type = "o")
which gives:
> ans
Time No_Words
1 2013-11-17 13:30:00 52
2 2013-11-17 14:00:00 29
3 2013-11-17 17:15:00 39
4 2013-11-17 22:45:00 31
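The SQL expression t.Time - t.Time % $min15 floors each timestamp to the start of its 15-minute interval by working on seconds since the epoch. A minimal base R sketch of the same arithmetic follows; the variable names (times, secs, floored) are made up for illustration, and UTC is assumed so that interval boundaries line up with clock time:

```r
# Two sample timestamps taken from the question's data (UTC assumed)
times <- as.POSIXct(c("2013-11-17 13:37:00", "2013-11-17 14:04:00"), tz = "UTC")

# Seconds since 1970-01-01, which is how POSIXct stores times internally
secs <- as.numeric(times)

# Subtract the remainder modulo 900 s to land on the start of the 15-minute bin
min15 <- 15 * 60
floored <- as.POSIXct(secs - secs %% min15, origin = "1970-01-01", tz = "UTC")

print(format(floored, "%H:%M"))
```

Replacing sum with avg in the query would likely give the per-interval average that the question asks to plot.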
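Using a dense grid: If a dense grid is needed, we need a grid data frame G, which we left join with the previous ans (note that sqldf pulls in the chron package, whose trunc function we use):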
# create grid G (chron is loaded explicitly in case sqldf does not attach it)
library(chron)
rng <- range(as.POSIXct(trunc(as.chron(DF$Time), 15 / (24 * 60))))
G <- data.frame(Time = seq(rng[1], rng[2], by = min15))
ans2 <- sqldf("select Time, coalesce(No_Words, 0) as No_Words
from (select * from G left join ans using(Time))")
plot(No_Words ~ Time, ans2, type = "o")
The first few rows of ans2 are:
> head(ans2)
Time No_Words
1 2013-11-17 13:30:00 52
2 2013-11-17 13:45:00 0
3 2013-11-17 14:00:00 29
4 2013-11-17 14:15:00 0
5 2013-11-17 14:30:00 0
6 2013-11-17 14:45:00 0
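For readers who prefer to do the grid step without SQL, the left join plus coalesce can be approximated in base R with merge. This is only a sketch under assumptions: the small ans data frame below is rebuilt by hand from the first two rows of the output above, and UTC is assumed.

```r
# Sparse result: only intervals that contain data (hand-built for illustration)
ans <- data.frame(
  Time = as.POSIXct(c("2013-11-17 13:30:00", "2013-11-17 14:00:00"), tz = "UTC"),
  No_Words = c(52, 29)
)

# Dense grid: every 15-minute interval start between the first and last bin
G <- data.frame(Time = seq(min(ans$Time), max(ans$Time), by = 15 * 60))

# Left join: keep all grid rows, then replace missing sums with 0
ans2 <- merge(G, ans, by = "Time", all.x = TRUE)
ans2$No_Words[is.na(ans2$No_Words)] <- 0

print(ans2)
```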
zoo: We also show a zoo solution:
library(zoo)
library(chron)
FUN <- function(x) as.POSIXct(trunc(as.chron(x), 15 / (24 * 60)))
z <- read.zoo(DF, FUN = FUN, aggregate = sum)
plot(z)
giving z:
> z
2013-11-17 13:30:00 2013-11-17 14:00:00 2013-11-17 17:15:00 2013-11-17 22:45:00
52 29 39 31
Note: We used the following data; in particular, Time is of class "POSIXct":
Lines<- " Time No_Words
1 2013-11-17 13:37:00 6
2 2013-11-17 13:37:00 16
3 2013-11-17 13:37:00 18
4 2013-11-17 13:37:00 12
5 2013-11-17 14:03:00 5
6 2013-11-17 14:03:00 20
7 2013-11-17 14:04:00 4
8 2013-11-17 17:21:00 39
9 2013-11-17 22:48:00 19
10 2013-11-17 22:48:00 12
"
raw <- read.table(text = Lines, skip = 1)
DF <- data.frame(Time = as.POSIXct(paste(raw$V2, raw$V3)), No_Words = raw$V4)
Answer 1 (score: 1)
This answer does not use sqldf; it uses the base R functions aggregate and cut:
## If your "Time" column is not an actual time object,
## convert it to one before proceeding.
mydf$Time <- as.POSIXct(mydf$Time)
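cut can create the time bins, which we then use for the aggregation. You could use the formula notation, but I used the list approach because it makes it easy to specify the resulting column names: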
## Aggregate data in 30 minute chunks
aggregate(list(No_Words = mydf$No_Words),
list(Time = cut(mydf$Time, "30 min")), FUN = mean)
# Time No_Words
# 1 2013-11-17 13:37:00 11.57143
# 2 2013-11-17 17:07:00 39.00000
# 3 2013-11-17 22:37:00 15.50000
## Aggregate data into 15 minute chunks
aggregate(list(No_Words = mydf$No_Words),
list(Time = cut(mydf$Time, "15 min")), FUN = mean)
# Time No_Words
# 1 2013-11-17 13:37:00 13.000000
# 2 2013-11-17 13:52:00 9.666667
# 3 2013-11-17 17:07:00 39.000000
# 4 2013-11-17 22:37:00 15.500000
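Note that in the output above the bins start at the first observation (13:37), not on a clock boundary. If bins aligned to :00 and :30 are wanted, one option is to pass explicit break points to cut. The following is a sketch under assumptions: mydf is a reduced, hand-built version of the question's data, UTC is assumed, and start, breaks, bins and res are illustrative names.

```r
mydf <- data.frame(
  Time = as.POSIXct(c("2013-11-17 13:37:00", "2013-11-17 14:03:00",
                      "2013-11-17 14:04:00", "2013-11-17 17:21:00"), tz = "UTC"),
  No_Words = c(6, 5, 4, 39)
)

# Begin the breaks at the top of the hour of the earliest timestamp
start <- as.POSIXct(trunc(min(mydf$Time), units = "hours"))

# Extend one step past the latest timestamp so that it falls inside a bin
breaks <- seq(start, max(mydf$Time) + 30 * 60, by = "30 min")

# Bins are left-closed: [13:30, 14:00), [14:00, 14:30), ...
bins <- cut(mydf$Time, breaks = breaks)

# Mean per bin; bins without observations are dropped by aggregate
res <- aggregate(list(No_Words = mydf$No_Words), list(Time = bins), FUN = mean)
print(res)
```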
Answer 2 (score: 1)
# generate example data, 30 min intervals
set.seed(1)
dateseq <- seq(as.POSIXct("2013-11-17"), as.POSIXct("2013-11-18"), by="min")
df <- data.frame(Time=dateseq[sample(1:length(dateseq), 500)],
No_Words=sample(1:100, 500, replace=TRUE))
groups <- cut.POSIXt(df$Time, breaks="30 min")
Using sqldf:
library(sqldf)
df$groups <- groups
agg <- sqldf("select groups, avg(No_Words) from df group by groups", row.names=TRUE)
row.names(agg) <- agg[,1]
agg <- as.matrix(agg)
class(agg) <- "numeric"
par(mar=c(2,10,0,0)); barplot(agg[,2], horiz=TRUE, las=1)
A simple approach using, for example, tapply:
agg <- tapply(df$No_Words, list(groups), mean)
par(mar=c(2,10,0,0)); barplot(agg, horiz=TRUE, las=1)
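One difference from the aggregate approach is worth illustrating: because cut returns a factor that includes empty intervals, tapply reports NA for bins with no observations instead of dropping them. A small sketch with made-up data (the names times and words are illustrative, and UTC is assumed):

```r
times <- as.POSIXct(c("2013-11-17 13:37:00", "2013-11-17 13:40:00",
                      "2013-11-17 14:50:00"), tz = "UTC")
words <- c(6, 16, 5)

# Three 30-minute bins starting at 13:37; the middle one contains no data
groups <- cut(times, breaks = "30 min")

# Mean per bin; the empty bin yields NA
agg <- tapply(words, groups, mean)
print(agg)

# Optionally treat empty bins as zero before plotting
agg_filled <- agg
agg_filled[is.na(agg_filled)] <- 0
```

Whether an empty interval should be plotted as NA or as zero depends on what the chart is meant to convey.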